mirror of
https://github.com/Stirling-Tools/Stirling-PDF.git
synced 2025-04-23 01:01:30 +00:00

# Description of Changes

This pull request includes several changes to the codebase, focusing on enhancing OCR support, improving endpoint management, and adding new functionality for PDF compression. The most important changes are detailed below.

### Enhancements to OCR support:

* `Dockerfile` and `Dockerfile.fat`: Added support for new OCR languages: Chinese (Simplified), German, French, and Portuguese. Together with English, these are our top 5 languages. [[1]](diffhunk://#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R69-R72) [[2]](diffhunk://#diff-571631582b988e88c52c86960cc083b0b8fa63cf88f056f26e9e684195221c27L78-R81)

### Improvements to endpoint management:

* [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dR51-R66): Added a new method `isGroupEnabled` to check if a group of endpoints is enabled.
* [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193): Updated endpoint groups and removed redundant qpdf endpoints. [[1]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193) [[2]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL243-L244)
* [`src/main/java/stirling/software/SPDF/config/EndpointInspector.java`](diffhunk://#diff-845de13e140bb1264014539714860f044405274ad2a9481f38befdd1c1333818R1-R291): Introduced a new `EndpointInspector` class to discover and validate GET endpoints dynamically.

### New functionality for PDF compression:

* [`src/main/java/stirling/software/SPDF/controller/api/misc/CompressController.java`](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10): Enhanced the `CompressController` to handle nested images within form XObjects, improving the accuracy of image compression in PDFs. Removed the compress feature's dependency on QPDF. [[1]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10) [[2]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R28-R44) [[3]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L49-R61) [[4]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R77-R99) [[5]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L92-R191)

Closes #(issue_number)

---

## Checklist

### General

- [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md) (if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md#6-testing) for more details.

---------

Co-authored-by: a <a>
110 lines
3.8 KiB
Docker
# Build the application
FROM gradle:8.13-jdk21 AS build

COPY build.gradle .
COPY settings.gradle .
COPY gradlew .
COPY gradle gradle/
# Warm the Gradle dependency cache; a failure here must not abort the image build
RUN ./gradlew build -x spotlessApply -x spotlessCheck -x test -x sonarqube || true

# Set the working directory
WORKDIR /app

# Copy the entire project to the working directory
COPY . .

# Build the application with DOCKER_ENABLE_SECURITY=true
RUN DOCKER_ENABLE_SECURITY=true \
    STIRLING_PDF_DESKTOP_UI=false \
    ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube

# Main stage
FROM alpine:3.21.3@sha256:a8560b36e8b8210634f77d9f7f9efd7ffa463e380b75e2e74aff4511df3ef88c

# Copy necessary files
COPY scripts /scripts
COPY pipeline /pipeline
COPY src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
COPY --from=build /app/build/libs/*.jar app.jar

ARG VERSION_TAG

# Set Environment Variables
ENV DOCKER_ENABLE_SECURITY=false \
    VERSION_TAG=$VERSION_TAG \
    JAVA_TOOL_OPTIONS="-XX:+UnlockExperimentalVMOptions \
    -XX:MaxRAMPercentage=75 \
    -XX:InitiatingHeapOccupancyPercent=20 \
    -XX:+G1PeriodicGCInvokesConcurrent \
    -XX:G1PeriodicGCInterval=10000 \
    -XX:+UseStringDeduplication \
    -XX:G1PeriodicGCSystemLoadThreshold=70" \
    HOME=/home/stirlingpdfuser \
    PUID=1000 \
    PGID=1000 \
    UMASK=022 \
    FAT_DOCKER=true \
    INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
    PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
    UNO_PATH=/usr/lib/libreoffice/program \
    URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc


# JDK for app
RUN echo "@main https://dl-cdn.alpinelinux.org/alpine/edge/main" | tee -a /etc/apk/repositories && \
    echo "@community https://dl-cdn.alpinelinux.org/alpine/edge/community" | tee -a /etc/apk/repositories && \
    echo "@testing https://dl-cdn.alpinelinux.org/alpine/edge/testing" | tee -a /etc/apk/repositories && \
    apk upgrade --no-cache -a && \
    apk add --no-cache \
    ca-certificates \
    tzdata \
    tini \
    bash \
    curl \
    shadow \
    su-exec \
    openssl \
    openssl-dev \
    openjdk21-jre \
    # Doc conversion
    gcompat \
    libc6-compat \
    libreoffice \
    # pdftohtml
    poppler-utils \
    # OCR MY PDF (unpaper for deskew and other advanced features)
    qpdf \
    tesseract-ocr-data-eng \
    tesseract-ocr-data-chi_sim \
    tesseract-ocr-data-deu \
    tesseract-ocr-data-fra \
    tesseract-ocr-data-por \
    font-terminus font-dejavu font-noto font-noto-cjk font-awesome font-noto-extra font-liberation font-linux-libertine \
    # CV
    py3-opencv \
    python3 \
    py3-pip \
    py3-pillow@testing \
    py3-pdf2image@testing && \
    python3 -m venv /opt/venv && \
    export PATH="/opt/venv/bin:$PATH" && \
    pip install --upgrade pip && \
    pip install --no-cache-dir --upgrade unoserver weasyprint && \
    ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
    ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
    ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
    mv /usr/share/tessdata /usr/share/tessdata-original && \
    mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders && \
    fc-cache -f -v && \
    chmod +x /scripts/* && \
    chmod +x /scripts/init.sh && \
    # User permissions
    addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
    chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline && \
    chown stirlingpdfuser:stirlingpdfgroup /app.jar

EXPOSE 8080/tcp
# Set user and run command
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 0.0.0.0"]
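
The image installs Tesseract data for five languages (`eng`, `chi_sim`, `deu`, `fra`, `por`) and then moves the packaged data directory to `/usr/share/tessdata-original`. A quick way to confirm that the language packs actually landed might look like the sketch below. The helper name `missing_ocr_langs` is hypothetical and not part of the repository; the only assumption about Tesseract is its usual `<lang>.traineddata` file naming.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stirling-PDF): print each expected OCR
# language whose <lang>.traineddata file is absent from a tessdata directory.
# Usage: missing_ocr_langs <tessdata_dir> <lang>...
missing_ocr_langs() {
    tessdata_dir="$1"
    shift
    for lang in "$@"; do
        # Tesseract stores one <lang>.traineddata file per language
        if [ ! -f "$tessdata_dir/$lang.traineddata" ]; then
            printf '%s\n' "$lang"
        fi
    done
}
```

Inside a running container, one could call `missing_ocr_langs /usr/share/tessdata-original eng chi_sim deu fra por`; empty output would suggest all five packs are present. The directory is taken from the `mv` step in this Dockerfile, so adjust it if the init script relocates the data at startup.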