mirror of
https://github.com/Stirling-Tools/Stirling-PDF.git
synced 2025-04-19 11:11:18 +00:00

# Description of Changes This pull request includes several changes to the codebase, focusing on enhancing OCR support, improving endpoint management, and adding new functionality for PDF compression. The most important changes are detailed below. ### Enhancements to OCR support: * `Dockerfile` and `Dockerfile.fat`: Added support for multiple new OCR languages including Chinese (Simplified), German, French, and Portuguese. (Our top 5 languages including English) [[1]](diffhunk://#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R69-R72) [[2]](diffhunk://#diff-571631582b988e88c52c86960cc083b0b8fa63cf88f056f26e9e684195221c27L78-R81) ### Improvements to endpoint management: * [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dR51-R66): Added a new method `isGroupEnabled` to check if a group of endpoints is enabled. * [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193): Updated endpoint groups and removed redundant qpdf endpoints. [[1]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193) [[2]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL243-L244) * [`src/main/java/stirling/software/SPDF/config/EndpointInspector.java`](diffhunk://#diff-845de13e140bb1264014539714860f044405274ad2a9481f38befdd1c1333818R1-R291): Introduced a new `EndpointInspector` class to discover and validate GET endpoints dynamically. ### New functionality for PDF compression: * [`src/main/java/stirling/software/SPDF/controller/api/misc/CompressController.java`](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10): Enhanced the `CompressController` to handle nested images within form XObjects, improving the accuracy of image compression in PDFs. Remove Compresses Dependency on QPDF [[1]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10) [[2]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R28-R44) [[3]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L49-R61) [[4]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R77-R99) [[5]](diff hunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L92-R191) Closes #(issue_number) --- ## Checklist ### General - [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md) - [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md) (if applicable) - [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md) (if applicable) - [ ] I have performed a self-review of my own code - [ ] My changes generate no new warnings ### Documentation - [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed) - [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only) ### UI Changes (if applicable) - [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR) ### Testing (if applicable) - [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md#6-testing) for more details. --------- Co-authored-by: a <a>
100 lines
4.0 KiB
Docker
100 lines
4.0 KiB
Docker
# Main stage
|
|
FROM alpine:3.21.3@sha256:a8560b36e8b8210634f77d9f7f9efd7ffa463e380b75e2e74aff4511df3ef88c
|
|
|
|
# Copy necessary files
|
|
COPY scripts /scripts
|
|
COPY pipeline /pipeline
|
|
COPY src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
|
|
#COPY src/main/resources/static/fonts/*.otf /usr/share/fonts/opentype/noto/
|
|
COPY build/libs/*.jar app.jar
|
|
|
|
ARG VERSION_TAG
|
|
|
|
LABEL org.opencontainers.image.title="Stirling-PDF"
|
|
LABEL org.opencontainers.image.description="A powerful locally hosted web-based PDF manipulation tool supporting 50+ operations including merging, splitting, conversion, OCR, watermarking, and more."
|
|
LABEL org.opencontainers.image.source="https://github.com/Stirling-Tools/Stirling-PDF"
|
|
LABEL org.opencontainers.image.licenses="MIT"
|
|
LABEL org.opencontainers.image.vendor="Stirling-Tools"
|
|
LABEL org.opencontainers.image.url="https://www.stirlingpdf.com"
|
|
LABEL org.opencontainers.image.documentation="https://docs.stirlingpdf.com"
|
|
LABEL maintainer="Stirling-Tools"
|
|
LABEL org.opencontainers.image.authors="Stirling-Tools"
|
|
LABEL org.opencontainers.image.version="${VERSION_TAG}"
|
|
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
|
|
|
|
# Set Environment Variables
|
|
ENV DOCKER_ENABLE_SECURITY=false \
|
|
VERSION_TAG=$VERSION_TAG \
|
|
JAVA_TOOL_OPTIONS="-XX:+UnlockExperimentalVMOptions \
|
|
-XX:MaxRAMPercentage=75 \
|
|
-XX:InitiatingHeapOccupancyPercent=20 \
|
|
-XX:+G1PeriodicGCInvokesConcurrent \
|
|
-XX:G1PeriodicGCInterval=10000 \
|
|
-XX:+UseStringDeduplication \
|
|
-XX:G1PeriodicGCSystemLoadThreshold=70" \
|
|
HOME=/home/stirlingpdfuser \
|
|
PUID=1000 \
|
|
PGID=1000 \
|
|
UMASK=022 \
|
|
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
|
|
UNO_PATH=/usr/lib/libreoffice/program \
|
|
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc
|
|
|
|
|
|
# JDK for app
|
|
RUN echo "@main https://dl-cdn.alpinelinux.org/alpine/edge/main" | tee -a /etc/apk/repositories && \
|
|
echo "@community https://dl-cdn.alpinelinux.org/alpine/edge/community" | tee -a /etc/apk/repositories && \
|
|
echo "@testing https://dl-cdn.alpinelinux.org/alpine/edge/testing" | tee -a /etc/apk/repositories && \
|
|
apk upgrade --no-cache -a && \
|
|
apk add --no-cache \
|
|
ca-certificates \
|
|
tzdata \
|
|
tini \
|
|
bash \
|
|
curl \
|
|
qpdf \
|
|
shadow \
|
|
su-exec \
|
|
openssl \
|
|
openssl-dev \
|
|
openjdk21-jre \
|
|
# Doc conversion
|
|
gcompat \
|
|
libc6-compat \
|
|
libreoffice \
|
|
# pdftohtml
|
|
poppler-utils \
|
|
# OCR MY PDF (unpaper for descew and other advanced features)
|
|
tesseract-ocr-data-eng \
|
|
tesseract-ocr-data-chi_sim \
|
|
tesseract-ocr-data-deu \
|
|
tesseract-ocr-data-fra \
|
|
tesseract-ocr-data-por \
|
|
# CV
|
|
py3-opencv \
|
|
python3 \
|
|
py3-pip \
|
|
py3-pillow@testing \
|
|
py3-pdf2image@testing && \
|
|
python3 -m venv /opt/venv && \
|
|
export PATH="/opt/venv/bin:$PATH" && \
|
|
pip install --upgrade pip && \
|
|
pip install --no-cache-dir --upgrade unoserver weasyprint && \
|
|
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
|
|
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
|
|
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
|
|
mv /usr/share/tessdata /usr/share/tessdata-original && \
|
|
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders && \
|
|
fc-cache -f -v && \
|
|
chmod +x /scripts/* && \
|
|
chmod +x /scripts/init.sh && \
|
|
# User permissions
|
|
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
|
|
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline && \
|
|
chown stirlingpdfuser:stirlingpdfgroup /app.jar
|
|
|
|
EXPOSE 8080/tcp
|
|
|
|
# Set user and run command
|
|
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
|
|
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 0.0.0.0"] |