mirror of
https://github.com/Stirling-Tools/Stirling-PDF.git
synced 2025-07-23 13:45:21 +00:00

# Description of Changes This pull request introduces enhancements to tool dependencies, endpoint configurations, and Dockerfile packages. The changes primarily focus on adding support for new tools (Ghostscript and OCRmyPDF), improving endpoint management, and refining dependency checks. Below is a summary of the most important changes grouped by theme: ### Tool Integration and Dependency Management: * Added support for Ghostscript and OCRmyPDF in the session limits and timeout configurations (`ApplicationProperties.java`). [[1]](diffhunk://#diff-0642cd5c54d57a80f1f9c26cb26677f58778dc889cb59ebfc07b7512e74a8886R548-R549) [[2]](diffhunk://#diff-0642cd5c54d57a80f1f9c26cb26677f58778dc889cb59ebfc07b7512e74a8886R582-R589) [[3]](diffhunk://#diff-0642cd5c54d57a80f1f9c26cb26677f58778dc889cb59ebfc07b7512e74a8886R602-R603) [[4]](diffhunk://#diff-0642cd5c54d57a80f1f9c26cb26677f58778dc889cb59ebfc07b7512e74a8886R636-R643) * Updated `Processes` enum and `ProcessExecutor` logic to include Ghostscript and OCRmyPDF, enabling their session limits and timeout handling. [[1]](diffhunk://#diff-b0afb37bdac8b0f1a10aca87b0244b133ac9b05518088b6f4c6c2bf48f859fdaR87-R96) [[2]](diffhunk://#diff-b0afb37bdac8b0f1a10aca87b0244b133ac9b05518088b6f4c6c2bf48f859fdaR141-R150) [[3]](diffhunk://#diff-b0afb37bdac8b0f1a10aca87b0244b133ac9b05518088b6f4c6c2bf48f859fdaL281-R303) * Integrated Ghostscript and OCRmyPDF into the external dependency check mechanism (`ExternalAppDepConfig.java`). [[1]](diffhunk://#diff-511713d04e0545670bfb44e70e84235288e91db0ef219e423628746f28e67cfcR37-R38) [[2]](diffhunk://#diff-511713d04e0545670bfb44e70e84235288e91db0ef219e423628746f28e67cfcR114-R115) ### Endpoint Configuration Enhancements: * Improved endpoint management by introducing functional group overrides, tool group fallbacks, and alternative tool group handling (`EndpointConfiguration.java`). [[1]](diffhunk://#diff-d6f76bfe88b1a7f347c10cc6956f8e7874b66772c6ac0ac432bedd3c7e8c372dR24-R25) [[2]](diffhunk://#diff-d6f76bfe88b1a7f347c10cc6956f8e7874b66772c6ac0ac432bedd3c7e8c372dR39-R178) [[3]](diffhunk://#diff-d6f76bfe88b1a7f347c10cc6956f8e7874b66772c6ac0ac432bedd3c7e8c372dL264-R357) [[4]](diffhunk://#diff-d6f76bfe88b1a7f347c10cc6956f8e7874b66772c6ac0ac432bedd3c7e8c372dR399-R437) * Added new endpoint groups and alternatives for multi-tool endpoints like "repair," "compress-pdf," and "ocr-pdf," supporting both Ghostscript and OCRmyPDF. [[1]](diffhunk://#diff-d6f76bfe88b1a7f347c10cc6956f8e7874b66772c6ac0ac432bedd3c7e8c372dL215-L216) [[2]](diffhunk://#diff-d6f76bfe88b1a7f347c10cc6956f8e7874b66772c6ac0ac432bedd3c7e8c372dL264-R357) ### Dockerfile Updates: * Removed `qpdf` and added `unpaper` and `ocrmypdf` to the package list in `Dockerfile` and `Dockerfile.fat` to enable advanced OCR and PDF manipulation features. [[1]](diffhunk://#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557L54) [[2]](diffhunk://#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R71-R75) [[3]](diffhunk://#diff-571631582b988e88c52c86960cc083b0b8fa63cf88f056f26e9e684195221c27L79-R89) ### Code Cleanup and Refactoring: * Refactored `TempFileRegistry` to improve readability by simplifying variable initialization. [[1]](diffhunk://#diff-ed08845255ade38ea6e28a959ed967957611025fb7f8f5dcf4543b5f200a5fecL12) [[2]](diffhunk://#diff-ed08845255ade38ea6e28a959ed967957611025fb7f8f5dcf4543b5f200a5fecL30-R29) * Updated `CompressController` to dynamically check tool group availability using the new endpoint configuration logic. [[1]](diffhunk://#diff-fc8dc1845d34077a089d9265521ce8c8d6104a8b79137e1de5310e30ffb0348aR50) [[2]](diffhunk://#diff-fc8dc1845d34077a089d9265521ce8c8d6104a8b79137e1de5310e30ffb0348aR65-R76) These changes collectively enhance the application's capabilities for PDF processing, improve dependency management, and streamline endpoint handling. --- ## Checklist ### General - [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md) - [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md) (if applicable) - [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md) (if applicable) - [ ] I have performed a self-review of my own code - [ ] My changes generate no new warnings ### Documentation - [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed) - [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only) ### UI Changes (if applicable) - [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR) ### Testing (if applicable) - [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md#6-testing) for more details.
100 lines
4.2 KiB
Docker
100 lines
4.2 KiB
Docker
# Main stage
|
|
FROM alpine:3.22.0@sha256:8a1f59ffb675680d47db6337b49d22281a139e9d709335b492be023728e11715
|
|
|
|
# Copy necessary files
|
|
COPY scripts /scripts
|
|
COPY pipeline /pipeline
|
|
COPY stirling-pdf/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
|
|
COPY stirling-pdf/build/libs/*.jar app.jar
|
|
|
|
ARG VERSION_TAG
|
|
|
|
LABEL org.opencontainers.image.title="Stirling-PDF"
|
|
LABEL org.opencontainers.image.description="A powerful locally hosted web-based PDF manipulation tool supporting 50+ operations including merging, splitting, conversion, OCR, watermarking, and more."
|
|
LABEL org.opencontainers.image.source="https://github.com/Stirling-Tools/Stirling-PDF"
|
|
LABEL org.opencontainers.image.licenses="MIT"
|
|
LABEL org.opencontainers.image.vendor="Stirling-Tools"
|
|
LABEL org.opencontainers.image.url="https://www.stirlingpdf.com"
|
|
LABEL org.opencontainers.image.documentation="https://docs.stirlingpdf.com"
|
|
LABEL maintainer="Stirling-Tools"
|
|
LABEL org.opencontainers.image.authors="Stirling-Tools"
|
|
LABEL org.opencontainers.image.version="${VERSION_TAG}"
|
|
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
|
|
|
|
# Set Environment Variables
|
|
ENV DISABLE_ADDITIONAL_FEATURES=true \
|
|
VERSION_TAG=$VERSION_TAG \
|
|
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
|
|
JAVA_CUSTOM_OPTS="" \
|
|
HOME=/home/stirlingpdfuser \
|
|
PUID=1000 \
|
|
PGID=1000 \
|
|
UMASK=022 \
|
|
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
|
|
UNO_PATH=/usr/lib/libreoffice/program \
|
|
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
|
|
PATH=$PATH:/opt/venv/bin \
|
|
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
|
|
TMPDIR=/tmp/stirling-pdf \
|
|
TEMP=/tmp/stirling-pdf \
|
|
TMP=/tmp/stirling-pdf
|
|
|
|
|
|
# JDK for app
|
|
RUN echo "@main https://dl-cdn.alpinelinux.org/alpine/edge/main" | tee -a /etc/apk/repositories && \
|
|
echo "@community https://dl-cdn.alpinelinux.org/alpine/edge/community" | tee -a /etc/apk/repositories && \
|
|
echo "@testing https://dl-cdn.alpinelinux.org/alpine/edge/testing" | tee -a /etc/apk/repositories && \
|
|
apk upgrade --no-cache -a && \
|
|
apk add --no-cache \
|
|
ca-certificates \
|
|
tzdata \
|
|
tini \
|
|
bash \
|
|
curl \
|
|
shadow \
|
|
su-exec \
|
|
openssl \
|
|
openssl-dev \
|
|
openjdk21-jre \
|
|
# Doc conversion
|
|
gcompat \
|
|
libc6-compat \
|
|
libreoffice \
|
|
# pdftohtml
|
|
poppler-utils \
|
|
# OCR MY PDF (unpaper for descew and other advanced features)
|
|
tesseract-ocr-data-eng \
|
|
tesseract-ocr-data-chi_sim \
|
|
tesseract-ocr-data-deu \
|
|
tesseract-ocr-data-fra \
|
|
tesseract-ocr-data-por \
|
|
unpaper \
|
|
# CV
|
|
py3-opencv \
|
|
python3 \
|
|
ocrmypdf \
|
|
py3-pip \
|
|
py3-pillow@testing \
|
|
py3-pdf2image@testing && \
|
|
python3 -m venv /opt/venv && \
|
|
/opt/venv/bin/pip install --upgrade pip setuptools && \
|
|
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
|
|
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
|
|
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
|
|
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
|
|
mv /usr/share/tessdata /usr/share/tessdata-original && \
|
|
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
|
|
fc-cache -f -v && \
|
|
chmod +x /scripts/* && \
|
|
chmod +x /scripts/init.sh && \
|
|
# User permissions
|
|
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
|
|
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
|
|
chown stirlingpdfuser:stirlingpdfgroup /app.jar
|
|
|
|
EXPOSE 8080/tcp
|
|
|
|
# Set user and run command
|
|
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
|
|
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]
|