# Stirling-PDF/testing/cucumber/features/external.feature

Feature: API Validation


    @libre @positive
    Scenario: Repair PDF
        Given I generate a PDF file as "fileInput"
        When I send the API request to the endpoint "/api/v1/misc/repair"
        Then the response content type should be "application/pdf"
        And the response file should have size greater than 0
        And the response status code should be 200


    @ocr @positive
    Scenario: Process PDF with OCR (ocrType Normal)
        Given I generate a PDF file as "fileInput"
        And the request data includes
            | parameter         | value  |
            | languages         | eng    |
            | sidecar           | false  |
            | deskew            | true   |
            | clean             | true   |
            | cleanFinal        | true   |
            | ocrType           | Normal |
            | ocrRenderType     | hocr   |
            | removeImagesAfter | false  |
        When I send the API request to the endpoint "/api/v1/misc/ocr-pdf"
        Then the response content type should be "application/pdf"
        And the response file should have size greater than 0
        And the response status code should be 200


    @ocr @positive
    Scenario: Extract Image Scans
        Given I generate a PDF file as "fileInput"
        And the pdf contains 3 images of size 300x300 on 2 pages
        And the request data includes
            | parameter      | value |
            | angleThreshold | 5     |
            | tolerance      | 20    |
            | minArea        | 8000  |
            | minContourArea | 500   |
            | borderSize     | 1     |
        When I send the API request to the endpoint "/api/v1/misc/extract-image-scans"
        Then the response content type should be "application/octet-stream"
        And the response file should have extension ".zip"
        And the response ZIP should contain 2 files
        And the response file should have size greater than 0
        And the response status code should be 200


    @ocr @positive
    Scenario: Process PDF with OCR (ocrType Force)
        Given I generate a PDF file as "fileInput"
        And the request data includes
            | parameter         | value |
            | languages         | eng   |
            | sidecar           | false |
            | deskew            | true  |
            | clean             | true  |
            | cleanFinal        | true  |
            | ocrType           | Force |
            | ocrRenderType     | hocr  |
            | removeImagesAfter | false |
        When I send the API request to the endpoint "/api/v1/misc/ocr-pdf"
        Then the response content type should be "application/pdf"
        And the response file should have size greater than 0
        And the response status code should be 200


    @libre @positive
    Scenario Outline: Convert PDF to various word-processor formats
        Given I generate a PDF file as "fileInput"
        And the pdf contains 3 pages with random text
        And the request data includes
            | parameter    | value    |
            | outputFormat | <format> |
        When I send the API request to the endpoint "/api/v1/convert/pdf/word"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension "<extension>"

        Examples:
            | format | extension |
            | docx   | .docx     |
            | odt    | .odt      |
            | doc    | .doc      |

    @ocr @pdfa1
    Scenario: Convert PDF to PDF/A
        Given I use an example file at "exampleFiles/pdfa2.pdf" as parameter "fileInput"
        And the request data includes
            | parameter    | value |
            | outputFormat | pdfa  |
        When I send the API request to the endpoint "/api/v1/convert/pdf/pdfa"
        Then the response status code should be 200
        And the response file should have extension ".pdf"
        And the response file should have size greater than 100

    @ocr @pdfa2
    Scenario: Convert PDF to PDF/A-1
        Given I use an example file at "exampleFiles/pdfa1.pdf" as parameter "fileInput"
        And the request data includes
            | parameter    | value  |
            | outputFormat | pdfa-1 |
        When I send the API request to the endpoint "/api/v1/convert/pdf/pdfa"
        Then the response status code should be 200
        And the response file should have extension ".pdf"
        And the response file should have size greater than 100

    @compress @qpdf @positive
    Scenario: Compress PDF with optimizeLevel 4
        Given I use an example file at "exampleFiles/ghost3.pdf" as parameter "fileInput"
        And the request data includes
            | parameter     | value |
            | optimizeLevel | 4     |
        When I send the API request to the endpoint "/api/v1/misc/compress-pdf"
        Then the response status code should be 200
        And the response file should have extension ".pdf"
        And the response file should have size greater than 100

    @compress @qpdf @positive
    Scenario Outline: Compress PDF to an expected output size
        Given I use an example file at "exampleFiles/<file>" as parameter "fileInput"
        And the request data includes
            | parameter          | value |
            | optimizeLevel      | 1     |
            | expectedOutputSize | 5KB   |
        When I send the API request to the endpoint "/api/v1/misc/compress-pdf"
        Then the response status code should be 200
        And the response file should have extension ".pdf"
        And the response file should have size greater than 100

        Examples:
            | file       |
            | ghost2.pdf |
            | ghost1.pdf |

    @libre @positive
    Scenario Outline: Convert PDF to various types
        Given I generate a PDF file as "fileInput"
        And the pdf contains 3 pages with random text
        And the request data includes
            | parameter    | value    |
            | outputFormat | <format> |
        When I send the API request to the endpoint "/api/v1/convert/pdf/<type>"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension "<extension>"

        Examples:
            | type         | format | extension |
            | text         | rtf    | .rtf      |
            | text         | txt    | .txt      |
            | presentation | ppt    | .ppt      |
            | presentation | pptx   | .pptx     |
            | presentation | odp    | .odp      |
            | html         | html   | .zip      |

    @image @positive
    Scenario Outline: Convert PDF to image
        Given I generate a PDF file as "fileInput"
        And the pdf contains 3 pages with random text
        And the pdf contains 3 images of size 300x300 on 3 pages
        And the request data includes
            | parameter   | value    |
            | dpi         | 300      |
            | imageFormat | <format> |
        When I send the API request to the endpoint "/api/v1/convert/pdf/img"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension ".zip"

        Examples:
            | format |
            | webp   |
            | png    |
            | jpeg   |
            | jpg    |
            | gif    |

    @libre @positive @topdf
    Scenario Outline: Convert various file types to PDF
        Given I use an example file at "exampleFiles/example<extension>" as parameter "fileInput"
        When I send the API request to the endpoint "/api/v1/convert/file/pdf"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension ".pdf"

        Examples:
            | extension |
            | .docx     |
            | .odp      |
            | .odt      |
            | .pptx     |
            | .rtf      |

    @calibre @positive @htmltopdf
    Scenario: Convert HTML to PDF
        Given I use an example file at "exampleFiles/example.html" as parameter "fileInput"
        When I send the API request to the endpoint "/api/v1/convert/html/pdf"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension ".pdf"

    @calibre @positive @zippedhtmltopdf
    Scenario: Convert zipped HTML to PDF
        Given I use an example file at "exampleFiles/example_html.zip" as parameter "fileInput"
        When I send the API request to the endpoint "/api/v1/convert/html/pdf"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension ".pdf"

    @calibre @positive @markdowntopdf
    Scenario: Convert Markdown to PDF
        Given I use an example file at "exampleFiles/example.md" as parameter "fileInput"
        When I send the API request to the endpoint "/api/v1/convert/markdown/pdf"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension ".pdf"

    @markdown @positive
    Scenario: Convert PDF to Markdown format
        Given I generate a PDF file as "fileInput"
        And the pdf contains 3 pages with random text
        When I send the API request to the endpoint "/api/v1/convert/pdf/markdown"
        Then the response status code should be 200
        And the response file should have size greater than 100
        And the response file should have extension ".md"


    @positive @pdftocsv
    Scenario: Convert PDF with tables to CSV format
        Given I use an example file at "exampleFiles/tables.pdf" as parameter "fileInput"
        And the request data includes
            | parameter    | value |
            | outputFormat | csv   |
            | pageNumbers  | all   |
        When I send the API request to the endpoint "/api/v1/convert/pdf/csv"
        Then the response status code should be 200
        And the response file should have size greater than 200
        And the response file should have extension ".zip"
        And the response ZIP should contain 3 files