Ludy 03f184ab2b
chore(cucumber): add create_pdf_with_black_boxes and convert-pdf-to-image outline; remove duplicate split-pdf-by-sections (#3937)
# Description of Changes

- **What was changed**  
- Introduced `create_pdf_with_black_boxes` helper function in
`environment.py` for generating test PDFs with occluded content.
- Added **Scenario Outline: Convert PDF to image** to
`conversion.feature` to validate PDF→image conversion workflows.
- Removed the duplicate **Scenario Outline: split-pdf-by-sections with
different parameters** from `general.feature`.

- **Why the change was made**  
- To enable testing of blacked-out content scenarios and ensure our
suite covers image conversion.
- To eliminate redundant tests and keep the feature files DRY and
maintainable.

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
2025-07-14 12:05:17 +01:00

251 lines
10 KiB
Gherkin

Feature: API Validation
@libre @positive
Scenario: Repair PDF
Given I generate a PDF file as "fileInput"
When I send the API request to the endpoint "/api/v1/misc/repair"
Then the response content type should be "application/pdf"
And the response file should have size greater than 0
And the response status code should be 200
@ocr @positive
Scenario: Process PDF with OCR
Given I generate a PDF file as "fileInput"
And the request data includes
| parameter | value |
| languages | eng |
| sidecar | false |
| deskew | true |
| clean | true |
| cleanFinal | true |
| ocrType | Normal |
| ocrRenderType | hocr |
| removeImagesAfter | false |
When I send the API request to the endpoint "/api/v1/misc/ocr-pdf"
Then the response content type should be "application/pdf"
And the response file should have size greater than 0
And the response status code should be 200
@ocr @positive
Scenario: Extract Image Scans
Given I generate a PDF file as "fileInput"
And the pdf contains 3 images of size 300x300 on 2 pages
And the request data includes
| parameter | value |
| angleThreshold | 5 |
| tolerance | 20 |
| minArea | 8000 |
| minContourArea | 500 |
| borderSize | 1 |
When I send the API request to the endpoint "/api/v1/misc/extract-image-scans"
Then the response content type should be "application/octet-stream"
And the response file should have extension ".zip"
And the response ZIP should contain 2 files
And the response file should have size greater than 0
And the response status code should be 200
@ocr @positive
Scenario: Process PDF with OCR
Given I generate a PDF file as "fileInput"
And the request data includes
| parameter | value |
| languages | eng |
| sidecar | false |
| deskew | true |
| clean | true |
| cleanFinal | true |
| ocrType | Force |
| ocrRenderType | hocr |
| removeImagesAfter | false |
When I send the API request to the endpoint "/api/v1/misc/ocr-pdf"
Then the response content type should be "application/pdf"
And the response file should have size greater than 0
And the response status code should be 200
@libre @positive
Scenario Outline: Convert PDF to various word formats
Given I generate a PDF file as "fileInput"
And the pdf contains 3 pages with random text
And the request data includes
| parameter | value |
| outputFormat | <format> |
When I send the API request to the endpoint "/api/v1/convert/pdf/word"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension "<extension>"
Examples:
| format | extension |
| docx | .docx |
| odt | .odt |
| doc | .doc |
@ocr @pdfa1
Scenario: PDFA
Given I use an example file at "exampleFiles/pdfa2.pdf" as parameter "fileInput"
And the request data includes
| parameter | value |
| outputFormat | pdfa |
When I send the API request to the endpoint "/api/v1/convert/pdf/pdfa"
Then the response status code should be 200
And the response file should have extension ".pdf"
And the response file should have size greater than 100
@ocr @pdfa2
Scenario: PDFA1
Given I use an example file at "exampleFiles/pdfa1.pdf" as parameter "fileInput"
And the request data includes
| parameter | value |
| outputFormat | pdfa-1 |
When I send the API request to the endpoint "/api/v1/convert/pdf/pdfa"
Then the response status code should be 200
And the response file should have extension ".pdf"
And the response file should have size greater than 100
@compress @qpdf @positive
Scenario: Compress
Given I use an example file at "exampleFiles/ghost3.pdf" as parameter "fileInput"
And the request data includes
| parameter | value |
| optimizeLevel | 4 |
When I send the API request to the endpoint "/api/v1/misc/compress-pdf"
Then the response status code should be 200
And the response file should have extension ".pdf"
And the response file should have size greater than 100
@compress @qpdf @positive
Scenario: Compress
Given I use an example file at "exampleFiles/ghost2.pdf" as parameter "fileInput"
And the request data includes
| parameter | value |
| optimizeLevel | 1 |
| expectedOutputSize | 5KB |
When I send the API request to the endpoint "/api/v1/misc/compress-pdf"
Then the response status code should be 200
And the response file should have extension ".pdf"
And the response file should have size greater than 100
@compress @qpdf @positive
Scenario: Compress
Given I use an example file at "exampleFiles/ghost1.pdf" as parameter "fileInput"
And the request data includes
| parameter | value |
| optimizeLevel | 1 |
| expectedOutputSize | 5KB |
When I send the API request to the endpoint "/api/v1/misc/compress-pdf"
Then the response status code should be 200
And the response file should have extension ".pdf"
And the response file should have size greater than 100
@libre @positive
Scenario Outline: Convert PDF to various types
Given I generate a PDF file as "fileInput"
And the pdf contains 3 pages with random text
And the request data includes
| parameter | value |
| outputFormat | <format> |
When I send the API request to the endpoint "/api/v1/convert/pdf/<type>"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension "<extension>"
Examples:
| type | format | extension |
| text | rtf | .rtf |
| text | txt | .txt |
| presentation | ppt | .ppt |
| presentation | pptx | .pptx |
| presentation | odp | .odp |
| html | html | .zip |
@image @positive
Scenario Outline: Convert PDF to image
Given I generate a PDF file as "fileInput"
And the pdf contains 3 pages with random text
And the pdf contains 3 images of size 300x300 on 3 pages
And the request data includes
| parameter | value |
| dpi | 300 |
| imageFormat | <format> |
When I send the API request to the endpoint "/api/v1/convert/pdf/img"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension ".zip"
Examples:
| format |
| webp |
| png |
| jpeg |
| jpg |
| gif |
@libre @positive @topdf
Scenario Outline: Convert PDF to various types
Given I use an example file at "exampleFiles/example<extension>" as parameter "fileInput"
When I send the API request to the endpoint "/api/v1/convert/file/pdf"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension ".pdf"
Examples:
| extension |
| .docx |
| .odp |
| .odt |
| .pptx |
| .rtf |
@calibre @positive @htmltopdf
Scenario: Convert HTML to PDF
Given I use an example file at "exampleFiles/example.html" as parameter "fileInput"
When I send the API request to the endpoint "/api/v1/convert/html/pdf"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension ".pdf"
@calibre @positive @zippedhtmltopdf
Scenario: Convert zipped HTML to PDF
Given I use an example file at "exampleFiles/example_html.zip" as parameter "fileInput"
When I send the API request to the endpoint "/api/v1/convert/html/pdf"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension ".pdf"
@calibre @positive @markdowntopdf
Scenario: Convert Markdown to PDF
Given I use an example file at "exampleFiles/example.md" as parameter "fileInput"
When I send the API request to the endpoint "/api/v1/convert/markdown/pdf"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension ".pdf"
@markdown @positive
Scenario: Convert PDF to Markdown format
Given I generate a PDF file as "fileInput"
And the pdf contains 3 pages with random text
When I send the API request to the endpoint "/api/v1/convert/pdf/markdown"
Then the response status code should be 200
And the response file should have size greater than 100
And the response file should have extension ".md"
@positive @pdftocsv
Scenario: Convert PDF with tables to CSV format
Given I use an example file at "exampleFiles/tables.pdf" as parameter "fileInput"
And the request data includes
| parameter | value |
| outputFormat | csv |
| pageNumbers | all |
When I send the API request to the endpoint "/api/v1/convert/pdf/csv"
Then the response status code should be 200
And the response file should have size greater than 200
And the response file should have extension ".zip"
And the response ZIP should contain 3 files