mirror of
https://github.com/Stirling-Tools/Stirling-PDF.git
synced 2025-08-21 19:59:24 +00:00

# Description of Changes

## Overview

This enhancement adds **true PDF text removal** to RedactController. Auto-redaction now removes the matched text from the PDF instead of only covering it visually. Text is removed completely from true PDFs, while behavior for other PDF types stays the same.

## Features

### 1. True PDF Text Removal

- Removes text from the PDF structure instead of just hiding it
- No impact on manual redaction or on other types of PDFs (e.g., searchable PDFs or custom-encoded PDFs)

### 2. Advanced Content Stream Processing

#### How It Works (high-level overview only)

- Token Processing: Breaks PDF content into tokens for exact text finding
- Font Tracking: Keeps track of fonts and formatting
- Text Operators: Finds the PDF operators that show text (`Tj`, `TJ`, `'`, `"`)
- Position Mapping: Maps text to exact locations for removal
- Rebuilds PDF: Rebuilds PDFs without the text, while keeping the formatting operators

#### No change for other types of PDFs

- The iteration through the PDF for token/text removal and the iteration for box placement are two completely separate methods
- This means that for a custom-encoded PDF the token/text removal won't find any text to remove (there is no decoding logic for now), but the box-finding method still reliably finds the words to redact and puts a box over them. So there is no change.

### 3. Enhanced TextFinder Integration

#### Minor Improvements

- Page Grouping: Groups found text by page for faster processing

### JUnit tests for both files

- Added JUnit tests for both files.
- Might need future improvement.

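To make the "How It Works" steps above more concrete, here is a minimal, self-contained sketch of the token-processing and rebuild idea. It is not the code in RedactController: the class `ContentStreamSketch` and its methods are hypothetical, it works on a simplified content stream passed as a plain string instead of a parsed PDF, and it blanks every literal string operand of a text-showing operator rather than only the matched words. Font tracking, position mapping, hex strings (`<...>`), and inline images are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names are not taken from the PR.
final class ContentStreamSketch {
    // The four PDF operators that show text.
    private static final Set<String> TEXT_OPERATORS = Set.of("Tj", "TJ", "'", "\"");

    // Splits a simplified content stream into tokens:
    // literal strings "(...)", array brackets, and whitespace-separated words.
    static List<String> tokenize(String stream) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < stream.length()) {
            char c = stream.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                int start = i;
                int depth = 0;
                do {
                    char d = stream.charAt(i);
                    if (d == '\\') {
                        i++; // skip the escaped character
                    } else if (d == '(') {
                        depth++;
                    } else if (d == ')') {
                        depth--;
                    }
                    i++;
                } while (depth > 0 && i < stream.length());
                tokens.add(stream.substring(start, Math.min(i, stream.length())));
            } else if (c == '[' || c == ']') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                int start = i++; // always consume at least one character
                while (i < stream.length()
                        && !Character.isWhitespace(stream.charAt(i))
                        && "()[]".indexOf(stream.charAt(i)) < 0) {
                    i++;
                }
                tokens.add(stream.substring(start, i));
            }
        }
        return tokens;
    }

    // Rebuilds the stream, replacing string operands of text-showing
    // operators with empty strings and keeping every other token.
    static String blankText(String stream) {
        List<String> out = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : tokenize(stream)) {
            if (isOperand(token)) {
                operands.add(token);
                continue;
            }
            if (TEXT_OPERATORS.contains(token)) {
                operands.replaceAll(t -> t.startsWith("(") ? "()" : t);
            }
            out.addAll(operands);
            out.add(token);
            operands.clear();
        }
        out.addAll(operands); // trailing operands with no operator
        return String.join(" ", out);
    }

    // Rough operand test: strings, arrays, names, hex strings, numbers.
    private static boolean isOperand(String token) {
        char c = token.charAt(0);
        return c == '(' || c == '[' || c == ']' || c == '/' || c == '<'
                || c == '+' || c == '-' || c == '.' || Character.isDigit(c);
    }
}
```

For example, `ContentStreamSketch.blankText("BT /F1 12 Tf 72 700 Td (Secret) Tj [(Hel) -20 (lo)] TJ ET")` returns `BT /F1 12 Tf 72 700 Td () Tj [ () -20 () ] TJ ET`: the font and positioning operators survive, while the shown strings are emptied. A real implementation would also have to account for the width of the removed glyphs so that the remaining text does not shift.
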
### TODOs

- Support for additional PDF types besides true PDFs (currently a WIP), e.g., searchable PDFs and custom-encoded PDFs
- Expected in a few weeks (best-case scenario, and only if I succeed); sadly, that is a significantly harder task, so only true PDFs for now

### UI

- No UI change for now

### Sample files

- [Free_Test_Data_500KB_PDF_redacted.pdf](https://github.com/user-attachments/files/21195841/Free_Test_Data_500KB_PDF_redacted.pdf)
- [lorem-ipsum_redacted.pdf](https://github.com/user-attachments/files/21195842/lorem-ipsum_redacted.pdf)
- [true-pdf-sample-1_redacted.pdf](https://github.com/user-attachments/files/21195843/true-pdf-sample-1_redacted.pdf)
- [true-pdf-sample-2_redacted.pdf](https://github.com/user-attachments/files/21195844/true-pdf-sample-2_redacted.pdf)
- [true-pdf-sample-3_redacted.pdf](https://github.com/user-attachments/files/21195845/true-pdf-sample-3_redacted.pdf)

Closes: does not actually close any issues, since it only works with true PDFs

---

## Checklist

### General

- [x] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>