Balázs Szücs d23c2eaa30
feat: Auto-redact to support text removal on True PDFs/non-custom encoded PDFs, JUnit tests for RedactController, and TextFinder (#3936)
# Description of Changes

## Overview

This enhancement adds **true PDF text removal** to RedactController. It
changes auto-redaction from visual covering to actual text removal. The
feature removes text from True PDFs completely while keeping
compatibility with other PDF types.

## Features

### 1. True PDF Text Removal

- Removes text from PDF structure instead of just hiding it
- No impact to manual redaction or other types of PDFs (e.g.: to
searchable PDFs or custom encoded PDFs)

### 2. Advanced Content Stream Processing

#### How It Works (only high level overview)
- Token Processing: Breaks PDF content into small pieces for exact text
finding
- Font Tracking: Keeps track of fonts and formatting
- Text Operators: Finds PDF commands that show text (`Tj`, `TJ`, `'`,
`"`)
- Position Mapping: Maps text to exact locations for removal
- Rebuilds PDF: Rebuilds PDFs without the text, while keeping formatting
operators

#### No change for other types PDFs

- Because the iteration through the PDF for token/text removal and for
box placing are two separate completely methods
- This means when the there is custom encoded PDF the token/text removal
won't find any text to remove (because there is no logic for decoding
for, for now) but the box finding methods still reliably finds redacted
words and puts a box onto them. So no change.

### 3. Enhanced TextFinder Integration

#### Minor Improvements
- Page Grouping: Groups found text by page for faster processing

### JUnit tests for both of files.

- Added JUnit tests for both files. 
- Might need future improvement.

### TODOs

- Support for additional PDF types besides true PDFs (currently a WIP),
e.g.: searchable PDF/custom encoded PDF
- Feature to be expected in few weeks (best case scenario, and only if I
succeed), sadly that is significantly harder task so only true PDFs for
now

### UI

- No UI change for now

### Sample files:


[Free_Test_Data_500KB_PDF_redacted.pdf](https://github.com/user-attachments/files/21195841/Free_Test_Data_500KB_PDF_redacted.pdf)

[lorem-ipsum_redacted.pdf](https://github.com/user-attachments/files/21195842/lorem-ipsum_redacted.pdf)

[true-pdf-sample-1_redacted.pdf](https://github.com/user-attachments/files/21195843/true-pdf-sample-1_redacted.pdf)

[true-pdf-sample-2_redacted.pdf](https://github.com/user-attachments/files/21195844/true-pdf-sample-2_redacted.pdf)

[true-pdf-sample-3_redacted.pdf](https://github.com/user-attachments/files/21195845/true-pdf-sample-3_redacted.pdf)


Closes: does not actually close any issues, since it only works with
true PDFs

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>
2025-08-13 22:52:06 +01:00
..