Stirling-PDF/devGuide/FILE_HISTORY_SPECIFICATION.md
ConnorYoh 190178a471
Feature/v2/filehistory (#4370)
Co-authored-by: Connor Yoh <connor@stirlingpdf.com>
2025-09-16 15:08:11 +01:00

Stirling PDF File History Specification

Overview

Stirling PDF implements a client-side file history system backed by IndexedDB. File metadata, including version history and tool chains, is stored in StirlingFileStub objects that travel alongside the actual file data. This enables version tracking, tool history, and file lineage management without modifying PDF content.

Storage Architecture

IndexedDB-Based Storage

File history is stored in the browser's IndexedDB using the fileStorage service, providing:

  • Persistent storage: Survives browser sessions and page reloads
  • Large capacity: Supports files up to 100GB+ with full metadata, subject to the browser's storage quota
  • Fast queries: Optimized for file browsing and history lookups
  • Type safety: Structured TypeScript interfaces

Core Data Structures

interface StirlingFileStub extends BaseFileMetadata {
  id: FileId;                      // Unique file identifier (UUID)
  quickKey: string;                // Deduplication key: name|size|lastModified
  thumbnailUrl?: string;           // Generated thumbnail blob URL
  processedFile?: ProcessedFileMetadata;  // PDF page data and processing results
  
  // File Metadata
  name: string;
  size: number;
  type: string;
  lastModified: number;
  createdAt: number;
  
  // Version Control
  isLeaf: boolean;                 // True if this is the latest version
  versionNumber?: number;          // Version number (1, 2, 3, etc.)
  originalFileId?: string;         // UUID of the root file in version chain
  parentFileId?: string;           // UUID of immediate parent file
  
  // Tool History
  toolHistory?: ToolOperation[];   // Complete sequence of applied tools
}

interface ToolOperation {
  toolName: string;                // Tool identifier (e.g., 'compress', 'sanitize')
  timestamp: number;               // When the tool was applied
}

interface StoredStirlingFileRecord extends StirlingFileStub {
  data: ArrayBuffer;               // Actual file content
  fileId: FileId;                  // Duplicate for indexing
}
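
The relationship between a stub and its stored record can be sketched as follows. This is a minimal illustration with trimmed-down types; `StubFields`, `StoredRecord`, `toStub`, and `toRecord` are hypothetical names used only here, not part of the fileStorage service.

```typescript
// Simplified shapes: only the fields this sketch needs.
type FileId = string;

interface StubFields {
  id: FileId;
  name: string;
  size: number;
  isLeaf: boolean;
  versionNumber?: number;
}

interface StoredRecord extends StubFields {
  data: ArrayBuffer; // actual file content
  fileId: FileId;    // duplicate of id, kept for indexing
}

// Hypothetical helper: attach content to a stub to form a storable record.
function toRecord(stub: StubFields, data: ArrayBuffer): StoredRecord {
  return { ...stub, data, fileId: stub.id };
}

// Hypothetical helper: drop the payload and the index-only field,
// leaving the lightweight metadata that the UI works with.
function toStub(record: StoredRecord): StubFields {
  const { data: _data, fileId: _fileId, ...stub } = record;
  return stub;
}

const exampleRecord = toRecord(
  { id: "a1", name: "document.pdf", size: 4, isLeaf: true, versionNumber: 1 },
  new ArrayBuffer(4),
);
const exampleStub = toStub(exampleRecord);
```

Keeping the stub separate from the content means file lists and history views can be built without reading any file bytes.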

Version Management System

Version Progression

  • v1: Original uploaded file (first version)
  • v2: First tool applied to original
  • v3: Second tool applied (inherits from v2)
  • v4: Third tool applied (inherits from v3)
  • etc.

Leaf Node System

Only the latest version of each file family is marked as isLeaf: true:

  • Leaf files: Show in default file list, available for tool processing
  • History files: Hidden by default, accessible via history expansion
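
The leaf/history split above can be sketched as two filters over a flat stub list. The `Stub` type is trimmed down, and `familyId`, `visibleStubs`, and `historyFor` are hypothetical helpers, assuming a family is identified by originalFileId (or by the stub's own id for a v1 file).

```typescript
interface Stub {
  id: string;
  isLeaf: boolean;
  versionNumber?: number;
  originalFileId?: string;
}

// Assumption: a v1 file has no originalFileId, so its own id names the family.
function familyId(stub: Stub): string {
  return stub.originalFileId ?? stub.id;
}

// Default file list: leaves only.
function visibleStubs(stubs: Stub[]): Stub[] {
  return stubs.filter((s) => s.isLeaf);
}

// History expansion: non-leaf versions of the same family, newest first.
function historyFor(leaf: Stub, stubs: Stub[]): Stub[] {
  return stubs
    .filter((s) => !s.isLeaf && familyId(s) === familyId(leaf))
    .sort((a, b) => (b.versionNumber ?? 1) - (a.versionNumber ?? 1));
}

const stubA: Stub = { id: "a", isLeaf: false, versionNumber: 1 };
const stubB: Stub = { id: "b", isLeaf: false, versionNumber: 2, originalFileId: "a" };
const stubC: Stub = { id: "c", isLeaf: true, versionNumber: 3, originalFileId: "a" };
const stubX: Stub = { id: "x", isLeaf: true, versionNumber: 1 };
const sampleStubs = [stubA, stubB, stubC, stubX];

const shownIds = visibleStubs(sampleStubs).map((s) => s.id);
const olderIds = historyFor(stubC, sampleStubs).map((s) => s.id);
```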

File Relationships

document.pdf (v1, isLeaf: false) 
    ↓ compress
document.pdf (v2, isLeaf: false)
    ↓ sanitize  
document.pdf (v3, isLeaf: true)  ← Current active version
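
A chain like the one above can be recovered from a flat list by following parentFileId from the leaf back to the root. The sketch below assumes trimmed-down stubs; `VersionStub` and `lineage` are hypothetical names for illustration.

```typescript
interface VersionStub {
  id: string;
  parentFileId?: string;
  versionNumber?: number;
}

// Walk parent links from the given version back to the root and
// return the chain ordered oldest first (v1 ... vN).
function lineage(leafId: string, stubs: VersionStub[]): VersionStub[] {
  const byId = new Map(stubs.map((s) => [s.id, s] as const));
  const chain: VersionStub[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let current = byId.get(leafId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.unshift(current);
    current = current.parentFileId ? byId.get(current.parentFileId) : undefined;
  }
  return chain;
}

const versions: VersionStub[] = [
  { id: "c", parentFileId: "b", versionNumber: 3 },
  { id: "a", versionNumber: 1 },
  { id: "b", parentFileId: "a", versionNumber: 2 },
];
const chainIds = lineage("c", versions).map((s) => s.id);
```

If a parent is missing from the list, the walk simply stops there and returns the partial chain.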

Implementation Architecture

1. FileStorage Service (fileStorage.ts)

Core Methods:

// Store file with complete metadata
async storeStirlingFile(stirlingFile: StirlingFile, stub: StirlingFileStub): Promise<void>

// Load file with metadata
async getStirlingFile(id: FileId): Promise<StirlingFile | null>
async getStirlingFileStub(id: FileId): Promise<StirlingFileStub | null>

// Query operations  
async getLeafStirlingFileStubs(): Promise<StirlingFileStub[]>
async getAllStirlingFileStubs(): Promise<StirlingFileStub[]>

// Version management
async markFileAsProcessed(fileId: FileId): Promise<boolean>  // Set isLeaf = false
async markFileAsLeaf(fileId: FileId): Promise<boolean>       // Set isLeaf = true
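
To illustrate the version-management methods, here is a hypothetical in-memory stand-in. It is synchronous for brevity, whereas the real service methods return Promises and are backed by IndexedDB. `MemoryStubStore` is not part of the codebase, and the meaning of the boolean result (false when the id is unknown) is an assumption.

```typescript
type FileId = string;

interface Stub {
  id: FileId;
  isLeaf: boolean;
}

// Hypothetical in-memory stand-in for the stub store.
class MemoryStubStore {
  private stubs = new Map<FileId, Stub>();

  put(stub: Stub): void {
    this.stubs.set(stub.id, { ...stub });
  }

  // Assumption: the boolean reports whether the id was found.
  private setLeaf(id: FileId, isLeaf: boolean): boolean {
    const stub = this.stubs.get(id);
    if (!stub) return false;
    this.stubs.set(id, { ...stub, isLeaf });
    return true;
  }

  markFileAsProcessed(id: FileId): boolean {
    return this.setLeaf(id, false);
  }

  markFileAsLeaf(id: FileId): boolean {
    return this.setLeaf(id, true);
  }

  getLeafStubs(): Stub[] {
    return Array.from(this.stubs.values()).filter((s) => s.isLeaf);
  }
}

const demoStore = new MemoryStubStore();
demoStore.put({ id: "v1", isLeaf: true });
demoStore.markFileAsProcessed("v1"); // a tool consumed v1
demoStore.put({ id: "v2", isLeaf: true }); // tool output becomes the leaf
const leafIds = demoStore.getLeafStubs().map((s) => s.id);
```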

2. File Context Integration

FileContext manages runtime state in memory as a normalized collection of StirlingFileStub objects, an ordered ids list plus a byId lookup:

interface FileContextState {
  files: {
    ids: FileId[];
    byId: Record<FileId, StirlingFileStub>;
  };
}

Key Operations:

  • addFiles(): Stores new files with initial metadata
  • addStirlingFileStubs(): Loads existing files from storage with preserved metadata
  • consumeFiles(): Processes files through tools, creating new versions
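
The normalized ids/byId shape lends itself to small immutable update functions. The sketch below is illustrative only: `FilesState`, `addStubs`, and `replaceStubs` are hypothetical names, and treating a consume step as "remove the inputs, add the outputs" is an assumption about how consumeFiles() updates state.

```typescript
type FileId = string;

interface Stub {
  id: FileId;
  name: string;
  isLeaf: boolean;
}

interface FilesState {
  ids: FileId[];
  byId: Record<FileId, Stub>;
}

// Add or update stubs without mutating the previous state.
function addStubs(state: FilesState, stubs: Stub[]): FilesState {
  const byId: Record<FileId, Stub> = { ...state.byId };
  const ids = [...state.ids];
  for (const stub of stubs) {
    if (!(stub.id in byId)) ids.push(stub.id); // keep ids free of duplicates
    byId[stub.id] = stub;
  }
  return { ids, byId };
}

// Assumed consume step: inputs leave the workbench, outputs take their place.
function replaceStubs(state: FilesState, inputIds: FileId[], outputs: Stub[]): FilesState {
  const remaining = state.ids.filter((id) => inputIds.indexOf(id) === -1);
  const byId: Record<FileId, Stub> = {};
  for (const id of remaining) byId[id] = state.byId[id];
  return addStubs({ ids: remaining, byId }, outputs);
}

const emptyState: FilesState = { ids: [], byId: {} };
const loadedState = addStubs(emptyState, [{ id: "a", name: "a.pdf", isLeaf: true }]);
const processedState = replaceStubs(loadedState, ["a"], [{ id: "b", name: "a.pdf", isLeaf: true }]);
```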

3. Tool Operation Integration

Tool Processing Flow:

  1. Input: User selects files (marked as isLeaf: true)
  2. Processing: Backend processes files and returns results
  3. History Creation: New StirlingFileStub created with:
    • Incremented version number
    • Updated tool history
    • Parent file reference
  4. Storage: Both the parent (marked isLeaf: false) and the child (marked isLeaf: true) are stored
  5. UI Update: FileContext updated with new file state

Child Stub Creation:

export function createChildStub(
  parentStub: StirlingFileStub, 
  operation: { toolName: string; timestamp: number }, 
  resultingFile: File, 
  thumbnail?: string
): StirlingFileStub {
  return {
    id: createFileId(),
    name: resultingFile.name,
    size: resultingFile.size,
    type: resultingFile.type,
    lastModified: resultingFile.lastModified,
    quickKey: createQuickKey(resultingFile),
    createdAt: Date.now(),
    isLeaf: true,
    
    // Version Control
    versionNumber: (parentStub.versionNumber || 1) + 1,
    originalFileId: parentStub.originalFileId || parentStub.id,
    parentFileId: parentStub.id,
    
    // Tool History
    toolHistory: [...(parentStub.toolHistory || []), operation],
    thumbnailUrl: thumbnail
  };
}
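
The following sketch chains two tool runs to show how version numbers, originalFileId, and toolHistory accumulate under the rules in createChildStub. It uses a trimmed-down stub and a simplified `applyTool` helper; `applyTool` and the counter-based `nextId` (standing in for createFileId()) are hypothetical.

```typescript
interface ToolOperation {
  toolName: string;
  timestamp: number;
}

interface Stub {
  id: string;
  name: string;
  isLeaf: boolean;
  versionNumber?: number;
  originalFileId?: string;
  parentFileId?: string;
  toolHistory?: ToolOperation[];
}

let idCounter = 0;
const nextId = (): string => `file-${++idCounter}`; // stand-in for createFileId()

// Returns the demoted parent and the new leaf child; the input is not mutated.
function applyTool(parentStub: Stub, toolName: string, outputName: string) {
  const child: Stub = {
    id: nextId(),
    name: outputName,
    isLeaf: true,
    versionNumber: (parentStub.versionNumber || 1) + 1,
    originalFileId: parentStub.originalFileId || parentStub.id,
    parentFileId: parentStub.id,
    toolHistory: [...(parentStub.toolHistory || []), { toolName, timestamp: Date.now() }],
  };
  return { parent: { ...parentStub, isLeaf: false }, child };
}

const uploaded: Stub = { id: nextId(), name: "document.pdf", isLeaf: true, versionNumber: 1 };
const afterCompress = applyTool(uploaded, "compress", "document.pdf");
const afterSanitize = applyTool(afterCompress.child, "sanitize", "document.pdf");
```

After both runs, `afterSanitize.child` is v3, still points at the uploaded file as its root, and carries both operations in order.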

UI Integration

File Manager History Display

FileManager (FileManager.tsx) provides:

  • Default View: Shows only leaf files (isLeaf: true)
  • History Expansion: Click to show all versions of a file family
  • History Groups: Nested display using FileHistoryGroup.tsx

FileListItem (FileListItem.tsx) displays:

  • Version Badges: v1, v2, v3 indicators
  • Tool Chain: Complete processing history in tooltips
  • History Actions: "Show/Hide History" toggle, "Restore" for history files

FileManagerContext Integration

File Selection Flow:

// Recent files (from storage)
onRecentFileSelect: (stirlingFileStubs: StirlingFileStub[]) => void
// Calls: actions.addStirlingFileStubs(stirlingFileStubs, options)

// New uploads  
onFileUpload: (files: File[]) => void
// Calls: actions.addFiles(files, options)

History Management:

// Toggle history visibility
const { expandedFileIds, onToggleExpansion } = useFileManagerContext();

// Restore history file to current
const handleAddToRecents = async (file: StirlingFileStub) => {
  await fileStorage.markFileAsLeaf(file.id);  // Make this version current
};
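
The expansion toggle can be modelled as a pure function over the set of expanded ids. This is a minimal sketch; `toggleExpansion` is a hypothetical helper, and representing expandedFileIds as a Set of strings is an assumption.

```typescript
// Return a new set with fileId flipped: removed if present, added if absent.
function toggleExpansion(expanded: ReadonlySet<string>, fileId: string): Set<string> {
  const next = new Set<string>();
  expanded.forEach((id) => next.add(id));
  if (!next.delete(fileId)) next.add(fileId);
  return next;
}

const noneExpanded = new Set<string>();
const oneExpanded = toggleExpansion(noneExpanded, "file-1");
const collapsedAgain = toggleExpansion(oneExpanded, "file-1");
```

Returning a fresh set rather than mutating keeps the update compatible with React state comparisons.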

Data Flow

New File Upload

1. User uploads files → addFiles() 
2. Generate thumbnails and page count
3. Create StirlingFileStub with isLeaf: true, versionNumber: 1
4. Store both StirlingFile + StirlingFileStub in IndexedDB
5. Dispatch to FileContext state
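
Step 3 of the upload flow can be sketched as below. `FileLike` stands in for the browser File object, `createRootStub` is a hypothetical helper, and the id is passed in rather than generated. Leaving originalFileId and toolHistory unset is an assumption; it is consistent with createChildStub, which falls back to the parent's id and an empty history.

```typescript
// Structural stand-in for the browser File object.
interface FileLike {
  name: string;
  size: number;
  type: string;
  lastModified: number;
}

interface RootStub extends FileLike {
  id: string;
  quickKey: string;
  createdAt: number;
  isLeaf: boolean;
  versionNumber: number;
}

// Hypothetical helper: build the v1 stub for a freshly uploaded file.
function createRootStub(file: FileLike, id: string): RootStub {
  return {
    id,
    name: file.name,
    size: file.size,
    type: file.type,
    lastModified: file.lastModified,
    quickKey: `${file.name}|${file.size}|${file.lastModified}`,
    createdAt: Date.now(),
    isLeaf: true,      // a new upload is always the current version
    versionNumber: 1,  // first version in its family
  };
}

const rootStub = createRootStub(
  { name: "document.pdf", size: 1024, type: "application/pdf", lastModified: 1700000000000 },
  "root-1",
);
```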

Tool Processing

1. User selects tool + files → useToolOperation()
2. API processes files → returns processed File objects
3. createChildStub() for each result:
   - Parent marked isLeaf: false
   - Child created with isLeaf: true, incremented version
4. Store all files with updated metadata
5. Update FileContext with new state

File Loading (Recent Files)

1. User selects from FileManager → onRecentFileSelect()
2. addStirlingFileStubs() with preserved metadata
3. Load actual StirlingFile data from storage  
4. Files appear in workbench with complete history intact

Performance Optimizations

Metadata Regeneration

When loading files from storage, missing processedFile data is regenerated:

// In addStirlingFileStubs()
const needsProcessing = !record.processedFile || 
                        !record.processedFile.pages || 
                        record.processedFile.pages.length === 0;

if (needsProcessing) {
  const result = await generateThumbnailWithMetadata(stirlingFile);
  record.processedFile = createProcessedFile(result.pageCount, result.thumbnail);
}
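
The three-part condition above reduces to a single predicate, which makes it easy to unit test in isolation. A minimal sketch, with a hypothetical `needsProcessing` function over a trimmed-down stub shape:

```typescript
interface HasProcessedFile {
  processedFile?: { pages?: unknown[] };
}

// True when page data is absent or empty and must be regenerated.
function needsProcessing(stub: HasProcessedFile): boolean {
  const pages = stub.processedFile?.pages;
  return !pages || pages.length === 0;
}

const regenerationCases = [
  needsProcessing({}),                                   // no processedFile
  needsProcessing({ processedFile: {} }),                // no pages array
  needsProcessing({ processedFile: { pages: [] } }),     // empty pages array
  needsProcessing({ processedFile: { pages: [{}] } }),   // usable page data
];
```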

Memory Management

  • Blob URL Tracking: Automatic cleanup of thumbnail URLs
  • Lazy Loading: Files loaded from storage only when needed
  • LRU Caching: File objects cached in memory with size limits
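
As an illustration of size-bounded LRU caching in general, the sketch below relies on Map insertion order to track recency. It is not the project's cache; `ByteLruCache` and its byte budget are hypothetical.

```typescript
// Generic byte-bounded LRU cache; values only need a numeric size.
class ByteLruCache<V extends { size: number }> {
  private entries = new Map<string, V>();
  private totalBytes = 0;
  private maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.totalBytes -= existing.size;
      this.entries.delete(key);
    }
    this.entries.set(key, value);
    this.totalBytes += value.size;
    // Evict least recently used entries, always keeping the newest one.
    while (this.totalBytes > this.maxBytes && this.entries.size > 1) {
      const oldestKey = this.entries.keys().next().value as string;
      const oldest = this.entries.get(oldestKey);
      this.entries.delete(oldestKey);
      if (oldest !== undefined) this.totalBytes -= oldest.size;
    }
  }
}

const demoCache = new ByteLruCache<{ size: number }>(10);
demoCache.set("a", { size: 4 });
demoCache.set("b", { size: 4 });
demoCache.get("a");              // "a" is now more recent than "b"
demoCache.set("c", { size: 4 }); // over budget, so "b" is evicted
```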

File Deduplication

QuickKey System

Files are deduplicated using quickKey format:

const quickKey = `${file.name}|${file.size}|${file.lastModified}`;

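This prevents duplicate uploads: re-adding an identical file yields the same key. A processed version normally differs in size or lastModified, so it gets its own key and different versions of the same logical file can coexist.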
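
A minimal sketch of key-based deduplication follows. `FileLike` stands in for the browser File object and `dedupe` is a hypothetical helper; only the key format comes from the text above.

```typescript
interface FileLike {
  name: string;
  size: number;
  lastModified: number;
}

function createQuickKey(file: FileLike): string {
  return `${file.name}|${file.size}|${file.lastModified}`;
}

// Keep only files whose key is not already known, either from storage
// (existingKeys) or from earlier in the same batch.
function dedupe<T extends FileLike>(incoming: T[], existingKeys: ReadonlySet<string>): T[] {
  const seen = new Set<string>();
  existingKeys.forEach((key) => seen.add(key));
  return incoming.filter((file) => {
    const key = createQuickKey(file);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const uploadedPdf = { name: "document.pdf", size: 2048, lastModified: 1700000000000 };
const compressedPdf = { name: "document.pdf", size: 1024, lastModified: 1700000005000 };
const accepted = dedupe([uploadedPdf, { ...uploadedPdf }, compressedPdf], new Set<string>());
```

Because the key is derived from metadata rather than content, two different files that share a name, size, and modification time would collide.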

Error Handling

Graceful Degradation

  • Storage Failures: Files continue to work without persistence
  • Metadata Issues: Missing metadata regenerated on demand
  • Version Conflicts: Automatic version number resolution

Recovery Scenarios

  • Corrupted Storage: Automatic cleanup and re-initialization
  • Missing Files: Stubs cleaned up automatically
  • Version Mismatches: Automatic version chain reconstruction
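
One way to detect a broken version chain is to look for stubs whose parent is no longer present. The sketch below is purely illustrative and does not describe the actual recovery code; `findDanglingParents` is a hypothetical helper.

```typescript
interface LinkedStub {
  id: string;
  parentFileId?: string;
}

// Return stubs that reference a parent id missing from the list.
function findDanglingParents(stubs: LinkedStub[]): LinkedStub[] {
  const knownIds = new Set(stubs.map((s) => s.id));
  return stubs.filter(
    (s) => s.parentFileId !== undefined && !knownIds.has(s.parentFileId),
  );
}

const danglingIds = findDanglingParents([
  { id: "a" },
  { id: "b", parentFileId: "a" },
  { id: "d", parentFileId: "c" }, // "c" is missing
]).map((s) => s.id);
```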

Developer Guidelines

Adding File History to New Components

  1. Use FileContext Actions:
const { actions } = useFileActions();
await actions.addFiles(files);  // For new uploads
await actions.addStirlingFileStubs(stubs);  // For existing files
  2. Preserve Metadata When Processing:
const childStub = createChildStub(parentStub, {
  toolName: 'compress',
  timestamp: Date.now()
}, processedFile, thumbnail);
  3. Handle Storage Operations:
await fileStorage.storeStirlingFile(stirlingFile, stirlingFileStub);
const stub = await fileStorage.getStirlingFileStub(fileId);

Testing File History

  1. Upload files: Should show v1, marked as leaf
  2. Apply tool: Should create v2, mark v1 as non-leaf
  3. Check FileManager: History should show both versions
  4. Restore old version: Should mark old version as leaf
  5. Check storage: Both versions should persist in IndexedDB

Future Enhancements

Potential Improvements

  • Branch History: Support for parallel processing branches
  • History Export: Export complete version history as JSON
  • Conflict Resolution: Handle concurrent modifications
  • Cloud Sync: Sync history across devices
  • Compression: Compress historical file data

API Extensions

  • Batch Operations: Process multiple version chains simultaneously
  • Search Integration: Search within tool history and file metadata
  • Analytics: Track usage patterns and tool effectiveness

Last Updated: January 2025
Implementation: Stirling PDF Frontend v2
Storage Version: IndexedDB with fileStorage service