
@natinew77-creator

What?

This PR fixes issue #951: PDFs with incremental updates become corrupted after being loaded and saved with pdf-lib.

How?

Root Cause

PDFContext.enumerateIndirectObjects() always sorts indirect objects by object number before returning them. This breaks PDFs that use the "Incremental Update" feature of the PDF format.

When a PDF is loaded and saved without adding new content, the objects should keep their original order. Reordering them invalidates the byte offsets in the XRef table, so PDF viewers look for objects at the wrong locations, which results in garbled text or missing content.
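
To illustrate the reasoning above, here is a minimal sketch of how an object's byte offset depends on write order. The `offsetsFor` helper and the object sizes are hypothetical and are not part of pdf-lib; the point is only that offsets recorded for one ordering cannot be reused for another.

```typescript
// Hypothetical helper, not part of pdf-lib: computes the byte offset of each
// object when the objects are written back to back in the given order.
function offsetsFor(
  order: number[],
  sizes: Map<number, number>,
): Map<number, number> {
  const offsets = new Map<number, number>();
  let cursor = 0;
  for (const objectNumber of order) {
    offsets.set(objectNumber, cursor);
    cursor += sizes.get(objectNumber) ?? 0;
  }
  return offsets;
}

// Made-up sizes in bytes. Object 1 was rewritten by an incremental update,
// so its current revision sits after objects 2 and 3 in the file.
const sizes = new Map<number, number>([[1, 40], [2, 100], [3, 25]]);
const fileOrder = [2, 3, 1];
const sortedOrder = [...fileOrder].sort((a, b) => a - b);

const offsetsInFileOrder = offsetsFor(fileOrder, sizes);
const offsetsWhenSorted = offsetsFor(sortedOrder, sizes);
```

In this example every object ends up at a different offset once the order changes, so any reader relying on the offsets from the original order would land on the wrong bytes.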

Solution

Introduce a needsReordering flag in PDFContext that tracks whether new objects have been registered:

  1. register() method: Sets needsReordering = true when a NEW object is created
  2. enumerateIndirectObjects() method: Only sorts objects when needsReordering is true

This ensures:

  • PDFs with only modifications (no new objects) maintain their original parsing order
  • PDFs with new content added (pages, images, etc.) are properly sorted to integrate new objects
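
The behavior described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in (`SketchContext`, storing strings instead of PDF objects), not the actual PDFContext code from this PR; only the names `register`, `enumerateIndirectObjects`, and `needsReordering` come from the description.

```typescript
// Hypothetical, simplified model of the idea; not pdf-lib's PDFContext.
class SketchContext {
  private readonly objects = new Map<number, string>();
  private largestObjectNumber = 0;
  private needsReordering = false;

  // Load/modify path: store an object under an already known number.
  // Overwriting an existing key keeps its position in the Map.
  assign(objectNumber: number, value: string): void {
    this.objects.set(objectNumber, value);
    if (objectNumber > this.largestObjectNumber) {
      this.largestObjectNumber = objectNumber;
    }
  }

  // New-object path: allocate the next number and flag that a sort is needed.
  register(value: string): number {
    this.largestObjectNumber += 1;
    this.objects.set(this.largestObjectNumber, value);
    this.needsReordering = true;
    return this.largestObjectNumber;
  }

  // Sort by object number only if something new was registered.
  enumerateIndirectObjects(): Array<[number, string]> {
    const entries = Array.from(this.objects.entries());
    if (this.needsReordering) entries.sort(([a], [b]) => a - b);
    return entries;
  }
}

const ctx = new SketchContext();
ctx.assign(3, "c");
ctx.assign(1, "a");
ctx.assign(2, "b");
ctx.assign(1, "a (modified)"); // modification only, no new object

const orderBeforeRegister = ctx.enumerateIndirectObjects().map(([n]) => n);
// orderBeforeRegister is [3, 1, 2]: parsing order preserved

ctx.register("new page");
const orderAfterRegister = ctx.enumerateIndirectObjects().map(([n]) => n);
// orderAfterRegister is [1, 2, 3, 4]: sorted once a new object exists
```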

Code Changes

src/core/PDFContext.ts:

  • Added a private needsReordering: boolean property (initialized to false)
  • Modified register() to set needsReordering = true
  • Modified enumerateIndirectObjects() to conditionally sort based on the flag

Testing

  • Added comprehensive unit tests for PDFContext ordering behavior
  • Added integration tests with a real PDF from the issue
  • All 625 existing tests pass
  • Visually verified that the output PDF renders correctly with readable text

Related Issues

Copilot AI review requested due to automatic review settings December 25, 2025 23:59

Copilot AI left a comment


Pull request overview

This PR addresses issue #951 where PDFs with incremental updates become corrupted after being loaded and saved with pdf-lib. The fix introduces a needsReordering flag to conditionally preserve object order when no new objects are added, preventing corruption of byte offsets in the XRef table.

Key changes:

  • Added a needsReordering flag to track when new objects are registered
  • Modified enumerateIndirectObjects() to conditionally sort based on the flag
  • Added comprehensive unit and integration tests for the ordering behavior

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/core/PDFContext.ts | Adds needsReordering flag and conditional sorting logic in enumerateIndirectObjects() method |
| tests/core/PDFContext_ordering.spec.ts | Adds unit tests for object ordering behavior and integration tests with real PDFs |


@natinew77-creator natinew77-creator force-pushed the fix/issue-951-corrupted-pdf-incremental-updates branch 2 times, most recently from 851ee12 to d2fd990 Compare December 26, 2025 00:16
@natinew77-creator
Author

Hi @Hopding, I've completed the fix for issue #951 regarding PDF corruption during incremental updates. All 625 existing tests pass, and I've added new unit and integration tests to cover this scenario. Could you please take a look when you have a moment?

@natinew77-creator natinew77-creator force-pushed the fix/issue-951-corrupted-pdf-incremental-updates branch from d2fd990 to dd76d4d Compare December 26, 2025 23:49
…#951)

This fixes a critical issue where PDFs using the Incremental Update feature
would become corrupted after loading and saving with pdf-lib.

## Problem

When loading and saving PDFs without adding new content, the library would
always sort indirect objects by object number during enumeration. This broke
PDFs with incremental updates because the XRef byte offsets became invalid
when objects were reordered.

## Solution

Introduce a 'needsReordering' flag in PDFContext that tracks whether new
objects have been created:

- nextRef() sets needsReordering = true when a new object number is allocated
  (this covers both register() and the nextRef() + assign() pattern used by
  embedders like JavaScriptEmbedder, CustomFontEmbedder, etc.)
- enumerateIndirectObjects() only sorts objects when needsReordering is true
- If only existing objects are modified (no new objects added), the original
  parsing order is preserved

This ensures backward compatibility - PDFs with new content are still properly
sorted, while PDFs with only modifications maintain their original structure.
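
As a rough illustration of why the flag is set in nextRef(), here is a hypothetical, simplified stand-in for PDFContext (`RefSketchContext`, storing strings instead of PDF objects). Only the method names mirror those named above; the bodies are assumptions for the sketch, not the code in this commit.

```typescript
// Hypothetical, simplified stand-in; not the PDFContext code in this commit.
class RefSketchContext {
  largestObjectNumber = 0;
  private readonly objects = new Map<number, string>();
  private needsReordering = false;

  // Allocates a fresh object number. Every new object passes through here,
  // so this is the single place that marks the context as needing a sort.
  nextRef(): number {
    this.needsReordering = true;
    this.largestObjectNumber += 1;
    return this.largestObjectNumber;
  }

  // Stores an object under a given number (parsed or newly allocated).
  assign(ref: number, value: string): void {
    this.objects.set(ref, value);
    if (ref > this.largestObjectNumber) this.largestObjectNumber = ref;
  }

  // Convenience path: allocate a number and store the object in one call.
  register(value: string): number {
    const ref = this.nextRef();
    this.assign(ref, value);
    return ref;
  }

  enumerateIndirectObjects(): Array<[number, string]> {
    const entries = Array.from(this.objects.entries());
    if (this.needsReordering) entries.sort(([a], [b]) => a - b);
    return entries;
  }
}

// Embedder-style usage: reserve the ref first, assign the object later.
const embedderContext = new RefSketchContext();
embedderContext.assign(5, "parsed object 5");
embedderContext.assign(2, "parsed object 2");
const fontRef = embedderContext.nextRef();
embedderContext.assign(fontRef, "embedded font");
const embedderOrder = embedderContext
  .enumerateIndirectObjects()
  .map(([n]) => n);
```

In this sketch, register() never touches the flag itself; it inherits the behavior by calling nextRef(), which is why the nextRef() + assign() path is covered as well.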

## Changes

- src/core/PDFContext.ts: Add needsReordering flag and conditional sorting
- tests/core/PDFContext_ordering.spec.ts: Add unit and integration tests
  (including tests for the nextRef() + assign() embedder pattern and
  largestObjectNumber tracking), with a 30s timeout for integration tests
  that load large PDF files
- assets/pdfs/with_incremental_updates.pdf: Test PDF file for the issue

Fixes Hopding#951
@natinew77-creator natinew77-creator force-pushed the fix/issue-951-corrupted-pdf-incremental-updates branch from dd76d4d to 69998cb Compare December 26, 2025 23:55

Development

Successfully merging this pull request may close these issues.

Corrupted PDF

1 participant