fix: Preserve object order for PDFs with incremental updates (#951) #1769
Pull request overview
This PR addresses issue #951 where PDFs with incremental updates become corrupted after being loaded and saved with pdf-lib. The fix introduces a needsReordering flag to conditionally preserve object order when no new objects are added, preventing corruption of byte offsets in the XRef table.
Key changes:
- Added a needsReordering flag to track when new objects are registered
- Modified enumerateIndirectObjects() to conditionally sort based on the flag
- Added comprehensive unit and integration tests for the ordering behavior
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/core/PDFContext.ts | Adds needsReordering flag and conditional sorting logic in enumerateIndirectObjects() method |
| tests/core/PDFContext_ordering.spec.ts | Adds unit tests for object ordering behavior and integration tests with real PDFs |
Hi @Hopding, I've completed the fix for issue #951 regarding PDF corruption during incremental updates. All 625 existing tests pass, and I've added new unit and integration tests to cover this scenario. Could you please take a look when you have a moment?
…#951)

This fixes a critical issue where PDFs using the Incremental Update feature would become corrupted after loading and saving with pdf-lib.

## Problem

When loading and saving PDFs without adding new content, the library would always sort indirect objects by object number during enumeration. This broke PDFs with incremental updates because the XRef byte offsets became invalid when objects were reordered.

## Solution

Introduce a 'needsReordering' flag in PDFContext that tracks whether new objects have been created:

- nextRef() sets needsReordering = true when a new object number is allocated (this covers both register() and the nextRef() + assign() pattern used by embedders like JavaScriptEmbedder, CustomFontEmbedder, etc.)
- enumerateIndirectObjects() only sorts objects when needsReordering is true
- If only existing objects are modified (no new objects added), the original parsing order is preserved

This ensures backward compatibility: PDFs with new content are still properly sorted, while PDFs with only modifications maintain their original structure.

## Changes

- src/core/PDFContext.ts: Add needsReordering flag and conditional sorting
- tests/core/PDFContext_ordering.spec.ts: Add unit and integration tests (including test for nextRef() + assign() embedder pattern and largestObjectNumber tracking). Added 30s timeout for integration tests with large PDF files
- assets/pdfs/with_incremental_updates.pdf: Test PDF file for the issue

Fixes Hopding#951
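The flag logic described in the commit message can be sketched roughly as follows. This is a simplified illustration, not the code in src/core/PDFContext.ts: the MiniContext class, its string-valued objects, and the use of a Map to stand in for parse order are all assumptions made for the example.

```typescript
// Hypothetical, simplified stand-in for PDFContext; not pdf-lib's actual code.
type Ref = number;

class MiniContext {
  largestObjectNumber = 0;
  private needsReordering = false;
  // A Map keeps insertion order, which stands in for parse order here.
  private readonly objects = new Map<Ref, string>();

  // Stores an object under a known number (e.g. while parsing); does not set the flag.
  assign(ref: Ref, value: string): void {
    this.objects.set(ref, value);
    if (ref > this.largestObjectNumber) this.largestObjectNumber = ref;
  }

  // Allocating a new object number signals that new content is being added.
  nextRef(): Ref {
    this.needsReordering = true;
    this.largestObjectNumber += 1;
    return this.largestObjectNumber;
  }

  // register() goes through nextRef(), so it sets the flag as well.
  register(value: string): Ref {
    const ref = this.nextRef();
    this.assign(ref, value);
    return ref;
  }

  // Sorts by object number only when a new object number has been allocated.
  enumerateIndirectObjects(): Array<[Ref, string]> {
    const entries = Array.from(this.objects.entries());
    if (this.needsReordering) entries.sort(([a], [b]) => a - b);
    return entries;
  }
}

const ctx = new MiniContext();
ctx.assign(2, 'pages');
ctx.assign(1, 'catalog');
ctx.assign(3, 'page');
// No new objects yet: parse order 2, 1, 3 is kept.
const before = ctx.enumerateIndirectObjects().map(([ref]) => ref);

ctx.register('new font');
// A new object was allocated: entries come back sorted as 1, 2, 3, 4.
const after = ctx.enumerateIndirectObjects().map(([ref]) => ref);
```

Setting the flag in nextRef() rather than in register() means the nextRef() + assign() pattern mentioned above also triggers sorting, since both paths allocate a new object number.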
What?
This PR fixes issue #951 - PDFs with incremental updates become corrupted after loading and saving with pdf-lib.
How?
Root Cause
The issue occurs because PDFContext.enumerateIndirectObjects() always sorts indirect objects by object number before returning them. This is problematic for PDFs that use the "Incremental Update" feature of the PDF format.
When a PDF is loaded and saved without adding new content, the objects should maintain their original order. Reordering them invalidates the byte offsets in the XRef table, causing PDF viewers to look for objects at incorrect locations, which results in garbled text or missing content.
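As a rough illustration of the reordering described above, the sketch below models parsed objects as [objectNumber, label] pairs in file order. The Entry type, the sample data, and the helper names are hypothetical and only serve to show how an unconditional sort changes the order relative to the file.

```typescript
// Hypothetical model: each entry is [objectNumber, label], in parse order.
type Entry = [number, string];

// In an incrementally updated file, appended sections can place objects
// out of numeric order relative to the original body.
const parsedOrder: Entry[] = [
  [1, 'catalog'],
  [2, 'pages'],
  [5, 'font'],
  [3, 'page (appended by an incremental update)'],
];

// Unconditional sort by object number, the behavior the PR identifies as the cause.
function sortedByObjectNumber(entries: Entry[]): Entry[] {
  return [...entries].sort(([a], [b]) => a - b);
}

const numbers = (entries: Entry[]): string =>
  entries.map(([n]) => n).join(', ');

console.log(numbers(parsedOrder));                       // 1, 2, 5, 3
console.log(numbers(sortedByObjectNumber(parsedOrder))); // 1, 2, 3, 5
```

Objects 5 and 3 swap places after sorting, so the enumeration order no longer matches the order in which the objects appeared in the file.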
Solution
Introduce a needsReordering flag in PDFContext that tracks whether new objects have been registered:
- register() method: Sets needsReordering = true when a NEW object is created
- enumerateIndirectObjects() method: Only sorts objects when needsReordering is true
This ensures:
Code Changes
src/core/PDFContext.ts:
- Added needsReordering: boolean private property (initialized to false)
- Updated register() to set needsReordering = true
- Updated enumerateIndirectObjects() to conditionally sort based on the flag
Testing
- PDFContext ordering behavior
Related Issues