[Bug]: PDF compress doesn't work properly if target PDF file is very large #3489

Open
1 task done
mesmeriz2 opened this issue May 7, 2025 · 1 comment
Labels
Back End Issues related to back-end development

Comments

@mesmeriz2

Installation Method

Docker

The Problem

When I try to compress PDF files larger than 200 MB, Stirling PDF shows the error message "HTTP error! status: 504".
The PDF consists of 40 pages of images, and the image on each page is very large (6500x2600).

I looked into the logs and found that the internal process that compresses the PDF runs normally at first,
but several messages appear at the final stage of the process: "operation succeeded with warnings".

I also found that system memory usage is very high when the error occurs (more than 70% on a machine with 12 GB of RAM).

Version of Stirling-PDF

0.46

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

No response

Docker Configuration

version: '3.3'
services:
  stirling-pdf:
    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
    ports:
      - '8080:8080'
    volumes:
      - ./extraConfigs:/configs
      - ./customFiles:/customFiles/
      - ./logs:/logs/
      - ./pipeline:/pipeline/
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - LANGS=ko_KR

Relevant Log Output

date	stream	content
2025/05/07 16:42:26	stdout	07:42:26.339 [scheduling-1] INFO  s.s.S.s.MetricsAggregatorService - http_requests_POST__api_v1_settings_update-enable-analytics, 1.0
2025/05/07 16:42:26	stdout	07:42:26.338 [scheduling-1] INFO  s.s.S.s.MetricsAggregatorService - http_requests_POST__api_v1_misc_compress-pdf, 1.0
2025/05/07 16:42:26	stdout	07:42:26.338 [scheduling-1] INFO  s.s.S.s.MetricsAggregatorService - http_requests_GET__compress-pdf, 1.0
2025/05/07 14:47:02	stdout	05:47:02.262 [qtp270261532-53] INFO  o.a.p.e.u.DeletingRandomAccessFile - Successfully deleted temp file: /tmp/compressedPDF8457849069995046661.pdf
2025/05/07 14:47:01	stdout	05:47:01.920 [qtp270261532-53] INFO  s.s.S.c.api.misc.CompressController - Post-QPDF file size: 134.99 MB (reduced by -0.0%)
2025/05/07 14:47:01	stdout	qpdf: operation succeeded with warnings; resulting file may have some problems
2025/05/07 14:47:01	stdout	05:47:01.887 [qtp270261532-53] WARN  s.s.SPDF.utils.ProcessExecutor - qpdf succeeded with warnings: WARNING: /tmp/compressedPDF8457849069995046661.pdf: reported number of objects (228) is not one plus the highest object number (226)
2025/05/07 14:47:01	stdout	qpdf: operation succeeded with warnings; resulting file may have some problems
2025/05/07 14:47:01	stdout	05:47:01.887 [qtp270261532-53] WARN  s.s.SPDF.utils.ProcessExecutor - qpdf succeeded with warnings: WARNING: /tmp/compressedPDF8457849069995046661.pdf: reported number of objects (228) is not one plus the highest object number (226)
2025/05/07 14:47:01	stdout	05:47:01.885 [Thread-24] INFO  s.s.SPDF.utils.ProcessExecutor - qpdf: operation succeeded with warnings; resulting file may have some problems
2025/05/07 14:47:01	stdout	05:47:01.301 [Thread-24] INFO  s.s.SPDF.utils.ProcessExecutor - WARNING: /tmp/compressedPDF8457849069995046661.pdf: reported number of objects (228) is not one plus the highest object number (226)
2025/05/07 14:47:01	stdout	05:47:01.261 [qtp270261532-53] INFO  s.s.SPDF.utils.ProcessExecutor - Running command: qpdf --recompress-flate --compression-level=9 --compress-streams=y --object-streams=generate /tmp/compressedPDF8457849069995046661.pdf /tmp/qpdf_output_12306025012598080097.pdf
2025/05/07 14:47:01	stdout	05:47:01.251 [qtp270261532-53] INFO  s.s.S.c.api.misc.CompressController - Pre-QPDF file size: 134.99 MB
2025/05/07 14:47:01	stdout	05:47:01.250 [qtp270261532-53] INFO  o.a.p.e.u.DeletingRandomAccessFile - Successfully deleted temp file: /tmp/working_83300047840938827.pdf
2025/05/07 14:47:01	stdout	05:47:01.228 [qtp270261532-53] INFO  s.s.S.c.api.misc.CompressController - Overall PDF compression: 277.70 MB → 134.99 MB (reduced by 51.4%)
2025/05/07 14:47:00	stdout	05:47:00.484 [qtp270261532-53] INFO  s.s.S.c.api.misc.CompressController - Saving compressed PDF to /tmp/compressedPDF8457849069995046661.pdf
2025/05/07 14:47:00	stdout	05:47:00.484 [qtp270261532-53] INFO  s.s.S.c.api.misc.CompressController - Total original image size: 277.68 MB, compressed: 134.97 MB (reduced by 51.4%)
2025/05/07 14:47:00	stdout	05:47:00.483 [qtp270261532-53] INFO  s.s.S.c.api.misc.CompressController - Image compression summary - Total unique: 42, Compressed: 42, Skipped: 0, Duplicates: 0, Nested: 0
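
The size figures in the log above (listed newest first) can be cross-checked with a few lines of Python. This is only a sketch for reading the log: `reduction_percent` is a hypothetical helper, not part of Stirling-PDF, and the numbers are copied from the `CompressController` lines.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage by which the size shrank; negative if the file grew."""
    return (before_mb - after_mb) / before_mb * 100.0


# "Overall PDF compression: 277.70 MB → 134.99 MB"
overall = reduction_percent(277.70, 134.99)

# "Pre-QPDF file size: 134.99 MB" vs "Post-QPDF file size: 134.99 MB"
qpdf_pass = reduction_percent(134.99, 134.99)

print(f"image recompression: {overall:.1f}%")
print(f"qpdf pass:           {qpdf_pass:.1f}%")
```

Read this way, the image recompression step accounts for the whole reduction and the qpdf pass adds nothing measurable, so the log suggests the compression itself completed before the 504 was reported.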

Additional Information

No response

Browsers Affected

No response

No Duplicate of the Issue

  • I have verified that there are no existing issues raised related to my problem.

dosubot bot commented May 7, 2025

Hello @mesmeriz2! I'm here to help you solve bugs, answer questions, and guide you on your journey to becoming a contributor. While you wait for a human maintainer, feel free to share any details you need assistance with!

The HTTP 504 error you're encountering when compressing large PDF files over 200MB in Stirling PDF version 0.46 is likely due to a proxy timeout issue. This happens when the compression process takes longer than the proxy's timeout limit. To resolve this, you can increase the timeout settings in your proxy configuration (such as nginx) to allow more time for the compression process to complete [1].
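
If an nginx reverse proxy sits in front of the container, the timeout change could look roughly like the fragment below. This is a sketch under assumptions: the issue does not say which proxy (if any) is used, `pdf.example.com` and the upstream address are placeholders, and the 600s values are arbitrary starting points rather than recommended settings.

```nginx
# Hypothetical reverse-proxy block; server_name and upstream are placeholders.
server {
    listen 80;
    server_name pdf.example.com;

    # nginx rejects request bodies over 1m by default (HTTP 413),
    # so large uploads need a higher limit as well.
    client_max_body_size 1g;

    location / {
        proxy_pass http://127.0.0.1:8080;

        # Both default to 60s; a compression run that takes longer
        # without sending data back can surface as a 504.
        proxy_send_timeout 600s;
        proxy_read_timeout 600s;
    }
}
```

Other proxies (Traefik, Caddy, a NAS vendor's built-in proxy) have their own equivalents of these timeouts.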

Additionally, the high memory usage you observed suggests that the compression process is resource-intensive, especially with large image-based PDFs. You might want to ensure that your system has sufficient resources to handle such operations, or consider optimizing the images before compression to reduce memory usage.
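
A back-of-the-envelope estimate shows why images of this size are memory-hungry once decoded. The sketch below assumes 8-bit RGB with one 6500x2600 image per page (the issue gives only the dimensions and page count), and `decoded_image_bytes` is a hypothetical helper, not a Stirling-PDF function.

```python
def decoded_image_bytes(width: int, height: int,
                        channels: int = 3, bits_per_channel: int = 8) -> int:
    """Size of one uncompressed raster in bytes (assumed pixel format)."""
    return width * height * channels * bits_per_channel // 8


PAGES = 40  # page count reported in the issue

per_image = decoded_image_bytes(6500, 2600)
all_pages = per_image * PAGES

print(f"per image: {per_image / 2**20:.1f} MiB")
print(f"all pages: {all_pages / 2**30:.2f} GiB")
```

Under these assumptions each decoded image takes roughly 48 MiB, and holding all 40 at once would need just under 2 GiB before any working copies. Whether the compressor keeps more than one decoded image in memory at a time is not stated in the issue.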


@dosubot dosubot bot added the Back End Issues related to back-end development label May 7, 2025