feat(pdf-conversion): add support for PDF/A-3b, PDF/X formats improve current PDF/A conversion #4844
… current PDF/A conversion
- Updated `PdfToPdfARequest` to include PDF/X in supported output formats
- Expanded input handling and model validation for PDF/A and PDF/X
- Added Ghostscript as a preferred backend for PDF/A and PDF/X conversions
- Implemented PDF/X-specific conversion logic with detailed validation
- Updated UI templates to separate PDF/A and PDF/X format options
- Enhanced error handling and warnings during conversion processes
- Revised localized strings to reflect expanded functionality

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
- Resolved inconsistencies in messages_en_US and messages_en_GB property files
- Updated descriptions, tags, and credits for PDF/A and PDF/X conversion
- Adjusted localization strings to ensure proper format support handling

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>

- Replaced redundant streams and lists initialization with more efficient alternatives
- Centralized stream reading logic to prevent reuse issues and ensure proper closing
- Enhanced logging for PDF/A validation to differentiate warnings from errors
- Simplified methods by removing redundant parameters and improving clarity
- Updated GregorianCalendar usage to modern java.time classes
- Ensured static state for utility-like methods for cleaner invocation
- Improved PDF/A metadata handling by aligning structure and removing redundancy

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
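The stream-reuse problem this commit refers to can be illustrated with a minimal sketch. It assumes the general idea of reading the bytes once and handing every load attempt its own `ByteArrayInputStream`; the class, the `FontLoader` interface, and the method name are hypothetical and are not taken from the PR's code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the PR itself.
final class FontBytesSketch {

    /** Hypothetical loader: parses a font from a stream, or throws if the format does not match. */
    interface FontLoader<T> {
        T load(InputStream in) throws IOException;
    }

    /**
     * Reads the source stream exactly once, then gives each loader a fresh
     * stream over the same bytes, so a failed attempt cannot leave a
     * half-consumed stream behind for the next one.
     */
    static <T> Optional<T> loadWithFirstMatchingLoader(
            InputStream source, List<FontLoader<T>> loaders) {
        byte[] fontBytes;
        try (InputStream in = source) {
            fontBytes = in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read font data", e);
        }
        for (FontLoader<T> loader : loaders) {
            try (InputStream fresh = new ByteArrayInputStream(fontBytes)) {
                return Optional.of(loader.load(fresh));
            } catch (IOException e) {
                // This loader could not parse the data; try the next one with a new stream.
            }
        }
        return Optional.empty();
    }
}
```

The design choice is to trade a little memory (the bytes are held in an array) for the guarantee that every attempt starts at offset zero.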
Pull Request Overview
This PR enhances the PDF conversion functionality by adding support for PDF/X formats alongside existing PDF/A support, and introduces Ghostscript as the preferred conversion method with PDFBox/LibreOffice as a fallback.
- Added PDF/X-1, PDF/X-3, and PDF/X-4 conversion options for print production workflows
- Implemented Ghostscript-based conversion with automatic fallback to PDFBox/LibreOffice
- Refactored code to use enums (PdfaProfile, PdfXProfile) for better type safety and maintainability
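The "preferred backend with automatic fallback" pattern described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Converter` and `convertWithFallback` are hypothetical names, and the PR's real routing in `ConvertPDFToPDFA.java` may be structured differently.

```java
import java.util.List;

// Hypothetical sketch of a primary/fallback conversion chain; not the PR's actual code.
final class FallbackConversionSketch {

    /** Hypothetical backend abstraction: converts PDF bytes or throws on failure. */
    interface Converter {
        byte[] convert(byte[] input) throws Exception;
    }

    /**
     * Tries each backend in order (for example Ghostscript first, then a
     * PDFBox/LibreOffice path) and returns the first successful result.
     */
    static byte[] convertWithFallback(byte[] input, List<Converter> backends) {
        Exception lastFailure = null;
        for (Converter backend : backends) {
            try {
                return backend.convert(input);
            } catch (Exception e) {
                lastFailure = e; // remember the failure and move on to the next backend
            }
        }
        throw new IllegalStateException("All conversion backends failed", lastFailure);
    }
}
```

Keeping the last failure as the cause means the final error still explains why the fallback did not help.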
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| pdf-to-pdfa.html | Updated UI to display PDF/A and PDF/X format options in grouped dropdowns with default selection |
| messages_en_GB.properties | Updated strings to reflect PDF/A & PDF/X support and revised conversion method descriptions |
| PdfToPdfARequest.java | Extended allowable format values to include new PDF/A variants and PDF/X formats |
| ConvertPDFToPDFA.java | Major refactoring: added Ghostscript conversion methods, profile enums, validation logic, and split conversion handling for PDF/A vs PDF/X |
… validation
- Added null checks for PDResources to prevent possible null pointer exceptions
- Streamlined request token processing by normalizing input directly in initialization
- Simplified filter logic for profile matching by leveraging pre-normalized tokens
- Removed redundant option from the PDF/A format dropdown in the UI template

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>

- Replaced direct field access with getter methods for profile attributes
- Added Lombok annotations to PdfaProfile and PdfXProfile enums for cleaner code
- Updated logging statements to utilize getter methods for consistency
- Improved validation and error message construction by standardizing access to profile attributes
- Simplified PdfaProfile and PdfXProfile implementations by removing redundant methods

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>

…thods
- Replaced getter methods with direct field access for profile attributes
- Simplified output suffix and preflight format methods by using direct fields
- Enhanced request token filtering logic by leveraging direct field access

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>

…ove resource path handling
- Extracted PDF/A preflight parsing logic into `parsePreflightDocument` for reusability
- Replaced hardcoded ICC resource paths with constants for better maintainability
- Simplified GregorianCalendar conversions by removing redundant `java.util` statement
- Improved exception handling for preflight parsing with detailed error messages

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>

…ocess
- Updated font embedding settings to avoid incomplete glyphs and allow substitutions
- Added high-quality prepress settings for better content preservation
- Introduced Type1 font CharSet fixing to handle missing or invalid definitions
- Implemented QPDF-based PDF normalization to address font program issues
- Enhanced cleanup logic to manage temporary files from normalization steps

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
Description of Changes
TLDR
- Updated `PdfToPdfARequest` to include PDF/X in supported output formats

This PR replaces the PDF/A conversion system with Ghostscript as the primary method, which produces less warning-prone output than the previous LibreOffice approach. It also adds PDF/X format support for print production workflows.
Better PDF/A Compliance
New PDF/X Support
Print production workflows now supported with PDF/X-1, PDF/X-3, and PDF/X-4 formats for professional printing requirements.
More Reliable Output
Ghostscript Integration
- `buildGhostscriptCommand()` / `buildGhostscriptCommandX()` - Constructs CLI arguments
- `convertWithGhostscript()` / `convertWithGhostscriptX()` - Executes conversion
- `isGhostscriptAvailable()` - Checks installation
- `prepareColorProfiles()` - Sets up ICC profiles
- `createPdfaDefFile()` - Generates PostScript definitions

Conversion Flow
- `handlePdfAConversion()` - Routes PDF/A with Ghostscript primary, PDFBox fallback
- `handlePdfXConversion()` - Routes PDF/X using Ghostscript
- `convertWithPdfBoxMethod()` - Refactored fallback method

Validation
- `validatePdfaOutput()` - Validates using PDFBox Preflight
- `validateAndWarnPdfA()` - Logs warnings instead of failing
- `buildPreflightErrorMessage()` - Formats detailed errors

Font Handling
Updated `embedMissingFonts()` prevents stream exhaustion by loading font bytes once and creating fresh InputStreams for multiple load attempts.

Utilities
- `findUnembeddedFontNames()` - Identifies unembedded fonts
- `deleteQuietly()` - Recursively deletes temp directories
- `sanitizePdfA()` - Removes incompatible elements
- `removeElementsForPdfA()` - Removes Optional Content and transparency
- `mergeAndAddXmpMetadata()` - Handles XMP metadata
- `preProcessHighlights()` - Pre-processes annotations
- `isTransparencyGroup()`, `hasTransparentImages()`, `detectTransparentXObjects()`

PDF/A
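As a rough illustration of the kind of argument list a method like `buildGhostscriptCommand()` might assemble for PDF/A, here is a minimal sketch. The switches follow the PDF/A example in the Ghostscript documentation (`-dPDFA`, `-sDEVICE=pdfwrite`, `-sColorConversionStrategy`, `-dPDFACompatibilityPolicy`, plus a `PDFA_def.ps`-style definition file); the class name, method name, parameters, and the exact set of flags are assumptions, and the PR's real command may differ.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the PR's actual command construction is not shown in this description.
final class GhostscriptCommandSketch {

    /**
     * Builds a Ghostscript argument list for a PDF/A conversion.
     *
     * @param pdfaPart    PDF/A part number (1, 2 or 3), passed as -dPDFA=N
     * @param pdfaDefFile PostScript definition file that references the ICC profile
     * @param input       source PDF
     * @param output      destination PDF
     */
    static List<String> buildPdfaCommand(int pdfaPart, Path pdfaDefFile, Path input, Path output) {
        List<String> cmd = new ArrayList<>();
        cmd.add("gs"); // assumes the Ghostscript binary is on the PATH under this name
        cmd.add("-dPDFA=" + pdfaPart);
        cmd.add("-dBATCH");
        cmd.add("-dNOPAUSE");
        cmd.add("-sDEVICE=pdfwrite");
        cmd.add("-sColorConversionStrategy=RGB"); // assumption: RGB output intent
        cmd.add("-dPDFACompatibilityPolicy=1");   // 1 = drop non-conforming features and continue
        cmd.add("-sOutputFile=" + output);
        cmd.add(pdfaDefFile.toString());          // the definition file must precede the input
        cmd.add(input.toString());
        return cmd;
    }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd)`. Recent Ghostscript versions run in safer mode by default, so the ICC profile referenced by the definition file may additionally need to be allowed with `--permit-file-read`.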
PDF/X
As mentioned, the greatest benefit is that the new Ghostscript conversion is able to deliver PDF/A files with fewer warnings or zero errors compared to the LibreOffice approach. Sometimes, however, both succeed without warnings. Here are some samples:
There is also some size difference (not sure why), but it generally also favors Ghostscript:
Front-end
Checklist
General
Documentation
Translations (if applicable)
scripts/counter_translation.py
UI Changes (if applicable)
Testing (if applicable)