@balazs-szucs balazs-szucs commented Nov 7, 2025

Description of Changes

TLDR

  • Updated PdfToPdfARequest to include PDF/X in supported output formats
  • Expanded input handling and model validation for PDF/A and PDF/X
  • Added Ghostscript as a preferred backend for PDF/A and PDF/X conversions
  • Implemented PDF/X-specific conversion logic with detailed validation
  • Updated UI templates to separate PDF/A and PDF/X format options
  • Updated error handling and warnings during conversion processes

This PR reworks the PDF/A conversion system to use Ghostscript as the primary method, which produces output less prone to validation warnings than the previous LibreOffice approach. It also adds PDF/X format support for print production workflows.

Better PDF/A Compliance

  • Ghostscript produces standards-compliant PDF/A with fewer validation errors
  • The previous LibreOffice method generated files with structural errors and validation warnings
  • Automatic fallback to PDFBox/LibreOffice if Ghostscript is unavailable
  • Built-in validation using PDFBox Preflight catches issues early

New PDF/X Support

Print production workflows are now supported through the PDF/X-1, PDF/X-3, and PDF/X-4 formats for professional printing requirements.

More Reliable Output

  • Deterministic conversion results
  • Better font embedding and subsetting
  • Proper ICC profile and color space handling
  • Improved resource cleanup prevents memory leaks

Ghostscript Integration

  • buildGhostscriptCommand() / buildGhostscriptCommandX() - Constructs CLI arguments
  • convertWithGhostscript() / convertWithGhostscriptX() - Executes conversion
  • isGhostscriptAvailable() - Checks installation
  • prepareColorProfiles() - Sets up ICC profiles
  • createPdfaDefFile() - Generates PostScript definitions
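
To illustrate the kind of argument list a method like buildGhostscriptCommand() might assemble, here is a minimal sketch. It is not the PR's actual code: the class name, method signature, and the assumption that the binary is called `gs` and is on the PATH are all hypothetical. The flags themselves are the ones the Ghostscript documentation gives for PDF/A output with the pdfwrite device.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class GhostscriptCommandSketch {

    // Builds an argument list for a Ghostscript PDF/A conversion.
    // pdfaPart is the PDF/A part number (1, 2 or 3); pdfaDefFile is the
    // PostScript definitions file that names the ICC output profile.
    static List<String> buildCommand(int pdfaPart, Path pdfaDefFile, Path input, Path output) {
        if (pdfaPart < 1 || pdfaPart > 3) {
            throw new IllegalArgumentException("Unsupported PDF/A part: " + pdfaPart);
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("gs"); // assumption: the executable is on PATH under this name
        cmd.add("-dPDFA=" + pdfaPart);
        cmd.add("-dBATCH");
        cmd.add("-dNOPAUSE");
        cmd.add("-sDEVICE=pdfwrite");
        cmd.add("-sColorConversionStrategy=RGB");
        cmd.add("-dPDFACompatibilityPolicy=1"); // drop non-conforming features instead of aborting
        cmd.add("-sOutputFile=" + output);
        cmd.add(pdfaDefFile.toString()); // definitions file must precede the input
        cmd.add(input.toString());
        return cmd;
    }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd)`. Passing arguments as a list rather than as one concatenated string avoids shell quoting problems with file names.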

Conversion Flow

  • handlePdfAConversion() - Routes PDF/A with Ghostscript primary, PDFBox fallback
  • handlePdfXConversion() - Routes PDF/X using Ghostscript
  • convertWithPdfBoxMethod() - Refactored fallback method
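
The primary/fallback routing described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the class name, the `primaryAvailable` flag, and the use of `UnaryOperator<byte[]>` to stand in for the two converters are all hypothetical.

```java
import java.util.function.UnaryOperator;

final class FallbackRoutingSketch {

    // Tries the primary converter when it is available; if it is missing or
    // throws, the fallback converter handles the same input instead.
    static byte[] convertWithFallback(
            byte[] pdf,
            boolean primaryAvailable,
            UnaryOperator<byte[]> primary,
            UnaryOperator<byte[]> fallback) {
        if (primaryAvailable) {
            try {
                return primary.apply(pdf);
            } catch (RuntimeException e) {
                // Record the failure and fall through to the fallback path.
                System.err.println("Primary conversion failed, using fallback: " + e.getMessage());
            }
        }
        return fallback.apply(pdf);
    }
}
```

Keeping the availability check separate from the failure handling means a missing Ghostscript installation skips the primary path silently, while a failed conversion still leaves a trace in the log.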

Validation

  • validatePdfaOutput() - Validates using PDFBox Preflight
  • validateAndWarnPdfA() - Logs warnings instead of failing
  • buildPreflightErrorMessage() - Formats detailed errors
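
A rough sketch of the "warn instead of fail" idea and of formatting findings into one message is shown below. The `Finding` class is a hypothetical stand-in for whatever the validator reports; it is not a PDFBox Preflight type, and the codes used with it are placeholders.

```java
import java.util.List;

final class PreflightMessageSketch {

    // Hypothetical stand-in for one validation finding; not a PDFBox type.
    static final class Finding {
        final String code;
        final String details;
        final boolean warning;

        Finding(String code, String details, boolean warning) {
            this.code = code;
            this.details = details;
            this.warning = warning;
        }
    }

    // Formats all findings into one message, listing errors before warnings.
    static String buildMessage(String profile, List<Finding> findings) {
        long errors = findings.stream().filter(f -> !f.warning).count();
        long warnings = findings.size() - errors;
        StringBuilder sb = new StringBuilder();
        sb.append(profile).append(" validation: ")
                .append(errors).append(" error(s), ")
                .append(warnings).append(" warning(s)");
        findings.stream()
                .sorted((a, b) -> Boolean.compare(a.warning, b.warning))
                .forEach(f -> sb.append(System.lineSeparator())
                        .append(f.warning ? "  WARN  " : "  ERROR ")
                        .append(f.code).append(": ").append(f.details));
        return sb.toString();
    }

    // Warnings alone do not reject the output; any error does.
    static boolean isAcceptable(List<Finding> findings) {
        return findings.stream().allMatch(f -> f.warning);
    }
}
```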

Font Handling

Updated embedMissingFonts() prevents stream exhaustion by loading font bytes once and creating fresh InputStreams for multiple load attempts.
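
The stream-exhaustion fix can be pictured roughly like this: an InputStream can only be consumed once, so the bytes are read a single time and each load attempt gets its own fresh stream over the same array. The class, the `FontLoader` interface, and the method name are hypothetical and only illustrate the pattern, not the actual embedMissingFonts() code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Optional;

final class FontStreamSketch {

    // Hypothetical loader abstraction: one attempt to parse a font from a stream.
    @FunctionalInterface
    interface FontLoader<T> {
        T load(InputStream in) throws IOException;
    }

    // Tries each loader in order. Every attempt receives a new stream over the
    // same byte array, so a failed attempt cannot leave a half-consumed stream
    // for the next one. Loaders are expected to return non-null results.
    static <T> Optional<T> loadWithRetries(byte[] fontBytes, List<FontLoader<T>> loaders) {
        for (FontLoader<T> loader : loaders) {
            try (InputStream fresh = new ByteArrayInputStream(fontBytes)) {
                return Optional.of(loader.load(fresh));
            } catch (IOException e) {
                // This format did not match; try the next loader.
            }
        }
        return Optional.empty();
    }
}
```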

Utilities

  • findUnembeddedFontNames() - Identifies unembedded fonts
  • deleteQuietly() - Recursively deletes temp directories
  • sanitizePdfA() - Removes incompatible elements
  • removeElementsForPdfA() - Removes Optional Content and transparency
  • mergeAndAddXmpMetadata() - Handles XMP metadata
  • preProcessHighlights() - Pre-processes annotations
  • Transparency detection: isTransparencyGroup(), hasTransparentImages(), detectTransparentXObjects()
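
As an illustration of what a recursive, exception-swallowing cleanup helper such as deleteQuietly() could look like, here is a minimal sketch using only `java.nio.file`. The class name and exact behavior are assumptions; the PR's version may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class DeleteQuietlySketch {

    // Deletes a file or directory tree, ignoring I/O failures.
    static void deleteQuietly(Path root) {
        if (root == null || Files.notExists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException ignored) {
                    // Best-effort cleanup: leave the entry behind.
                }
            });
        } catch (IOException ignored) {
            // The tree could not be walked; nothing more to do.
        }
    }
}
```

Closing the stream returned by `Files.walk` with try-with-resources matters here, because it holds open directory handles until it is closed.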

PDF/A

  • PDF/A-1b: Strict compliance
  • PDF/A-2b: Extended features (default)
  • PDF/A-3b: Embedded files support

PDF/X

  • PDF/X-1: Standard print exchange
  • PDF/X-3: Color-managed with ICC profiles
  • PDF/X-4: Transparency support

As mentioned, the greatest benefit is that the new Ghostscript conversion delivers PDF/A files with fewer warnings and zero errors compared to LibreOffice. Sometimes, however, both succeed without warnings. Here are some samples:

[Four screenshots of validation results]

There is also a size difference (not sure why), which generally also favors Ghostscript:

[Screenshot of the file size comparison]

Front-end

[Screenshot of the updated front-end]

Checklist

General

Documentation

Translations (if applicable)

UI Changes (if applicable)

  • Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

Testing (if applicable)

  • I have tested my changes locally. Refer to the Testing Guide for more details.

… current PDF/A conversion

- Updated `PdfToPdfARequest` to include PDF/X in supported output formats
- Expanded input handling and model validation for PDF/A and PDF/X
- Added Ghostscript as a preferred backend for PDF/A and PDF/X conversions
- Implemented PDF/X-specific conversion logic with detailed validation
- Updated UI templates to separate PDF/A and PDF/X format options
- Enhanced error handling and warnings during conversion processes
- Revised localized strings to reflect expanded functionality

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>

stirlingbot bot commented Nov 7, 2025

🚀 Translation Verification Summary

🔄 Reference Branch: pr-branch-messages_en_GB.properties

📃 File Check: messages_en_GB.properties

  1. Test Status: Passed
  2. Test Status: Passed
  3. Test Status: Passed

✅ Overall Check Status: Success

Thanks @balazs-szucs for your help in keeping the translations up to date.

@stirlingbot stirlingbot bot added enhancement New feature or request Java Pull requests that update Java code Front End Issues or pull requests related to front-end development Back End Issues related to back-end development Translation API API-related issues or pull requests labels Nov 7, 2025
- Resolved inconsistencies in messages_en_US and messages_en_GB property files
- Updated descriptions, tags, and credits for PDF/A and PDF/X conversion
- Adjusted localization strings to ensure proper format support handling

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
- Replaced redundant streams and lists initialization with more efficient alternatives
- Centralized stream reading logic to prevent reuse issues and ensure proper closing
- Enhanced logging for PDF/A validation to differentiate warnings from errors
- Simplified methods by removing redundant parameters and improving clarity
- Updated GregorianCalendar usage to modern java.time classes
- Ensured static state for utility-like methods for cleaner invocation
- Improved PDF/A metadata handling by aligning structure and removing redundancy

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
@balazs-szucs balazs-szucs marked this pull request as ready for review November 8, 2025 10:53
Copilot AI review requested due to automatic review settings November 8, 2025 10:53
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines ignoring generated files. label Nov 8, 2025
Copilot AI left a comment

Pull Request Overview

This PR enhances the PDF conversion functionality by adding support for PDF/X formats alongside existing PDF/A support, and introduces Ghostscript as the preferred conversion method with PDFBox/LibreOffice as a fallback.

  • Added PDF/X-1, PDF/X-3, and PDF/X-4 conversion options for print production workflows
  • Implemented Ghostscript-based conversion with automatic fallback to PDFBox/LibreOffice
  • Refactored code to use enums (PdfaProfile, PdfXProfile) for better type safety and maintainability
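
A profile enum of the kind mentioned above might look roughly like the following sketch. The enum name, its fields, and especially the request token strings are hypothetical; the real PdfaProfile in the PR may use different values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

enum PdfaProfileSketch {
    // Token strings are illustrative placeholders, not the PR's actual values.
    PDF_A_1B(1, "pdfa-1", "pdfa-1b"),
    PDF_A_2B(2, "pdfa", "pdfa-2", "pdfa-2b"),
    PDF_A_3B(3, "pdfa-3", "pdfa-3b");

    final int part;
    final List<String> requestTokens;

    PdfaProfileSketch(int part, String... requestTokens) {
        this.part = part;
        this.requestTokens = List.of(requestTokens);
    }

    // Normalizes the raw request value once, then matches it against each
    // profile's known tokens.
    static Optional<PdfaProfileSketch> fromRequest(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String token = raw.trim().toLowerCase(Locale.ROOT);
        return Arrays.stream(values())
                .filter(p -> p.requestTokens.contains(token))
                .findFirst();
    }
}
```

Returning an `Optional` leaves the choice of default (or rejection) for unknown values to the caller rather than hiding it inside the enum.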

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| pdf-to-pdfa.html | Updated UI to display PDF/A and PDF/X format options in grouped dropdowns with default selection |
| messages_en_GB.properties | Updated strings to reflect PDF/A & PDF/X support and revised conversion method descriptions |
| PdfToPdfARequest.java | Extended allowable format values to include new PDF/A variants and PDF/X formats |
| ConvertPDFToPDFA.java | Major refactoring: added Ghostscript conversion methods, profile enums, validation logic, and split conversion handling for PDF/A vs PDF/X |

… validation

- Added null checks for PDResources to prevent possible null pointer exceptions
- Streamlined request token processing by normalizing input directly in initialization
- Simplified filter logic for profile matching by leveraging pre-normalized tokens
- Removed redundant option from the PDF/A format dropdown in the UI template

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
- Replaced direct field access with getter methods for profile attributes
- Added Lombok annotations to PdfaProfile and PdfXProfile enums for cleaner code
- Updated logging statements to utilize getter methods for consistency
- Improved validation and error message construction by standardizing access to profile attributes
- Simplified PdfaProfile and PdfXProfile implementations by removing redundant methods

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
…thods

- Replaced getter methods with direct field access for profile attributes
- Simplified output suffix and preflight format methods by using direct fields
- Enhanced request token filtering logic by leveraging direct field access

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
…ove resource path handling

- Extracted PDF/A preflight parsing logic into `parsePreflightDocument` for reusability
- Replaced hardcoded ICC resource paths with constants for better maintainability
- Simplified GregorianCalendar conversions by removing redundant `java.util` statement
- Improved exception handling for preflight parsing with detailed error messages

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
…ocess

- Updated font embedding settings to avoid incomplete glyphs and allow substitutions
- Added high-quality prepress settings for better content preservation
- Introduced Type1 font CharSet fixing to handle missing or invalid definitions
- Implemented QPDF-based PDF normalization to address font program issues
- Enhanced cleanup logic to manage temporary files from normalization steps

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>