@balazs-szucs (Collaborator) commented on Nov 5, 2025

Description of Changes

TLDR

  • Adds /api/v1/form/fields, /fill, /modify-fields, and /delete-fields endpoints for end-to-end AcroForm workflows.
  • Centralizes form field detection, filling, modification, and deletion logic in FormUtils with strict type handling.
  • Introduces FormPayloadParser for resilient JSON parsing across legacy flat payloads and new structured payloads.
  • Reuses and extends FormCopyUtils plus FormFieldTypeSupport to create, clone, and normalize widget properties when transforming forms.

Implementation Details

  • FormFillController exposes the new multipart APIs and streams updated documents or metadata responses.
  • FormUtils now owns extraction, template building, value application (including flattening strategies), and field CRUD helpers used by the controller endpoints.
  • FormPayloadParser normalizes request bodies: accepts flat key/value maps, combined fields arrays, or nested templates, returning deterministic LinkedHashMap ordering for repeatable fills.
  • FormFieldTypeSupport encapsulates per-type creation, value copying, default appearance, and option handling; utilized by both modification flows and FormCopyUtils transformations.
  • FormCopyUtils exposes shared routines for creating widgets across documents.
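
The normalization described for FormPayloadParser can be pictured with a small sketch. It is written in Python for brevity and is not the Java implementation in this PR. The key names `template`, `fields`, `name`, and `value`, as well as the precedence between the three shapes, are assumptions made for illustration. A plain `dict` stands in for `LinkedHashMap`, since Python dictionaries preserve insertion order.

```python
import json


def normalize_payload(raw):
    """Reduce a JSON request body to an ordered name -> value mapping."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")

    # Assumed nested shape: {"template": {"Name": "value", ...}}
    if isinstance(data.get("template"), dict):
        return dict(data["template"])

    # Assumed combined shape: {"fields": [{"name": ..., "value": ...}, ...]}
    if isinstance(data.get("fields"), list):
        values = {}
        for entry in data["fields"]:
            if isinstance(entry, dict) and "name" in entry:
                values[entry["name"]] = entry.get("value")
        return values

    # Legacy flat shape: {"Name": "value", ...}
    return dict(data)
```

Because every branch returns an insertion-ordered mapping, the same payload yields the same fill order on each run, which is the property the bullet above attributes to the parser.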

API Surface (Multipart Form Data)

  • POST /api/v1/form/fields -> returns FormUtils.FormFieldExtraction with ordered FormFieldInfo records plus a fill template.
  • POST /api/v1/form/fill -> applies parsed values via FormUtils.applyFieldValues; optional flatten renders appearances while respecting strict validation.
  • POST /api/v1/form/modify-fields -> updates existing fields in-place using FormUtils.modifyFormFields with definitions parsed from updates payloads.
  • POST /api/v1/form/delete-fields -> removes named fields after FormPayloadParser.parseNameList deduplication and validation.
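
As a rough illustration of the deduplication and validation step mentioned for /delete-fields, the sketch below removes repeated names while keeping first-seen order. It is a Python approximation, not the actual `FormPayloadParser.parseNameList`; trimming whitespace and rejecting blank names are assumptions about what "validation" covers.

```python
def parse_name_list(names):
    """Return unique, trimmed field names in first-seen order."""
    seen = set()
    result = []
    for name in names:
        # Assumption: non-string or blank names are treated as invalid input.
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"invalid field name: {name!r}")
        cleaned = name.strip()
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

Keeping first-seen order means the deletion request is processed in the order the caller listed the fields, even when a name is repeated.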

Data Validation & Type Safety

  • Field type inference (detectFieldType) and choice option resolution ensure only supported values are written; checkbox mapping uses export states and boolean heuristics.
  • Choice inputs pass through filterChoiceSelections / filterSingleChoiceSelection to reject invalid entries and provide actionable logs.
  • Text fills leverage setTextValue to merge inline formatting resources and regenerate appearances when necessary.
  • applyFieldValues supports strict mode (default), which raises an error when unknown fields are supplied, preventing silent data loss.
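
The strict-mode and checkbox bullets above can be sketched as follows. This is an illustrative Python model, not the logic in `FormUtils`: the function names, the set of accepted truthy and falsy strings, and the default on-state `"Yes"` are assumptions. The only fixed point is that `Off` is the conventional name of the unchecked appearance state in PDF forms.

```python
TRUTHY = {"true", "yes", "on", "1"}
FALSY = {"false", "no", "off", "0", ""}


def checkbox_state(value, on_state="Yes"):
    """Map a boolean or boolean-like string to a checkbox export state."""
    if isinstance(value, bool):
        return on_state if value else "Off"
    text = str(value).strip()
    if text == on_state:
        return on_state
    lowered = text.lower()
    if lowered in TRUTHY:
        return on_state
    if lowered in FALSY:
        return "Off"
    raise ValueError(f"cannot interpret {value!r} as a checkbox value")


def apply_field_values(known_fields, values, strict=True):
    """Keep values for known fields; in strict mode, reject unknown names."""
    applied = {}
    for name, value in values.items():
        if name not in known_fields:
            if strict:
                raise KeyError(f"unknown field: {name}")
            continue  # lenient mode skips the entry instead of failing
        applied[name] = value
    return applied
```

With `strict=False` an unknown name is dropped quietly, which is exactly the silent data loss that the default strict mode is meant to prevent.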

Automation Workflow Support

The /fill and /fields endpoints are designed to work together for automated form processing. The workflow is straightforward: extract the form structure, modify the values, and submit for filling.

How It Works:

  1. The /fields endpoint extracts all form field metadata from your PDF
  2. You modify the returned JSON to set the desired values for each field
  3. The /fill endpoint accepts this same JSON structure to populate the form

Example Workflow:

# Step 1: Extract form structure and save to fields.json
curl -o fields.json \
     -F file=@Form.pdf \
     http://localhost:8080/api/v1/form/fields

# Step 2: Edit fields.json to update the "value" property for each field
# (Use your preferred text editor or script to modify the values)

# Step 3: Fill the form using the modified JSON
curl -o filled-form.pdf \
     -F file=@Form.pdf \
     -F data=@fields.json \
     http://localhost:8080/api/v1/form/fill
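
Step 2 can also be scripted rather than done by hand. The sketch below is one possible approach in Python and assumes that the JSON saved in step 1 contains a top-level `fields` array whose entries carry `name` and `value` properties. The helper names `set_field_values` and `update_fields_file` are hypothetical.

```python
import json


def set_field_values(extraction, new_values):
    """Set the "value" property on entries of the assumed "fields" array."""
    pending = dict(new_values)
    for field in extraction.get("fields", []):
        name = field.get("name")
        if name in pending:
            field["value"] = pending.pop(name)
    if pending:
        # Fail loudly rather than silently dropping a misspelled field name.
        raise KeyError(f"not present in the extraction: {sorted(pending)}")
    return extraction


def update_fields_file(path, new_values):
    """Load the JSON saved in step 1, update it, and write it back."""
    with open(path, encoding="utf-8") as handle:
        extraction = json.load(handle)
    set_field_values(extraction, new_values)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(extraction, handle, indent=2, ensure_ascii=False)
```

Calling `update_fields_file("fields.json", {"Agent of Dependent": "Mary Jane"})` between steps 1 and 3 would then replace the manual edit.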

How to Fill the Template JSON

The template (your data) is filled by creating key-value pairs that match the "rules" defined in the fields array (the schema).

  1. Find the Field name: Look in the fields array for the name of the field you want to fill.

    • Example: {"name": "Agent of Dependent", "type": "text", ...}
  2. Use name as the Key: This name becomes the key (in quotes) in your template object.

    • Example: {"Agent of Dependent": ...}
  3. Find the type: Look at the type for that same field. This tells you what kind of value to provide.

    • "type": "text" requires a string (e.g., "John Smith").
    • "type": "checkbox" requires a boolean (e.g., true or false).
    • "type": "combobox" requires a string that exactly matches one of its "options" (e.g., "Choice 1").
  4. Add the Value: This matching value becomes the value for your key.

Correct Examples

  • For a Textbox:

    • Schema: {"name": "Agent of Dependent", "type": "text", ...}
    • Template: {"Agent of Dependent": "Mary Jane"}
  • For a Checkbox:

    • Schema: {"name": "Option 2", "type": "checkbox", ...}
    • Template: {"Option 2": true}
  • For a Dropdown (Combobox):

    • Schema: {"name": "Dropdown2", "type": "combobox", "options": ["Choice 1", "Choice 2", ...] ...}
    • Template: {"Dropdown2": "Choice 1"}

Incorrect Examples (These Will Error)

  • Wrong Type: {"Option 2": "Checked"}
    • Error: "Option 2" is a checkbox and expects true or false, not a string.
  • Wrong Option: {"Dropdown2": "Choice 99"}
    • Error: "Choice 99" is not listed in the options for "Dropdown2".
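
The rules above lend themselves to a pre-flight check before calling /fill. The following Python sketch mirrors only the three types listed in this description (text, checkbox, combobox). The function name `validate_template` and the wording of its messages are illustrative and do not reproduce the server's actual error output.

```python
def validate_template(fields, template):
    """Return a list of problems found when checking template against fields."""
    by_name = {field["name"]: field for field in fields}
    errors = []
    for name, value in template.items():
        field = by_name.get(name)
        if field is None:
            errors.append(f'"{name}" is not a known field')
            continue
        kind = field.get("type")
        if kind == "text" and not isinstance(value, str):
            errors.append(f'"{name}" is a text field and expects a string')
        elif kind == "checkbox" and not isinstance(value, bool):
            errors.append(f'"{name}" is a checkbox and expects true or false')
        elif kind == "combobox" and value not in field.get("options", []):
            errors.append(f'{value!r} is not listed in the options for "{name}"')
    return errors
```

An empty list means the template passed these checks. The two incorrect examples above would each produce one entry.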

Manual Editing

For users filling forms manually, there's a simplified format that focuses only on field names and values:

{
  "FullName": "",
  "ID": "",
  "Gender": "Off",
  "Married": false,
  "City": "[]"
}

This format is easier to work with when you're manually editing the JSON. You can skip the full metadata structure (type, label, required, etc.) and just provide the field names with their values.

Important caveat: Even though the type information isn't visible in this simplified format, type validation is still enforced by PDF viewers. This simplified format just makes manual editing more convenient while maintaining data integrity.

Please note: this is affected by the upstream PDFBox issue https://issues.apache.org/jira/browse/PDFBOX-5962

Closes #237
Closes #3569


Checklist

General

Documentation

Translations (if applicable)

UI Changes (if applicable)

  • Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

Testing (if applicable)

  • I have tested my changes locally. Refer to the Testing Guide for more details.

- Relocated `FormUtils` from `common` to `proprietary` module
- Renamed class to `FormCopyUtils`
- Updated class annotations and import paths in line with its new scope

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
@stirlingbot (bot) added the enhancement, Java, Back End, v2, API, and Test labels on Nov 5, 2025
@balazs-szucs changed the title from "feat(delete-form,modify-form,fill-form,extract-forms): add delete, modify, fill, and extract form functionality" to "[V2] feat(delete-form,modify-form,fill-form,extract-forms): add delete, modify, fill, and extract form functionality" on Nov 5, 2025
@stirlingbot (bot) removed the enhancement label on Nov 6, 2025
balazs-szucs and others added 6 commits November 6, 2025 15:39
…yUtils

- Removed reflection-based invocation of `refreshAppearances`
- Replaced it with direct method call for better clarity and maintainability
- Adjusted exception handling to catch `NoSuchMethodError` instead of `NoSuchMethodException`

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
- Updated method names from `supportsDefinitionCreation` to `doesNotsupportsDefinitionCreation` to better reflect functionality
- Simplified multi-case switch statements in `FormUtils` for readability
- Removed unused `flattenFormOnly` method to reduce code clutter
- Adjusted `requirePdf` method signature by removing redundant parameter
- Modified `ensureCheckBoxAppearance` to be static for consistent behavior across instances

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
@stirlingbot (bot) added the Security label on Nov 6, 2025
@balazs-szucs marked this pull request as ready for review on November 6, 2025, 19:23
Copilot AI review requested due to automatic review settings November 6, 2025 19:23
@dosubot (bot) added the size:XXL and enhancement labels on Nov 6, 2025
Copilot AI left a comment

Pull Request Overview

This PR performs a major refactoring of form-related utilities, moving FormUtils from the common module to the proprietary module and significantly expanding its functionality. The changes introduce comprehensive PDF form field manipulation capabilities including extraction, modification, deletion, and filling operations.

Key Changes

  • Moved and expanded FormUtils: Relocated from app/common to app/proprietary with 1,762 lines of new functionality for form field operations
  • New API controllers: Added FormFillController and FormPayloadParser providing REST endpoints for form operations
  • New type system: Introduced FormFieldTypeSupport enum for type-safe field handling across different PDF form field types
  • Removed form copying from MultiPageLayoutController: Form field transformation functionality removed from the core module (proprietary-only feature)

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

File | Description
--- | ---
FormUtils.java (proprietary) | Complete rewrite with extensive form field extraction, modification, deletion, and filling capabilities
FormFieldTypeSupport.java | New enum-based type handler for different PDF form field types (text, checkbox, radio, combobox, listbox, signature, button)
FormCopyUtils.java | New utility for copying and transforming form fields across multi-page layouts
FormFillController.java | New REST controller exposing form operations via /api/v1/form/* endpoints
FormPayloadParser.java | New parser handling flexible JSON payload formats for form operations
FormUtilsTest.java | New test file (disabled, deferred to integration tests)
MultiPageLayoutController.java | Removed form field copying logic (moved to proprietary module)
FormUtils.java (common) | Deleted; functionality moved to proprietary module
RegexPatternUtils.java | Added patterns for form field name parsing and validation
EndpointConfiguration.java | Registered new form endpoints in "Other" group
UserAuthenticationFilter.java | Minor comment formatting fix

- Added null checks for `alternate`, `tooltipLabel`, and options in `FormUtils`
- Improved boundary checks in `FormCopyUtils` to prevent out-of-bounds errors
- Simplified widget mapping process in `FormCopyUtils` with early exits

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
@stirlingbot (bot) removed the enhancement label on Nov 6, 2025
Labels

  • API: API-related issues or pull requests
  • Back End: Issues related to back-end development
  • Java: Pull requests that update Java code
  • Security: Security-related issues or pull requests
  • size:XXL: This PR changes 1000+ lines ignoring generated files.
  • Test: Testing-related issues or pull requests
  • v2: Issues or pull requests related to the v2 branch

2 participants