WACV VisionDocs, Feb 2026

Improving End-to-End Models for Form Understanding with Synthetic Ground Truth Pairs

We present methods for improving end-to-end document understanding models by leveraging synthetic ground truth pairs of commonly filled form data.

Published at WACV VisionDocs Workshop

Authors

Andre Fu, Egil Karlsen, Taha Mohamad

Abstract

End-to-end document understanding models have shown promising results but often struggle with real-world form parsing due to limited training data diversity. We present methods for improving these models by leveraging synthetic ground truth pairs of commonly filled form data.

Key Contributions

  1. Synthetic Data Generation Pipeline: We develop a pipeline for generating realistic synthetic forms with corresponding ground truth annotations.
  2. Improved Form Understanding: Our approach demonstrates improved performance on real-world form parsing tasks.
  3. Data Efficiency: Models trained with our synthetic data require less real annotated data to achieve competitive performance.

Methodology

We create synthetic ground truth pairs by:

- Generating diverse form templates programmatically
- Filling forms with realistic field values
- Creating perfect pixel-aligned annotations
- Applying realistic augmentations to bridge the domain gap
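
The four steps above can be illustrated with a minimal sketch. This is not the authors' pipeline: the field vocabulary (`FIELD_VALUES`), the function names, the layout constants, and the noise model are all hypothetical and chosen only for illustration. Filled fields are drawn as solid boxes on a grayscale grid as a stand-in for real text rendering, and salt-and-pepper noise stands in for whatever augmentations the paper applies. The point it shows is that the annotation is produced by the same code that draws the form, so boxes and pixels agree by construction.

```python
import json
import random

# Hypothetical field vocabulary; the paper does not specify one.
FIELD_VALUES = {
    "name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "date": ["2024-01-15", "2025-06-30", "2026-02-01"],
    "amount": ["120.50", "89.99", "1500.00"],
}


def make_template(rng, width=200, height=120, row_height=30):
    """Step 1: lay out the fields in a random order, one field per row."""
    names = list(FIELD_VALUES)
    rng.shuffle(names)
    fields = []
    for row, name in enumerate(names):
        top = 10 + row * row_height
        # Box is (left, top, right, bottom) in pixel coordinates.
        fields.append({"name": name, "box": (60, top, width - 10, top + 20)})
    return {"width": width, "height": height, "fields": fields}


def fill_form(template, rng):
    """Step 2: pick a value per field; the result doubles as the ground truth."""
    return {
        "fields": [
            {
                "name": f["name"],
                "value": rng.choice(FIELD_VALUES[f["name"]]),
                "box": f["box"],
            }
            for f in template["fields"]
        ]
    }


def render(template, truth):
    """Step 3: rasterize to a grid (255 = background, 0 = filled field area).

    The boxes come straight from the annotation, so image and labels align.
    """
    image = [[255] * template["width"] for _ in range(template["height"])]
    for field in truth["fields"]:
        left, top, right, bottom = field["box"]
        for y in range(top, bottom):
            for x in range(left, right):
                image[y][x] = 0
    return image


def augment(image, rng, noise_rate=0.02):
    """Step 4: flip a small fraction of pixels; geometry is left unchanged."""
    return [
        [rng.choice((0, 255)) if rng.random() < noise_rate else px for px in row]
        for row in image
    ]


def make_pair(seed):
    """Produce one (image, ground truth) pair, reproducible from the seed."""
    rng = random.Random(seed)
    template = make_template(rng)
    truth = fill_form(template, rng)
    image = augment(render(template, truth), rng)
    return image, truth


if __name__ == "__main__":
    image, truth = make_pair(seed=0)
    print(len(image), "x", len(image[0]))
    print(json.dumps(truth, indent=2))
```

Because this augmentation only perturbs pixel values, the boxes in the annotation stay valid. A geometric augmentation such as rotation or warping would need the same transform applied to the annotation coordinates.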

Results

Our method shows improvements on standard document understanding benchmarks while requiring significantly less manual annotation effort.

Cite This Work

@InProceedings{Fu_2026_WACV,
    author    = {Fu, Andre and Karlsen, Egil and Mohamad, Taha},
    title     = {Improving End-to-End Models for Form Understanding with Synthetic Ground Truth Pairs},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2026}
}