Improving End-to-End Models for Form Understanding with Synthetic Ground Truth Pairs

Abstract

End-to-end document understanding models have shown promising results but often struggle with real-world form parsing due to limited training data diversity. We present methods for improving these models with synthetic ground truth pairs: programmatically generated filled forms, each matched with its ground truth annotations.

Key Contributions

Synthetic Data Generation Pipeline: We develop a pipeline for generating realistic synthetic forms with corresponding ground truth annotations.

Improved Form Understanding: Our approach demonstrates improved performance on real-world form parsing tasks.

Data Efficiency: Models trained with our synthetic data require less real annotated data to achieve competitive performance.

Methodology

We create synthetic ground truth pairs by:

- Generating diverse form templates programmatically
- Filling forms with realistic field values
- Creating perfect pixel-aligned annotations
- Applying realistic augmentations to bridge the domain gap
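The four steps above can be illustrated with a minimal sketch. This is not the authors' pipeline: the field names, value pools, page dimensions, and every function name (`make_template`, `fill_template`, `augment`, `make_pair`) are hypothetical, and image rendering is omitted, so a list of bounding boxes stands in for the form image. The point it tries to show is that when layout and values are generated programmatically, the annotations come for free and stay aligned through augmentation.

```python
import json
import random
from dataclasses import dataclass, asdict

# Hypothetical value pools; a real pipeline would draw from far richer sources.
FIELD_POOLS = {
    "name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "date": ["2024-01-15", "2023-11-02", "2025-06-30"],
    "amount": ["120.00", "58.40", "999.99"],
}


@dataclass
class FieldBox:
    key: str    # field label, e.g. "name"
    value: str  # text filled into the field
    x: int      # left edge of the box, in pixels
    y: int      # top edge of the box, in pixels
    w: int      # box width
    h: int      # box height


def make_template(rng, page_w=800, row_h=40, margin=50):
    """Step 1: vary the layout by shuffling field order and row spacing."""
    keys = list(FIELD_POOLS)
    rng.shuffle(keys)
    gap = rng.randint(10, 30)
    return [
        {"key": k, "x": margin, "y": margin + i * (row_h + gap),
         "w": page_w - 2 * margin, "h": row_h}
        for i, k in enumerate(keys)
    ]


def fill_template(template, rng):
    """Step 2: fill each slot with a value sampled from its pool."""
    return [FieldBox(value=rng.choice(FIELD_POOLS[s["key"]]), **s)
            for s in template]


def augment(fields, rng, max_shift=5):
    """Step 4 (simplified): shift the whole page by a small random offset.

    The boxes move with the content, so annotations stay aligned.
    """
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return [FieldBox(f.key, f.value, f.x + dx, f.y + dy, f.w, f.h)
            for f in fields]


def make_pair(seed):
    """Build one (layout, ground truth) pair, reproducible from its seed."""
    rng = random.Random(seed)
    fields = augment(fill_template(make_template(rng), rng), rng)
    # Step 3: the annotation is derived directly from the generated fields.
    ground_truth = {f.key: f.value for f in fields}
    return {"layout": [asdict(f) for f in fields], "ground_truth": ground_truth}


pair = make_pair(seed=0)
print(json.dumps(pair["ground_truth"], indent=2))
```

Seeding each pair separately keeps generation reproducible, which makes it easier to regenerate a specific example when debugging. A translation is only a stand-in for the realistic augmentations mentioned above; noise, blur, or scan artifacts would be applied to a rendered image instead.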

Results

Our method shows improvements on standard document understanding benchmarks while requiring significantly less manual annotation effort.

Cite This Work

@InProceedings{Fu_2026_WACV,
    author    = {Fu, Andre and Karlsen, Egil and Mohamad, Taha},
    title     = {Improving End-to-End Models for Form Understanding with Synthetic Ground Truth Pairs},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {February},
    year      = {2026}
}
