Abstract
End-to-end document understanding models have shown promising results but often struggle with real-world form parsing because their training data lacks diversity. We present methods for improving these models using synthetic pairs of commonly filled-in forms and their ground truth annotations.
Key Contributions
- Synthetic Data Generation Pipeline: We develop a pipeline for generating realistic synthetic forms with corresponding ground truth annotations.
- Improved Form Understanding: Our approach demonstrates improved performance on real-world form parsing tasks.
- Data Efficiency: Models trained with our synthetic data require less real annotated data to achieve competitive performance.
Methodology
We create synthetic ground truth pairs by:
- Generating diverse form templates programmatically
- Filling forms with realistic field values
- Creating perfect pixel-aligned annotations
- Applying realistic augmentations to bridge the domain gap
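The four steps above can be illustrated with a minimal sketch. It is not the actual pipeline: the field pool, the layout rules, and the names `FIELD_POOL`, `make_template`, `fill_form`, `augment`, and `make_pair` are all hypothetical, and a shared pixel shift of the boxes stands in for image-level augmentation. The sketch only shows why annotations stay aligned: they come from the same layout that would be rendered, and they are transformed together with it.

```python
import json
import random
from dataclasses import dataclass, asdict

# Hypothetical value pools; a real pipeline would draw from far larger sources.
FIELD_POOL = {
    "name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "date": ["2021-03-14", "1999-12-31", "2010-07-04"],
    "city": ["Lyon", "Osaka", "Nairobi"],
    "amount": ["12.50", "1,024.00", "87.99"],
}


@dataclass
class FieldBox:
    key: str
    value: str
    box: tuple  # (x0, y0, x1, y1) in pixels


def make_template(rng, n_fields=3, page_width=800, row_height=40):
    """Step 1: choose fields and lay them out top to bottom."""
    keys = rng.sample(sorted(FIELD_POOL), n_fields)
    left = rng.randint(20, 120)  # random left margin varies the layout
    template = []
    for i, key in enumerate(keys):
        top = 50 + i * row_height
        template.append((key, (left, top, page_width - 20, top + row_height - 10)))
    return template


def fill_form(rng, template):
    """Step 2: assign a sampled value to every field in the template."""
    return [FieldBox(key, rng.choice(FIELD_POOL[key]), box) for key, box in template]


def augment(rng, fields, max_shift=5):
    """Step 4 (stand-in): shift every box by one shared offset.

    The annotation is moved with the same offset, so it stays aligned.
    """
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    shifted = []
    for f in fields:
        x0, y0, x1, y1 = f.box
        shifted.append(FieldBox(f.key, f.value, (x0 + dx, y0 + dy, x1 + dx, y1 + dy)))
    return shifted


def make_pair(seed):
    """Steps 1-4: return one (layout, ground truth) pair for a given seed."""
    rng = random.Random(seed)
    fields = augment(rng, fill_form(rng, make_template(rng)))
    # Step 3: the ground truth is read off the generated layout, never hand-labeled.
    ground_truth = {f.key: f.value for f in fields}
    return {"layout": [asdict(f) for f in fields], "ground_truth": ground_truth}


pair = make_pair(0)
print(json.dumps(pair, indent=2))
```

Seeding the generator per sample makes each pair reproducible, which helps when a specific synthetic form needs to be regenerated for debugging.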
Results
Our method shows improvements on standard document understanding benchmarks while requiring significantly less manual annotation effort.