LayoutLMv3InvoiceCzech (V2 – Synthetic + Random Layout + Real Layout Injection)
This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.
It achieves the following results on the evaluation set (identical to the epoch 7 row of the training results below):
- Loss: 0.0763
- Precision: 0.8009
- Recall: 0.8849
- F1: 0.8408
- Accuracy: 0.9844
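As a quick consistency check, the reported F1 is the harmonic mean of the reported precision and recall:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.8009 \times 0.8849}{0.8009 + 0.8849}
    = \frac{1.4174}{1.6858}
    \approx 0.8408
```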
Model description
LayoutLMv3InvoiceCzech (V2) is a multimodal document understanding model that combines:
- textual features
- spatial layout (bounding boxes)
- visual features (image embeddings)
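LayoutLMv3 expects each word's bounding box as (x0, y0, x1, y1) scaled to a 0–1000 range relative to the page size. You supply these boxes yourself when you run your own OCR, for example with the processor created with `apply_ocr=False`. The helper below is a minimal sketch of that normalization. The function name `normalize_box` is hypothetical and not part of any library.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], page_width: float, page_height: float) -> List[int]:
    """Scale an (x0, y0, x1, y1) pixel box to LayoutLMv3's 0-1000 coordinate range."""
    x0, y0, x1, y1 = box

    def scale(value: float, size: float) -> int:
        # Truncate to int and clamp, so boxes slightly outside the page stay in range.
        return max(0, min(1000, int(1000 * value / size)))

    return [scale(x0, page_width), scale(y0, page_height), scale(x1, page_width), scale(y1, page_height)]


# An A4 page scanned at 300 DPI is 2480 x 3508 pixels.
print(normalize_box([248, 351, 1240, 702], 2480, 3508))  # [100, 100, 500, 200]
```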
The model performs token-level classification to extract structured invoice fields:
- supplier
- customer
- invoice number
- bank details
- totals
- dates
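Token-level predictions still have to be grouped into field values. The card does not list the label set, so the sketch below makes two assumptions. First, it assumes a BIO tagging scheme with hypothetical tag names such as `B-SUPPLIER` and `I-SUPPLIER`. Second, it assumes one predicted tag per word, for example the prediction for each word's first sub-token. The function `decode_bio` is illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def decode_bio(words: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group word-level BIO tags into field values (hypothetical label names)."""
    spans: List[Tuple[str, List[str]]] = []
    open_label: Optional[str] = None
    for word, tag in zip(words, tags):
        if tag == "O":
            open_label = None  # outside any field: close the current span
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == open_label:
            spans[-1][1].append(word)  # continue the current field value
        else:
            # A "B-" tag, or an "I-" tag with no matching open span, starts a new value.
            spans.append((label, [word]))
            open_label = label
    fields: Dict[str, List[str]] = {}
    for label, span_words in spans:
        fields.setdefault(label, []).append(" ".join(span_words))
    return fields


words = ["Faktura", "č.", "2024001", "Dodavatel:", "ACME", "s.r.o.", "Datum:", "15.03.2024"]
tags = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER", "O", "B-DATE"]
print(decode_bio(words, tags))
# {'INVOICE_NUMBER': ['2024001'], 'SUPPLIER': ['ACME s.r.o.'], 'DATE': ['15.03.2024']}
```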
This version adds real layout injection, which makes the training documents structurally closer to real invoices. It is intended to improve generalization beyond purely synthetic layouts.
Training data
The dataset consists of three components:
- Synthetic template-based invoices
- Synthetic invoices with randomized layouts
- Hybrid invoices with real layouts and synthetic content
Real layout injection
In the hybrid dataset:
- real invoice layouts are used as templates
- original text content is replaced with synthetic data
- new content is rendered into authentic document structures
This preserves:
- real-world spatial distributions
- visual patterns and formatting
- document complexity
while maintaining:
- full annotation control
- consistent labels
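The card does not describe how the injection is implemented. The sketch below shows one way the annotation side could work, under an assumption: each real layout is reduced to a list of labeled text regions, which is a hypothetical format. It does not show the authors' pipeline. Synthetic values are placed inside the original regions, and word boxes come from a simple equal-width-per-character estimate. Each word gets a BIO tag, so the labels stay fully controlled. Rendering the text onto the page image is omitted. The names `inject_synthetic_text`, `SUPPLIER` and `INVOICE_NUMBER` are illustrative.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


def inject_synthetic_text(
    regions: List[Tuple[str, Box]],
    synthetic_values: Dict[str, str],
) -> Tuple[List[str], List[Box], List[str]]:
    """Fill labeled regions of a real layout with synthetic text.

    Returns parallel lists of words, word boxes and BIO tags.
    """
    words: List[str] = []
    boxes: List[Box] = []
    tags: List[str] = []
    for label, (x0, y0, x1, y1) in regions:
        tokens = synthetic_values[label].split()
        if not tokens:
            continue
        # Spread characters (including one space between words) evenly across the region.
        n_chars = sum(len(t) for t in tokens) + len(tokens) - 1
        char_w = (x1 - x0) / n_chars
        cursor = float(x0)
        for i, tok in enumerate(tokens):
            end = cursor + len(tok) * char_w
            words.append(tok)
            boxes.append((round(cursor), y0, round(end), y1))
            tags.append(("B-" if i == 0 else "I-") + label)
            cursor = end + char_w  # skip the width of one space
    return words, boxes, tags


regions = [("SUPPLIER", (100, 50, 300, 70)), ("INVOICE_NUMBER", (400, 50, 480, 70))]
values = {"SUPPLIER": "ACME s.r.o.", "INVOICE_NUMBER": "2024001"}
words, boxes, tags = inject_synthetic_text(regions, values)
print(words)  # ['ACME', 's.r.o.', '2024001']
print(tags)   # ['B-SUPPLIER', 'I-SUPPLIER', 'B-INVOICE_NUMBER']
```

Keeping the region boxes from the real document and replacing only the strings is what preserves the real spatial distribution while the labels are generated, not hand-annotated.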
Role in the pipeline
This model corresponds to:
V2 – Synthetic + layout augmentation + real layout injection
It is used to:
- bridge the gap between synthetic and real-world data
- evaluate the impact of realistic layouts on multimodal models
- compare with:
  - V0–V1 (fully synthetic)
  - V3 (real data fine-tuning)
Intended uses
- Research on multimodal document AI
- Invoice information extraction with visual + spatial features
- Evaluation of hybrid data strategies
- Benchmarking LayoutLMv3
Limitations
- Text content remains synthetic
- Limited exposure to real linguistic variability
- OCR noise and scanning artifacts are not fully represented
- May struggle with rare real-world edge cases
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW, PyTorch fused implementation (adamw_torch_fused), with betas=(0.9, 0.999), epsilon=1e-08 and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1 (a value below 1 is interpreted as a fraction of total training steps, i.e. 10% warmup)
- num_epochs: 10
- mixed_precision_training: Native AMP
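The sketch below traces the learning rate over the run to show what the schedule settings imply. It mirrors the usual linear-warmup-then-linear-decay rule. It makes two assumptions: the warmup value 0.1 is read as a fraction of the 1,150 total steps seen in the training log, and there is no gradient accumulation. The constant and function names are hypothetical.

```python
import math

PEAK_LR = 1e-5
TOTAL_STEPS = 1150  # 10 epochs x 115 steps, from the training log
WARMUP_FRACTION = 0.1  # assumption: warmup value read as a fraction of total steps
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_FRACTION)  # 115, i.e. exactly one epoch


def linear_warmup_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = (TOTAL_STEPS - step) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


for step in (0, 57, 115, 632, 1150):
    print(step, f"{linear_warmup_lr(step):.3e}")
```

Under these assumptions, warmup ends at the end of epoch 1. The learning rate then decays linearly to zero at step 1,150.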
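Training results
Training loss appears to have been logged every 500 steps. As a result, it shows "No log" until the first logging step (step 500, during epoch 5).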
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| No log | 1.0 | 115 | 0.0725 | 0.7496 | 0.8257 | 0.7858 | 0.9807 |
| No log | 2.0 | 230 | 0.0701 | 0.7569 | 0.8376 | 0.7952 | 0.9822 |
| No log | 3.0 | 345 | 0.0735 | 0.7587 | 0.8883 | 0.8184 | 0.9810 |
| No log | 4.0 | 460 | 0.0743 | 0.7827 | 0.8714 | 0.8247 | 0.9826 |
| 0.0606 | 5.0 | 575 | 0.0783 | 0.7756 | 0.8714 | 0.8207 | 0.9821 |
| 0.0606 | 6.0 | 690 | 0.0811 | 0.7561 | 0.8968 | 0.8204 | 0.9814 |
| 0.0606 | 7.0 | 805 | 0.0763 | 0.8009 | 0.8849 | 0.8408 | 0.9844 |
| 0.0606 | 8.0 | 920 | 0.0826 | 0.7784 | 0.9036 | 0.8363 | 0.9835 |
| 0.0201 | 9.0 | 1035 | 0.0824 | 0.7837 | 0.8951 | 0.8357 | 0.9836 |
| 0.0201 | 10.0 | 1150 | 0.0852 | 0.7818 | 0.9036 | 0.8383 | 0.9834 |
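The evaluation results at the top of this card match the epoch 7 row, which has the highest F1. This suggests that the checkpoint was selected by F1 rather than by validation loss. That is an inference from the numbers, not something the card states. The short script below copies the table values and shows the difference: validation loss is lowest at epoch 2, while F1 peaks at epoch 7.

```python
from typing import NamedTuple


class EpochResult(NamedTuple):
    epoch: int
    step: int
    val_loss: float
    precision: float
    recall: float
    f1: float
    accuracy: float


# Values copied from the training results table above.
RESULTS = [
    EpochResult(1, 115, 0.0725, 0.7496, 0.8257, 0.7858, 0.9807),
    EpochResult(2, 230, 0.0701, 0.7569, 0.8376, 0.7952, 0.9822),
    EpochResult(3, 345, 0.0735, 0.7587, 0.8883, 0.8184, 0.9810),
    EpochResult(4, 460, 0.0743, 0.7827, 0.8714, 0.8247, 0.9826),
    EpochResult(5, 575, 0.0783, 0.7756, 0.8714, 0.8207, 0.9821),
    EpochResult(6, 690, 0.0811, 0.7561, 0.8968, 0.8204, 0.9814),
    EpochResult(7, 805, 0.0763, 0.8009, 0.8849, 0.8408, 0.9844),
    EpochResult(8, 920, 0.0826, 0.7784, 0.9036, 0.8363, 0.9835),
    EpochResult(9, 1035, 0.0824, 0.7837, 0.8951, 0.8357, 0.9836),
    EpochResult(10, 1150, 0.0852, 0.7818, 0.9036, 0.8383, 0.9834),
]

best_f1 = max(RESULTS, key=lambda r: r.f1)
lowest_loss = min(RESULTS, key=lambda r: r.val_loss)
print(f"best F1: epoch {best_f1.epoch} (F1={best_f1.f1})")  # best F1: epoch 7 (F1=0.8408)
print(f"lowest validation loss: epoch {lowest_loss.epoch} (loss={lowest_loss.val_loss})")
# lowest validation loss: epoch 2 (loss=0.0701)
```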
Framework versions
- Transformers 5.0.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2