LayoutLMv3InvoiceCzech (V0 – Synthetic Templates Only)
This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.
It achieves the following results on the evaluation set:
- Loss: 0.2146
- Precision: 0.5354
- Recall: 0.7428
- F1: 0.6223
- Accuracy: 0.9583
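As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet below uses only the numbers listed above; the variable names are illustrative.

```python
# Reported evaluation metrics from the list above.
precision = 0.5354
recall = 0.7428

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")
```

Because precision is lower than recall, the model produces more false positives than false negatives: it tends to predict more field spans than it should, rather than miss them.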
Model description
LayoutLMv3InvoiceCzech (V0) is a multimodal document understanding model that leverages:
- textual information
- spatial layout (bounding boxes)
- visual features (image embeddings)
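For the spatial-layout input, LayoutLMv3 expects each word's bounding box as `(x0, y0, x1, y2)`-style corner coordinates scaled to a 0-1000 integer grid, independent of the page's pixel size. The helper below is a minimal sketch of that scaling; the function name `normalize_box` and the example page size are assumptions for illustration and are not taken from this model's preprocessing code.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 integer grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical word box on a page rendered at 1240 x 1754 pixels.
example = normalize_box((124, 175, 620, 210), page_width=1240, page_height=1754)
print(example)
```

Boxes produced this way can be passed alongside the words and the page image to the processor; if an OCR engine or renderer already emits 0-1000 coordinates, this step should be skipped.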
The model performs token-level classification to extract structured invoice fields:
- supplier
- customer
- invoice number
- bank details
- totals
- dates
This version is trained exclusively on synthetically generated invoice templates.
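Token-level predictions still have to be merged into field values before they are useful downstream. The sketch below shows one common way to do that, assuming word-level labels in a BIO scheme. The label names (`B-SUPPLIER`, `B-INVOICE_NUMBER`, `B-DATE`) and the sample words are hypothetical; this card does not list the model's actual label set, which should be read from the checkpoint's `id2label` mapping.

```python
def group_entities(words, labels):
    """Merge word-level BIO labels into (field, text) pairs."""
    entities = []
    current_field, current_words = None, []
    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # Start a new entity (a stray I- tag is treated like B-).
            if current_field is not None:
                entities.append((current_field, " ".join(current_words)))
            current_field, current_words = field, [word]
        elif prefix == "I":
            # Continue the entity that is currently open.
            current_words.append(word)
        else:
            # "O" closes any open entity.
            if current_field is not None:
                entities.append((current_field, " ".join(current_words)))
            current_field, current_words = None, []
    if current_field is not None:
        entities.append((current_field, " ".join(current_words)))
    return entities


# Hypothetical words and predicted labels (label names are assumptions).
words = ["Dodavatel:", "ACME", "s.r.o.", "Faktura", "2024001", "Datum:", "15.03.2024"]
labels = ["O", "B-SUPPLIER", "I-SUPPLIER", "O", "B-INVOICE_NUMBER", "O", "B-DATE"]
fields = group_entities(words, labels)
print(fields)
```

In practice the labels come from the argmax over the model's logits, mapped back from subword tokens to words before grouping.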
Training data
The dataset consists of:
- synthetically generated invoices
- fixed template layouts
- corresponding bounding boxes
- rendered document images
Key properties:
- consistent structure across samples
- clean and noise-free data
- perfect alignment between text, layout, and image
- no real-world documents
This represents the baseline dataset for multimodal document models.
Role in the pipeline
This model corresponds to:
V0 – Synthetic template-based dataset only
It is used to:
- establish a baseline for multimodal models
- compare against:
  - text-only models (BERT)
  - layout-aware models without vision (LiLT)
- evaluate the contribution of visual features in a controlled setting
Intended uses
- Research in multimodal document understanding
- Benchmarking LayoutLMv3 on structured documents
- Comparison with other architectures (BERT, LiLT, etc.)
- Czech invoice information extraction
Limitations
- Trained only on synthetic data with fixed layouts
- Limited generalization to real-world invoices
- Visual features are learned from clean synthetic renderings
- No exposure to:
  - OCR errors
  - scanning artifacts
  - real-world noise
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 10
- mixed_precision_training: Native AMP
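To make the schedule concrete, the sketch below reproduces the shape of a linear scheduler with warmup in plain Python. It rests on two assumptions that should be read loudly: the warmup value of 0.1 is interpreted as a fraction of total steps (not a step count), and the total of 1500 optimizer steps is taken from the training results table (10 epochs of 150 steps). The function name `linear_lr` is illustrative only.

```python
def linear_lr(step, base_lr=1e-5, total_steps=1500, warmup_fraction=0.1):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    # Assumption: 0.1 is a fraction of total steps, giving 150 warmup steps.
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


for step in (0, 75, 150, 825, 1500):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 1e-05 at the end of the first epoch and reaches zero at step 1500.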
Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| No log | 1.0 | 150 | 0.2817 | 0.1429 | 0.0829 | 0.1049 | 0.9470 |
| No log | 2.0 | 300 | 0.2222 | 0.3480 | 0.4822 | 0.4043 | 0.9480 |
| No log | 3.0 | 450 | 0.2170 | 0.3852 | 0.5736 | 0.4609 | 0.9480 |
| 0.5287 | 4.0 | 600 | 0.1919 | 0.4625 | 0.6261 | 0.5320 | 0.9558 |
| 0.5287 | 5.0 | 750 | 0.1701 | 0.5254 | 0.7174 | 0.6066 | 0.9627 |
| 0.5287 | 6.0 | 900 | 0.2060 | 0.5173 | 0.7327 | 0.6064 | 0.9565 |
| 0.0360 | 7.0 | 1050 | 0.2161 | 0.5370 | 0.7124 | 0.6124 | 0.9594 |
| 0.0360 | 8.0 | 1200 | 0.2146 | 0.5359 | 0.7445 | 0.6232 | 0.9584 |
| 0.0360 | 9.0 | 1350 | 0.2141 | 0.5268 | 0.7327 | 0.6129 | 0.9578 |
| 0.0147 | 10.0 | 1500 | 0.2131 | 0.5393 | 0.7310 | 0.6207 | 0.9597 |
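The table can be scanned programmatically to see which epoch scores best under different criteria. The snippet below copies the validation columns from the table above and picks the epoch with the highest F1 and the one with the lowest validation loss; it makes no claim about which checkpoint was actually published.

```python
# (epoch, validation_loss, precision, recall, f1) copied from the table above.
rows = [
    (1, 0.2817, 0.1429, 0.0829, 0.1049),
    (2, 0.2222, 0.3480, 0.4822, 0.4043),
    (3, 0.2170, 0.3852, 0.5736, 0.4609),
    (4, 0.1919, 0.4625, 0.6261, 0.5320),
    (5, 0.1701, 0.5254, 0.7174, 0.6066),
    (6, 0.2060, 0.5173, 0.7327, 0.6064),
    (7, 0.2161, 0.5370, 0.7124, 0.6124),
    (8, 0.2146, 0.5359, 0.7445, 0.6232),
    (9, 0.2141, 0.5268, 0.7327, 0.6129),
    (10, 0.2131, 0.5393, 0.7310, 0.6207),
]

best_by_f1 = max(rows, key=lambda r: r[4])
best_by_loss = min(rows, key=lambda r: r[1])

print("best F1:", best_by_f1[0], best_by_f1[4])
print("lowest validation loss:", best_by_loss[0], best_by_loss[1])
```

The two criteria disagree: validation loss is lowest at epoch 5, while F1 peaks at epoch 8 and then stays roughly flat, so the choice of selection metric matters when comparing this baseline with other models.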
Framework versions
- Transformers 5.0.0
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2