LayoutLMv3InvoiceCzech (V2 – Synthetic + Random Layout + Real Layout Injection)

This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set (these match the epoch 7 checkpoint, step 805, in the training results table):

Loss: 0.0763
Precision: 0.8009
Recall: 0.8849
F1: 0.8408
Accuracy: 0.9844

Model description

LayoutLMv3InvoiceCzech (V2) is a multimodal document understanding model that combines:

textual features
spatial layout (bounding boxes)
visual features (image embeddings)
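
As a minimal sketch of how the spatial input is usually prepared: LayoutLM-family models in Hugging Face Transformers expect word bounding boxes scaled to a 0-1000 integer grid, independent of the page resolution. The helper name `normalize_box`, the page size, and the sample words below are illustrative assumptions, not part of this model's published code.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 integer grid."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp so that OCR boxes slightly outside the page stay in range.
    return [min(1000, max(0, v)) for v in scaled]


# Hypothetical OCR output for two words on a page rendered at 1240 x 1754 px.
words = ["Faktura", "2024001"]
pixel_boxes = [(100, 80, 260, 110), (280, 80, 420, 110)]
boxes = [normalize_box(b, 1240, 1754) for b in pixel_boxes]
print(list(zip(words, boxes)))
```

The normalized boxes are passed to the model alongside the words and the page image, one box per word.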

The model performs token-level classification to extract structured invoice fields:

supplier
customer
invoice number
bank details
totals
dates
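
Token-level predictions still have to be merged into field values. The sketch below assumes the labels follow a BIO scheme with names such as `B-SUPPLIER` and `I-SUPPLIER`; the actual label set of this model is not listed here, so the tag names and the `group_entities` helper are hypothetical.

```python
def group_entities(words, tags):
    """Merge word-level BIO tags into (field, text) pairs."""
    entities = []
    current_field, current_words = None, []
    for word, tag in zip(words, tags):
        prefix, _, field = tag.partition("-")
        if prefix == "I" and field == current_field:
            # Continuation of the entity that is currently open.
            current_words.append(word)
            continue
        # Any other tag closes the open entity, if there is one.
        if current_field is not None:
            entities.append((current_field, " ".join(current_words)))
        if prefix in ("B", "I") and field:
            # "B-" starts a new entity; a stray "I-" is treated leniently as a start.
            current_field, current_words = field, [word]
        else:
            current_field, current_words = None, []
    if current_field is not None:
        entities.append((current_field, " ".join(current_words)))
    return entities


# Hypothetical predictions for a short invoice fragment.
words = ["Faktura", "č.", "2024001", "Dodavatel:", "ACME", "s.r.o.",
         "Celkem", "1", "210,00", "Kč"]
tags = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER",
        "O", "B-TOTAL", "I-TOTAL", "O"]
print(group_entities(words, tags))
```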

This version introduces real layout injection, which is intended to make the training data more realistic and to improve generalization to real invoices.

Training data

The dataset consists of three components:

Synthetic template-based invoices
Synthetic invoices with randomized layouts
Hybrid invoices with real layouts and synthetic content

Real layout injection

In the hybrid dataset:

real invoice layouts are used as templates
original text content is replaced with synthetic data
new content is rendered into authentic document structures

This preserves:

real-world spatial distributions
visual patterns and formatting
document complexity

while maintaining:

full annotation control
consistent labels
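
The annotation side of this idea can be sketched as follows. This is not the author's pipeline: the slot list, the value generators, and the `inject` function are hypothetical, and the step that renders the new text into the page image is omitted. The sketch only shows how keeping real geometry while generating the content yields labels for free.

```python
import random

# Hypothetical slots taken from one real invoice: only the geometry and the
# field type are kept, the original text is discarded.
REAL_LAYOUT = [
    {"field": "SUPPLIER", "box": (80, 60, 480, 90)},
    {"field": "INVOICE_NUMBER", "box": (700, 60, 900, 90)},
    {"field": "TOTAL", "box": (700, 880, 940, 920)},
]


def synthetic_value(field, rng):
    """Generate placeholder content for a field (illustrative generators only)."""
    if field == "SUPPLIER":
        return rng.choice(["Novák a syn s.r.o.", "Stavby Morava a.s."])
    if field == "INVOICE_NUMBER":
        return f"{rng.randint(2020, 2025)}{rng.randint(1, 9999):04d}"
    if field == "TOTAL":
        return f"{rng.randint(100, 99999)},00 Kč"
    raise ValueError(f"no generator for field {field!r}")


def inject(layout, seed=42):
    """Fill each real slot with synthetic words and emit labeled tokens."""
    rng = random.Random(seed)
    tokens = []
    for slot in layout:
        words = synthetic_value(slot["field"], rng).split()
        x0, y0, x1, y1 = slot["box"]
        step = (x1 - x0) / len(words)  # split the slot width evenly per word
        for i, word in enumerate(words):
            box = (round(x0 + i * step), y0, round(x0 + (i + 1) * step), y1)
            prefix = "B" if i == 0 else "I"
            tokens.append(
                {"word": word, "box": box, "label": f"{prefix}-{slot['field']}"}
            )
    return tokens


for token in inject(REAL_LAYOUT):
    print(token)
```

Because the generator knows which field it is filling, every token is labeled at creation time and no manual annotation is needed.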

Role in the pipeline

This model corresponds to:

V2 – Synthetic + layout augmentation + real layout injection

It is used to:

bridge the gap between synthetic and real-world data
evaluate the impact of realistic layouts on multimodal models
compare with:
- V0–V1 (fully synthetic)
- V3 (real data fine-tuning)

Intended uses

Advanced multimodal document AI
Invoice information extraction with visual + spatial features
Evaluation of hybrid data strategies
Benchmarking LayoutLMv3

Limitations

Text content remains synthetic
Limited exposure to real linguistic variability
OCR noise and scanning artifacts are not fully represented
May struggle with rare real-world edge cases

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 1
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
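lr_scheduler_warmup_steps: 0.1 (a fractional value, presumably read as a share of the total training steps rather than a step count)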
num_epochs: 10
mixed_precision_training: Native AMP
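
To make the schedule concrete, the sketch below evaluates a linear schedule with warmup for these settings. It assumes the warmup value of 0.1 means 10% of the run, and it takes the total of 1150 steps (10 epochs of 115 steps) from the training results; the function `linear_lr` is an illustration of the standard formula, not code from the training script.

```python
def linear_lr(step, base_lr=1e-5, total_steps=1150, warmup_steps=115):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of a few epochs (115 steps per epoch).
for epoch in (1, 5, 10):
    print(epoch, linear_lr(epoch * 115))
```

Under these assumptions the learning rate peaks at 1e-05 at the end of the first epoch and reaches zero at step 1150.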

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
No log	1.0	115	0.0725	0.7496	0.8257	0.7858	0.9807
No log	2.0	230	0.0701	0.7569	0.8376	0.7952	0.9822
No log	3.0	345	0.0735	0.7587	0.8883	0.8184	0.9810
No log	4.0	460	0.0743	0.7827	0.8714	0.8247	0.9826
0.0606	5.0	575	0.0783	0.7756	0.8714	0.8207	0.9821
0.0606	6.0	690	0.0811	0.7561	0.8968	0.8204	0.9814
0.0606	7.0	805	0.0763	0.8009	0.8849	0.8408	0.9844
0.0606	8.0	920	0.0826	0.7784	0.9036	0.8363	0.9835
0.0201	9.0	1035	0.0824	0.7837	0.8951	0.8357	0.9836
0.0201	10.0	1150	0.0852	0.7818	0.9036	0.8383	0.9834
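
The table can be checked with a few lines of code. The snippet below copies the per-epoch values from the table, recomputes F1 as the harmonic mean of precision and recall, and finds the epochs with the best F1 and the lowest validation loss. The variable and function names are illustrative.

```python
# (epoch, validation loss, precision, recall, reported F1), copied from the table.
ROWS = [
    (1, 0.0725, 0.7496, 0.8257, 0.7858),
    (2, 0.0701, 0.7569, 0.8376, 0.7952),
    (3, 0.0735, 0.7587, 0.8883, 0.8184),
    (4, 0.0743, 0.7827, 0.8714, 0.8247),
    (5, 0.0783, 0.7756, 0.8714, 0.8207),
    (6, 0.0811, 0.7561, 0.8968, 0.8204),
    (7, 0.0763, 0.8009, 0.8849, 0.8408),
    (8, 0.0826, 0.7784, 0.9036, 0.8363),
    (9, 0.0824, 0.7837, 0.8951, 0.8357),
    (10, 0.0852, 0.7818, 0.9036, 0.8383),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


best_f1_row = max(ROWS, key=lambda row: row[4])
lowest_loss_row = min(ROWS, key=lambda row: row[1])

for epoch, _, precision, recall, reported in ROWS:
    print(epoch, round(f1_score(precision, recall), 4), reported)
print("best F1 at epoch", best_f1_row[0])
print("lowest validation loss at epoch", lowest_loss_row[0])
```

F1 peaks at epoch 7 (0.8408), which is the checkpoint reported at the top of this card, while validation loss is lowest at epoch 2 (0.0701) and drifts upward afterwards.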

Framework versions

Transformers 5.0.0
PyTorch 2.10.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2

Model details

Model size: 0.1B params
Tensor type: F32 (Safetensors)
Model repository: TomasFAV/Layoutlmv3InvoiceCzechV012
Base model: microsoft/layoutlmv3-base