LayoutLMv3InvoiceCzech (V1 – Synthetic + Random Layout)

This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set (the figures match epoch 3, step 225, which has the highest F1 in the training results):

Loss: 0.1750
Precision: 0.6800
Recall: 0.6904
F1: 0.6851
Accuracy: 0.9714
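
As a quick consistency check, F1 is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the rounded figures above. The helper name `f1_score` is illustrative only, and any small gap to the reported value presumably comes from the inputs already being rounded to four decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the evaluation set.
precision, recall = 0.6800, 0.6904
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")
```

The recomputed value agrees with the reported F1 to within rounding.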

Model description

LayoutLMv3InvoiceCzech (V1) extends the baseline multimodal model by introducing layout variability into the training data.

The model leverages:

textual features
spatial layout (bounding boxes)
visual features (image embeddings)
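
LayoutLM-family models, including LayoutLMv3, expect each word's bounding box on a 0 to 1000 integer grid relative to the page size. The sketch below shows one way to do that scaling. The helper name `normalize_box`, the page dimensions, and the example box are assumptions for illustration and are not taken from this model's preprocessing code.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 integer grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical word box on a 1240 x 1754 pixel page rendering.
word_box = (124, 175, 372, 210)
normalized = normalize_box(word_box, page_width=1240, page_height=1754)
print(normalized)
```

Because the coordinates are relative to the page, the same invoice rendered at a different resolution maps to the same normalized boxes.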

It performs token-level classification to extract structured invoice fields:

supplier
customer
invoice number
bank details
totals
dates
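
Token-level predictions still have to be merged into field values. The sketch below assumes a BIO tagging scheme with label names such as `B-SUPPLIER`. Both the scheme and the label names are assumptions, since the model's actual label set is not listed here, and `group_bio_spans` is a hypothetical helper.

```python
def group_bio_spans(tokens, labels):
    """Merge BIO-tagged tokens into (field, text) pairs."""
    spans = []
    current_field, current_tokens = None, []

    def flush():
        # Emit the span collected so far, if any.
        if current_field is not None:
            spans.append((current_field, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        starts_new = label.startswith("B-") or (
            label.startswith("I-") and label[2:] != current_field
        )
        if starts_new:
            # A B- tag (or a stray I- tag for another field) opens a new span.
            flush()
            current_field, current_tokens = label[2:], [token]
        elif label.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O" (outside) closes any open span.
            flush()
            current_field, current_tokens = None, []
    flush()
    return spans


# Hypothetical tokens and predicted labels for part of an invoice.
tokens = ["Faktura", "č.", "2024001", "Dodavatel:", "ACME", "s.r.o."]
labels = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER"]
fields = group_bio_spans(tokens, labels)
print(fields)
```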

Compared to V0, this version is trained on synthetically generated invoices with randomized layouts, which is intended to improve robustness to structural variations.

Training data

The dataset consists of:

synthetically generated invoices based on templates
augmented variants with randomized layouts
corresponding bounding boxes
rendered document images

Key properties:

variable positioning of fields
layout perturbations (shifts, spacing, ordering)
preserved label consistency
fully synthetic data
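
To make the idea of a label-preserving layout perturbation concrete, here is a minimal sketch of the "shift" case. It is not the generator used to build this dataset: the function `jitter_layout`, the field record structure, and the page size are all hypothetical.

```python
import random


def jitter_layout(fields, max_shift, page_width, page_height, seed=None):
    """Shift each field box by a random offset, keeping its size and label."""
    rng = random.Random(seed)
    perturbed = []
    for field in fields:
        x0, y0, x1, y1 = field["box"]
        width, height = x1 - x0, y1 - y0
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        # Clamp the new origin so the whole box stays on the page.
        new_x0 = min(max(x0 + dx, 0), page_width - width)
        new_y0 = min(max(y0 + dy, 0), page_height - height)
        new_box = (new_x0, new_y0, new_x0 + width, new_y0 + height)
        perturbed.append({**field, "box": new_box})
    return perturbed


# Hypothetical template fields in pixel coordinates.
fields = [
    {"label": "supplier", "text": "ACME s.r.o.", "box": (100, 120, 400, 160)},
    {"label": "invoice number", "text": "2024001", "box": (900, 120, 1100, 160)},
]
perturbed = jitter_layout(fields, max_shift=40, page_width=1240, page_height=1754, seed=42)
for before, after in zip(fields, perturbed):
    print(before["label"], before["box"], "->", after["box"])
```

Each field is shifted independently, so a real generator would also need to handle overlaps between fields; that part is left out here.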

This dataset introduces layout diversity and tests how multimodal models respond to structural variability.

Role in the pipeline

This model corresponds to:

V1 – Synthetic templates + randomized layouts

It is used to:

evaluate the impact of layout variability on multimodal models
compare against:
- V0 (fixed layouts)
- later hybrid and real-data stages (V2, V3)
analyze interaction between visual and spatial features

Intended uses

Research in multimodal document understanding
Benchmarking LayoutLMv3 under layout variability
Comparison with BERT and LiLT
Czech invoice information extraction

Limitations

Still trained only on synthetic data
Layout variability is artificial
Visual features are derived from clean renderings
No real-world noise (OCR errors, scanning artifacts)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 1
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 0.1
num_epochs: 10
mixed_precision_training: Native AMP
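
The shape of the learning-rate schedule can be sketched from these values. The code below assumes the warmup value of 0.1 is a fraction of the 750 total steps (10 epochs of 75 steps, per the training results), which would give 75 warmup steps. That reading is an assumption, and `linear_lr` is an illustrative helper, not the scheduler code used in training.

```python
def linear_lr(step, peak_lr=1e-5, total_steps=750, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 75, 375, 750):
    print(step, linear_lr(step))
```

Under this assumption, the learning rate peaks at the end of the first epoch and reaches zero at step 750.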

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
No log	1.0	75	0.1545	0.6769	0.6701	0.6735	0.9711
No log	2.0	150	0.1658	0.6732	0.6937	0.6833	0.9695
No log	3.0	225	0.1750	0.6800	0.6904	0.6851	0.9714
No log	4.0	300	0.1946	0.6881	0.6159	0.6500	0.9707
No log	5.0	375	0.1896	0.6941	0.6717	0.6827	0.9717
No log	6.0	450	0.1979	0.6609	0.6430	0.6518	0.9704
0.0193	7.0	525	0.1991	0.6702	0.6396	0.6545	0.9706
0.0193	8.0	600	0.2014	0.6503	0.6261	0.6379	0.9698
0.0193	9.0	675	0.1955	0.6523	0.6413	0.6468	0.9702
0.0193	10.0	750	0.1956	0.6535	0.6447	0.6491	0.9704
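
The table can be scanned programmatically to see which epoch is best under each criterion. The sketch below copies the rows above into a list; the `EpochResult` structure is just an illustrative container.

```python
from collections import namedtuple

EpochResult = namedtuple(
    "EpochResult", "epoch step val_loss precision recall f1 accuracy"
)

# Rows copied from the training results table.
results = [
    EpochResult(1, 75, 0.1545, 0.6769, 0.6701, 0.6735, 0.9711),
    EpochResult(2, 150, 0.1658, 0.6732, 0.6937, 0.6833, 0.9695),
    EpochResult(3, 225, 0.1750, 0.6800, 0.6904, 0.6851, 0.9714),
    EpochResult(4, 300, 0.1946, 0.6881, 0.6159, 0.6500, 0.9707),
    EpochResult(5, 375, 0.1896, 0.6941, 0.6717, 0.6827, 0.9717),
    EpochResult(6, 450, 0.1979, 0.6609, 0.6430, 0.6518, 0.9704),
    EpochResult(7, 525, 0.1991, 0.6702, 0.6396, 0.6545, 0.9706),
    EpochResult(8, 600, 0.2014, 0.6503, 0.6261, 0.6379, 0.9698),
    EpochResult(9, 675, 0.1955, 0.6523, 0.6413, 0.6468, 0.9702),
    EpochResult(10, 750, 0.1956, 0.6535, 0.6447, 0.6491, 0.9704),
]

best_by_f1 = max(results, key=lambda r: r.f1)
best_by_loss = min(results, key=lambda r: r.val_loss)
print("Highest F1:", best_by_f1.epoch, best_by_f1.f1)
print("Lowest validation loss:", best_by_loss.epoch, best_by_loss.val_loss)
```

The two criteria disagree: validation loss is lowest after epoch 1, while F1 peaks at epoch 3, which is the row that matches the headline evaluation results.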

Framework versions

Transformers 5.0.0
PyTorch 2.10.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2

Model size

Format: Safetensors
Parameters: 0.1B
Tensor type: F32

Model tree for TomasFAV/Layoutlmv3InvoiceCzechV01

Base model: microsoft/layoutlmv3-base