LayoutLMv3InvoiceCzech (V2 – Synthetic + Random Layout + Real Layout Injection)

This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set (these values match the epoch 7 checkpoint, step 805, in the training results):

  • Loss: 0.0763
  • Precision: 0.8009
  • Recall: 0.8849
  • F1: 0.8408
  • Accuracy: 0.9844
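
As a quick consistency check, the reported F1 is the harmonic mean of the reported precision and recall. A minimal sketch (the function name `f1_score` is illustrative and not taken from the training code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results above.
reported_f1 = f1_score(0.8009, 0.8849)
print(round(reported_f1, 4))  # 0.8408
```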

Model description

LayoutLMv3InvoiceCzech (V2) is a multimodal document understanding model that combines:

  • textual features
  • spatial layout (bounding boxes)
  • visual features (image embeddings)
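
For the spatial input, LayoutLMv3 expects each bounding box as (x0, y0, x1, y1) scaled to a 0-1000 range relative to the page size. The sketch below shows one way to do that scaling; the helper name `normalize_box` and the example page size are assumptions for illustration, not part of this model's published code.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp so that slightly out-of-page OCR boxes stay in the valid range.
    return [min(1000, max(0, v)) for v in scaled]


# Hypothetical page of 1240 x 1754 pixels (A4 at roughly 150 dpi).
print(normalize_box((124, 877, 620, 1754), 1240, 1754))  # [100, 500, 500, 1000]
```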

The model performs token-level classification to extract structured invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates
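
Token-level predictions usually need to be merged into field values before they are useful downstream. The sketch below assumes a BIO tagging scheme and uses hypothetical label names such as `B-SUPPLIER`; neither is confirmed by this card, so check the model's `config.id2label` for the actual label set.

```python
def group_entities(words, labels):
    """Merge consecutive B-/I- tagged words into (field, text) pairs."""
    entities = []
    current_field, current_words = None, []
    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # Start of a new entity (an orphan I- tag is treated as a start).
            if current_field is not None:
                entities.append((current_field, " ".join(current_words)))
            current_field, current_words = field, [word]
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current_words.append(word)
        else:
            # "O" (or any unknown tag) closes the open entity, if any.
            if current_field is not None:
                entities.append((current_field, " ".join(current_words)))
            current_field, current_words = None, []
    if current_field is not None:
        entities.append((current_field, " ".join(current_words)))
    return entities


# Hypothetical words and labels, for illustration only.
words = ["Faktura", "č.", "2024001", "Dodavatel:", "Acme", "s.r.o."]
labels = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER"]
print(group_entities(words, labels))
```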

This version introduces real layout injection, which is intended to make the training data more realistic and to improve generalization to real invoices.


Training data

The dataset consists of three components:

  1. Synthetic template-based invoices
  2. Synthetic invoices with randomized layouts
  3. Hybrid invoices with real layouts and synthetic content

Real layout injection

In the hybrid dataset:

  • real invoice layouts are used as templates
  • original text content is replaced with synthetic data
  • new content is rendered into authentic document structures

This preserves:

  • real-world spatial distributions
  • visual patterns and formatting
  • document complexity

while maintaining:

  • full annotation control
  • consistent labels
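
The idea can be sketched as follows: keep the boxes of a real layout, fill each one with generated text, and carry the field label along so that annotations come for free. This is only an illustration under stated assumptions. The region format, the `GENERATORS` table, and the function `inject_synthetic_content` are hypothetical, and the rendering of text back onto the page image is omitted.

```python
import random

# Hypothetical synthetic value generators, one per field type.
GENERATORS = {
    "invoice_number": lambda rng: f"{rng.randint(2020, 2025)}{rng.randint(0, 9999):04d}",
    "total": lambda rng: f"{rng.randint(100, 99999)},00 Kč",
    "supplier": lambda rng: rng.choice(["Alfa s.r.o.", "Beta a.s.", "Gama spol. s r.o."]),
}


def inject_synthetic_content(layout_regions, seed=0):
    """Fill each region of a real layout with synthetic text.

    The boxes are kept exactly as they are, so the spatial distribution
    of the real document survives; only the text is replaced.
    """
    rng = random.Random(seed)
    annotated = []
    for region in layout_regions:
        field = region["field"]
        text = GENERATORS[field](rng)
        annotated.append({"box": region["box"], "text": text, "label": field})
    return annotated


# Hypothetical regions extracted from one real invoice layout.
layout = [
    {"box": [100, 50, 400, 80], "field": "supplier"},
    {"box": [600, 50, 900, 80], "field": "invoice_number"},
    {"box": [700, 900, 950, 940], "field": "total"},
]
samples = inject_synthetic_content(layout, seed=42)
for sample in samples:
    print(sample)
```

Because the label is attached at generation time, every rendered value has a known ground-truth annotation without manual labeling.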

Role in the pipeline

This model corresponds to:

V2 – Synthetic + layout augmentation + real layout injection

It is used to:

  • bridge the gap between synthetic and real-world data
  • evaluate the impact of realistic layouts on multimodal models
  • compare with:
    • V0–V1 (fully synthetic)
    • V3 (real data fine-tuning)

Intended uses

  • Advanced multimodal document AI
  • Invoice information extraction with visual + spatial features
  • Evaluation of hybrid data strategies
  • Benchmarking LayoutLMv3

Limitations

  • Text content remains synthetic
  • Limited exposure to real linguistic variability
  • OCR noise and scanning artifacts are not fully represented
  • May struggle with rare real-world edge cases

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
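
To make the schedule concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup. It assumes that the value 0.1 listed for lr_scheduler_warmup_steps is a fraction of the total number of steps rather than an absolute step count; the card does not state this. The total of 1150 steps is taken from the training results, and the function name `linear_schedule_lr` is illustrative.

```python
def linear_schedule_lr(step, base_lr=1e-5, total_steps=1150, warmup_fraction=0.1):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    # Assumption: 0.1 is a fraction of total steps, giving 115 warmup steps.
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


for step in (0, 115, 575, 1150):
    print(step, linear_schedule_lr(step))
```

Under that assumption the learning rate peaks at step 115, which coincides with the end of the first epoch, and decays to zero by step 1150.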

Training results

Training Loss  Epoch  Step  Validation Loss  Precision  Recall  F1      Accuracy
No log         1.0    115   0.0725           0.7496     0.8257  0.7858  0.9807
No log         2.0    230   0.0701           0.7569     0.8376  0.7952  0.9822
No log         3.0    345   0.0735           0.7587     0.8883  0.8184  0.9810
No log         4.0    460   0.0743           0.7827     0.8714  0.8247  0.9826
0.0606         5.0    575   0.0783           0.7756     0.8714  0.8207  0.9821
0.0606         6.0    690   0.0811           0.7561     0.8968  0.8204  0.9814
0.0606         7.0    805   0.0763           0.8009     0.8849  0.8408  0.9844
0.0606         8.0    920   0.0826           0.7784     0.9036  0.8363  0.9835
0.0201         9.0    1035  0.0824           0.7837     0.8951  0.8357  0.9836
0.0201         10.0   1150  0.0852           0.7818     0.9036  0.8383  0.9834
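
The rows above can be used to see which checkpoint the headline metrics correspond to. The sketch below copies the table values and picks the epoch with the highest F1 and the epoch with the lowest validation loss. Whether the published weights were actually selected by F1 is an assumption; the card does not say.

```python
# (epoch, step, val_loss, precision, recall, f1, accuracy), copied from the table.
RESULTS = [
    (1, 115, 0.0725, 0.7496, 0.8257, 0.7858, 0.9807),
    (2, 230, 0.0701, 0.7569, 0.8376, 0.7952, 0.9822),
    (3, 345, 0.0735, 0.7587, 0.8883, 0.8184, 0.9810),
    (4, 460, 0.0743, 0.7827, 0.8714, 0.8247, 0.9826),
    (5, 575, 0.0783, 0.7756, 0.8714, 0.8207, 0.9821),
    (6, 690, 0.0811, 0.7561, 0.8968, 0.8204, 0.9814),
    (7, 805, 0.0763, 0.8009, 0.8849, 0.8408, 0.9844),
    (8, 920, 0.0826, 0.7784, 0.9036, 0.8363, 0.9835),
    (9, 1035, 0.0824, 0.7837, 0.8951, 0.8357, 0.9836),
    (10, 1150, 0.0852, 0.7818, 0.9036, 0.8383, 0.9834),
]

best_by_f1 = max(RESULTS, key=lambda row: row[5])
best_by_loss = min(RESULTS, key=lambda row: row[2])

print("best F1:", best_by_f1[0], best_by_f1[5])               # best F1: 7 0.8408
print("lowest val loss:", best_by_loss[0], best_by_loss[2])   # lowest val loss: 2 0.0701
```

Validation loss is lowest at epoch 2 while F1 peaks at epoch 7, so the two selection criteria point to different checkpoints.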

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Model repository: TomasFAV/Layoutlmv3InvoiceCzechV012