LayoutLMv3InvoiceCzech (V0 – Synthetic Templates Only)

This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set:

  • Loss: 0.2146
  • Precision: 0.5354
  • Recall: 0.7428
  • F1: 0.6223
  • Accuracy: 0.9583
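
The reported F1 is the harmonic mean of precision and recall, which can be checked from the figures above. A minimal sketch; the helper name `f1_score` is illustrative and not taken from the training code:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the evaluation set.
f1 = f1_score(0.5354, 0.7428)
print(round(f1, 4))  # 0.6223
```

Recall exceeds precision by roughly 0.2, so the model finds most annotated fields but also predicts a fair number of spurious ones. The card does not state how the metrics were computed; an accuracy far above the F1 is what one would expect if accuracy is measured per token and most tokens carry no field label.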

Model description

LayoutLMv3InvoiceCzech (V0) is a multimodal document understanding model that leverages:

  • textual information
  • spatial layout (bounding boxes)
  • visual features (image embeddings)
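
For the spatial layout input, LayoutLMv3 expects each bounding box as `[x0, y0, x1, y1]` on a 0 to 1000 scale relative to the page size. A minimal sketch of that normalization, assuming boxes are given in pixels; the function name and the example page size are illustrative:

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range used by LayoutLMv3."""
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp so slightly out-of-page boxes do not produce invalid coordinates.
    return [min(1000, max(0, int(value))) for value in scaled]


# Hypothetical 2000 x 3000 pixel rendering of an invoice page.
print(normalize_box((200, 300, 1000, 450), page_width=2000, page_height=3000))
# [100, 100, 500, 150]
```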

The model performs token-level classification to extract structured invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates
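
Token-level predictions still have to be merged into field values. The sketch below shows one common way to do that, assuming a BIO tagging scheme; the label names (`B-SUPPLIER`, `I-SUPPLIER`, ...) are hypothetical, since the card does not list the model's actual label set:

```python
def group_entities(words, labels):
    """Merge word-level BIO labels into (field, text) pairs."""
    entities = []
    current_field, current_words = None, []
    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # A new entity starts: flush the one that was open, if any.
            if current_field is not None:
                entities.append((current_field, " ".join(current_words)))
            current_field, current_words = field, [word]
        elif prefix == "I":
            current_words.append(word)
        else:
            # "O" (or any unexpected label) closes the open entity.
            if current_field is not None:
                entities.append((current_field, " ".join(current_words)))
            current_field, current_words = None, []
    if current_field is not None:
        entities.append((current_field, " ".join(current_words)))
    return entities


words = ["Faktura", "č.", "2024001", "Dodavatel:", "ACME", "s.r.o."]
labels = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER"]
print(group_entities(words, labels))
# [('INVOICE_NUMBER', '2024001'), ('SUPPLIER', 'ACME s.r.o.')]
```

An `I-` label that does not continue the open field is treated as the start of a new entity, which keeps the function tolerant of imperfect predictions.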

This version is trained exclusively on synthetically generated invoice templates.


Training data

The dataset consists of:

  • synthetically generated invoices
  • fixed template layouts
  • corresponding bounding boxes
  • rendered document images

Key properties:

  • consistent structure across samples
  • clean and noise-free data
  • perfect alignment between text, layout, and image
  • no real-world documents

This dataset serves as the baseline (V0) for the multimodal document models.


Role in the pipeline

This model corresponds to:

V0 – Synthetic template-based dataset only

It is used to:

  • establish a baseline for multimodal models
  • compare against:
    • text-only models (BERT)
    • layout-aware models without vision (LiLT)
  • evaluate the contribution of visual features in a controlled setting

Intended uses

  • Research in multimodal document understanding
  • Benchmarking LayoutLMv3 on structured documents
  • Comparison with other architectures (BERT, LiLT, etc.)
  • Czech invoice information extraction

Limitations

  • Trained only on synthetic data with fixed layouts
  • Limited generalization to real-world invoices
  • Visual features are learned from clean synthetic renderings
  • No exposure to:
    • OCR errors
    • scanning artifacts
    • real-world noise

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
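  • lr_scheduler_warmup_steps: 0.1 (a fractional value, presumably interpreted as a ratio, i.e. warmup over the first 10% of training steps)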
  • num_epochs: 10
  • mixed_precision_training: Native AMP
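
The learning-rate schedule implied by these settings can be sketched in plain Python. This assumes that the warmup value of 0.1 is a fraction of the total steps and that training ran for 1500 optimizer steps (the final step count in the training results); the function is an illustration of a linear warmup followed by linear decay, not the trainer's own code:

```python
def linear_lr(step, base_lr=1e-5, total_steps=1500, warmup_fraction=0.1):
    """Linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = int(round(total_steps * warmup_fraction))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at the start, mid-warmup, peak, midpoint, and end of training.
for step in (0, 75, 150, 825, 1500):
    print(step, linear_lr(step))
```

Under these assumptions the peak learning rate of 1e-05 is reached after 150 steps, which coincides with the end of the first epoch.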

Training results

Training Loss  Epoch  Step  Validation Loss  Precision  Recall  F1      Accuracy
No log         1.0    150   0.2817           0.1429     0.0829  0.1049  0.9470
No log         2.0    300   0.2222           0.3480     0.4822  0.4043  0.9480
No log         3.0    450   0.2170           0.3852     0.5736  0.4609  0.9480
0.5287         4.0    600   0.1919           0.4625     0.6261  0.5320  0.9558
0.5287         5.0    750   0.1701           0.5254     0.7174  0.6066  0.9627
0.5287         6.0    900   0.2060           0.5173     0.7327  0.6064  0.9565
0.0360         7.0    1050  0.2161           0.5370     0.7124  0.6124  0.9594
0.0360         8.0    1200  0.2146           0.5359     0.7445  0.6232  0.9584
0.0360         9.0    1350  0.2141           0.5268     0.7327  0.6129  0.9578
0.0147         10.0   1500  0.2131           0.5393     0.7310  0.6207  0.9597
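
The per-epoch results above can be scanned for the best checkpoint under different criteria. A small sketch with the validation loss and F1 values copied from the results; the variable names are illustrative:

```python
# (epoch, validation_loss, f1) copied from the training results.
results = [
    (1, 0.2817, 0.1049),
    (2, 0.2222, 0.4043),
    (3, 0.2170, 0.4609),
    (4, 0.1919, 0.5320),
    (5, 0.1701, 0.6066),
    (6, 0.2060, 0.6064),
    (7, 0.2161, 0.6124),
    (8, 0.2146, 0.6232),
    (9, 0.2141, 0.6129),
    (10, 0.2131, 0.6207),
]

best_by_loss = min(results, key=lambda row: row[1])
best_by_f1 = max(results, key=lambda row: row[2])

print(best_by_loss[0])  # 5
print(best_by_f1[0])    # 8
```

Validation loss is lowest at epoch 5, while F1 peaks at epoch 8 and then plateaus around 0.61 to 0.62. Training loss keeps falling over the same epochs, which is consistent with the model starting to overfit the synthetic templates. The headline evaluation figures are closest to the epoch 8 row, although the card does not say how the published checkpoint was chosen.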

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2