LayoutLMv3InvoiceCzech (V1 – Synthetic + Random Layout)

This model is a fine-tuned version of microsoft/layoutlmv3-base for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set (these values match the epoch 3 row of the training results):

  • Loss: 0.1750
  • Precision: 0.6800
  • Recall: 0.6904
  • F1: 0.6851
  • Accuracy: 0.9714
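
As a quick consistency check, the reported F1 can be reproduced from the reported precision and recall. This is a minimal sketch that assumes the usual harmonic-mean definition; the card does not state which metric implementation was used, and the helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported above for the evaluation set.
reported_precision = 0.6800
reported_recall = 0.6904
reported_f1 = 0.6851

computed_f1 = f1_score(reported_precision, reported_recall)
# The inputs are rounded to four decimals, so allow a small tolerance.
print(abs(computed_f1 - reported_f1) < 5e-4)
```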

Model description

LayoutLMv3InvoiceCzech (V1) extends the baseline multimodal model by introducing layout variability into the training data.

The model leverages:

  • textual features
  • spatial layout (bounding boxes)
  • visual features (image embeddings)
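
LayoutLM-family models expect each word's bounding box as integer `(x0, y0, x1, y1)` coordinates scaled to a 0-1000 range relative to the page size. The sketch below shows one way to do that scaling; the function name `normalize_box`, the clamping behaviour, and the example page size are assumptions for illustration, not the preprocessing code used for this model.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def normalize_box(box: Box, page_width: int, page_height: int) -> Box:
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 range."""
    x0, y0, x1, y1 = box

    def scale(value: float, size: int) -> int:
        # Clamp so boxes that slightly overflow the page stay in range.
        return max(0, min(1000, int(1000 * value / size)))

    return (
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    )


# Hypothetical word box on a 1240 x 1754 px page rendering.
print(normalize_box((124, 877, 620, 900), 1240, 1754))
```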

It performs token-level classification to extract structured invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates
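
Token-level predictions still have to be merged into field values. The following sketch assumes a BIO tagging scheme with label names such as `B-SUPPLIER` and `I-SUPPLIER`; the card does not document the actual label set, so these names and the helper `group_entities` are hypothetical.

```python
from typing import List, Optional, Tuple


def group_entities(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged words into (field, text) pairs, in reading order."""
    entities: List[Tuple[str, List[str]]] = []
    current_field: Optional[str] = None  # field of the entity being built
    for word, label in zip(words, labels):
        prefix, _, field = label.partition("-")
        if prefix == "B" or (prefix == "I" and field != current_field):
            # Start a new entity; a stray I- tag is treated like B-.
            entities.append((field, [word]))
            current_field = field
        elif prefix == "I":
            entities[-1][1].append(word)
        else:
            # "O" (or any unrecognized tag) closes the current entity.
            current_field = None
    return [(field, " ".join(parts)) for field, parts in entities]


# Hypothetical words and predicted labels for part of an invoice.
words = ["Faktura", "č.", "2024001", "Dodavatel:", "ACME", "s.r.o."]
labels = ["O", "O", "B-INVOICE_NUMBER", "O", "B-SUPPLIER", "I-SUPPLIER"]
print(group_entities(words, labels))
```

Treating a stray `I-` tag as the start of a new entity is a lenient choice; a stricter decoder could discard such tokens instead.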

Compared to V0, which uses fixed layouts, this version is trained on synthetically generated invoices with randomized layouts, with the aim of improving robustness to structural variation.


Training data

The dataset consists of:

  • synthetically generated invoices based on templates
  • augmented variants with randomized layouts
  • corresponding bounding boxes
  • rendered document images

Key properties:

  • variable positioning of fields
  • layout perturbations (shifts, spacing, ordering)
  • preserved label consistency
  • fully synthetic data

This dataset introduces layout diversity and tests how multimodal models respond to structural variability.
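
The perturbations listed above (shifts and reordering with labels preserved) can be illustrated with a small sketch. It is not the generator used to build this dataset: the function `perturb_layout`, the field dictionaries, and the `max_shift` parameter are all hypothetical. It also ignores overlaps between shifted boxes, which a real generator would need to resolve before rendering.

```python
import random
from typing import Any, Dict, List

Field = Dict[str, Any]


def perturb_layout(
    fields: List[Field],
    page_width: int,
    page_height: int,
    max_shift: int,
    rng: random.Random,
) -> List[Field]:
    """Return a reordered copy of `fields` with each box randomly shifted."""
    perturbed = []
    for field in fields:
        x0, y0, x1, y1 = field["box"]
        # Limit the offsets so the shifted box stays fully on the page.
        dx = rng.randint(max(-max_shift, -x0), min(max_shift, page_width - x1))
        dy = rng.randint(max(-max_shift, -y0), min(max_shift, page_height - y1))
        # Label and text are copied unchanged; only the position moves.
        perturbed.append({**field, "box": (x0 + dx, y0 + dy, x1 + dx, y1 + dy)})
    rng.shuffle(perturbed)  # vary the order in which fields are emitted
    return perturbed


# Hypothetical template fields in pixel coordinates.
FIELDS = [
    {"label": "supplier", "text": "ACME s.r.o.", "box": (50, 40, 300, 70)},
    {"label": "invoice_number", "text": "2024001", "box": (700, 40, 900, 70)},
    {"label": "total", "text": "12 100,00 Kč", "box": (700, 1300, 950, 1330)},
]

for field in perturb_layout(FIELDS, 1000, 1400, 40, random.Random(42)):
    print(field["label"], field["box"])
```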


Role in the pipeline

This model corresponds to:

V1 – Synthetic templates + randomized layouts

It is used to:

  • evaluate the impact of layout variability on multimodal models
  • compare against:
    • V0 (fixed layouts)
    • later hybrid and real-data stages (V2, V3)
  • analyze interaction between visual and spatial features

Intended uses

  • Research in multimodal document understanding
  • Benchmarking LayoutLMv3 under layout variability
  • Comparison with BERT and LiLT
  • Czech invoice information extraction

Limitations

  • Still trained only on synthetic data
  • Layout variability is artificial
  • Visual features are derived from clean renderings
  • No real-world noise (OCR errors, scanning artifacts)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW (fused PyTorch implementation, OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
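
To make the linear schedule concrete, the sketch below computes the learning rate at a few points in training. It assumes that the warmup value of 0.1 is a fraction of the total number of steps, which the card does not confirm, and it takes the total of 750 steps from the training results in this card. The function `linear_schedule_lr` is a standalone re-implementation of linear warmup followed by linear decay, not a call into the training library.

```python
def linear_schedule_lr(
    step: int, base_lr: float, total_steps: int, warmup_steps: int
) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 1e-05
TOTAL_STEPS = 750  # 10 epochs x 75 steps per epoch
# Assumption: 0.1 is interpreted as a fraction of the total training steps.
WARMUP_STEPS = round(0.1 * TOTAL_STEPS)

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, BASE_LR, TOTAL_STEPS, WARMUP_STEPS))
```

Under this assumption the learning rate peaks at 1e-05 at the end of the first epoch and decays linearly to zero by the final step.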

Training results

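| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|---------------|-------|------|-----------------|-----------|--------|--------|----------|
| No log        | 1.0   | 75   | 0.1545          | 0.6769    | 0.6701 | 0.6735 | 0.9711   |
| No log        | 2.0   | 150  | 0.1658          | 0.6732    | 0.6937 | 0.6833 | 0.9695   |
| No log        | 3.0   | 225  | 0.1750          | 0.6800    | 0.6904 | 0.6851 | 0.9714   |
| No log        | 4.0   | 300  | 0.1946          | 0.6881    | 0.6159 | 0.6500 | 0.9707   |
| No log        | 5.0   | 375  | 0.1896          | 0.6941    | 0.6717 | 0.6827 | 0.9717   |
| No log        | 6.0   | 450  | 0.1979          | 0.6609    | 0.6430 | 0.6518 | 0.9704   |
| 0.0193        | 7.0   | 525  | 0.1991          | 0.6702    | 0.6396 | 0.6545 | 0.9706   |
| 0.0193        | 8.0   | 600  | 0.2014          | 0.6503    | 0.6261 | 0.6379 | 0.9698   |
| 0.0193        | 9.0   | 675  | 0.1955          | 0.6523    | 0.6413 | 0.6468 | 0.9702   |
| 0.0193        | 10.0  | 750  | 0.1956          | 0.6535    | 0.6447 | 0.6491 | 0.9704   |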
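
The per-epoch results can be scanned programmatically to see which checkpoint each selection criterion would pick. The sketch below copies the numbers from the table above; the variable names are illustrative, and the card does not state which criterion was actually used to select the published checkpoint.

```python
# (epoch, validation_loss, precision, recall, f1, accuracy) from the table above.
RESULTS = [
    (1, 0.1545, 0.6769, 0.6701, 0.6735, 0.9711),
    (2, 0.1658, 0.6732, 0.6937, 0.6833, 0.9695),
    (3, 0.1750, 0.6800, 0.6904, 0.6851, 0.9714),
    (4, 0.1946, 0.6881, 0.6159, 0.6500, 0.9707),
    (5, 0.1896, 0.6941, 0.6717, 0.6827, 0.9717),
    (6, 0.1979, 0.6609, 0.6430, 0.6518, 0.9704),
    (7, 0.1991, 0.6702, 0.6396, 0.6545, 0.9706),
    (8, 0.2014, 0.6503, 0.6261, 0.6379, 0.9698),
    (9, 0.1955, 0.6523, 0.6413, 0.6468, 0.9702),
    (10, 0.1956, 0.6535, 0.6447, 0.6491, 0.9704),
]

best_by_f1 = max(RESULTS, key=lambda row: row[4])
best_by_loss = min(RESULTS, key=lambda row: row[1])

print("best F1:", best_by_f1[4], "at epoch", best_by_f1[0])
print("lowest validation loss:", best_by_loss[1], "at epoch", best_by_loss[0])
```

F1 peaks at epoch 3, which matches the headline metrics of this card, while validation loss is lowest at epoch 1 and stays higher in every later epoch.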

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model details

  • Repository: TomasFAV/Layoutlmv3InvoiceCzechV01
  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32