DonutInvoiceCzech (V0 – Synthetic Templates Only)

This model is a fine-tuned version of naver-clova-ix/donut-base-finetuned-cord-v2 for structured information extraction from Czech invoices.

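It achieves the following results on the evaluation set. These values match the epoch 7 row (step 2100) of the training results table, which has the highest F1: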

  • Loss: 0.7067
  • Mean Accuracy: 0.8065
  • F1: 0.7111

Model description

DonutInvoiceCzech (V0) is a generative, OCR-free document understanding model.

Unlike traditional OCR-based pipelines, Donut:

  • processes raw document images
  • directly generates structured outputs
  • does not rely on external OCR

The model is trained to extract key invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates

Training data

The dataset consists of:

  • synthetically generated invoice images
  • fixed template layouts
  • corresponding structured output sequences
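To illustrate what a "structured output sequence" can look like, the sketch below flattens a nested dictionary into a Donut-style tagged string. It is a simplified version of the convention used by Donut, not the exact preprocessing used for this model, and the field names (`invoice_number`, `total`, `supplier`) are hypothetical.

```python
def json_to_sequence(obj) -> str:
    """Flatten a nested dict/list into a Donut-style tagged target string.

    Simplified sketch: dict keys become <s_key>...</s_key> pairs and list
    items are joined with <sep/>.
    """
    if isinstance(obj, dict):
        return "".join(
            f"<s_{key}>{json_to_sequence(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        return "<sep/>".join(json_to_sequence(item) for item in obj)
    return str(obj)


# Hypothetical ground-truth annotation for one synthetic invoice.
example = {
    "invoice_number": "2024-001",
    "supplier": {"name": "ACME s.r.o."},
    "total": "1210.00",
}
target = json_to_sequence(example)
print(target)
```

Each distinct tag would normally be added to the tokenizer as a special token so the decoder emits it as a single unit.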

Key properties:

  • clean visual structure
  • consistent formatting
  • no OCR noise
  • fully synthetic data

This is the baseline dataset for the OCR-free generative models in this comparison.


Role in the pipeline

This model corresponds to:

V0 – Synthetic template-based dataset only

It is used to:

  • establish a baseline for OCR-free document models
  • compare with:
    • Pix2Struct (generative multimodal)
    • LayoutLMv3 (multimodal encoder)
    • BERT / LiLT (token classification)
  • evaluate end-to-end extraction without OCR

Intended uses

  • OCR-free invoice information extraction
  • End-to-end document understanding
  • Research in generative document models
  • Comparison of OCR-based vs OCR-free approaches

Limitations

  • Trained only on synthetic data
  • Sensitive to output formatting
  • No exposure to real-world noise or distortions
  • Less stable training compared to classification models
  • Requires structured decoding and post-processing
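The sensitivity to output formatting can be made concrete with a small post-processing sketch. It parses a tagged output string into a dictionary and reports which expected fields are missing. This is a simplified stand-in for a real decoder such as `DonutProcessor.token2json`, and the field names and the `REQUIRED_FIELDS` set are hypothetical.

```python
import re

# Matches <s_key>value</s_key>; the closing tag must repeat the same key.
FIELD_RE = re.compile(r"<s_(?P<key>[^>]+)>(?P<value>.*?)</s_(?P=key)>", re.DOTALL)

# Hypothetical set of fields a downstream consumer expects.
REQUIRED_FIELDS = {"supplier", "invoice_number", "total"}


def parse_fields(sequence: str) -> dict:
    """Parse a tagged sequence into a (possibly nested) dict."""
    result = {}
    for match in FIELD_RE.finditer(sequence):
        key, value = match.group("key"), match.group("value")
        nested = parse_fields(value)  # recurse when the value holds tags
        result[key] = nested if nested else value.strip()
    return result


def missing_fields(parsed: dict) -> set:
    """Return the required fields that were not recovered."""
    return REQUIRED_FIELDS - parsed.keys()


good = "<s_supplier><s_name>ACME s.r.o.</s_name></s_supplier><s_total>1210.00</s_total>"
broken = "<s_total>1210.00"  # closing tag never generated

print(parse_fields(good), missing_fields(parse_fields(good)))
print(parse_fields(broken), missing_fields(parse_fields(broken)))
```

A single missing closing tag makes the field unrecoverable in this sketch, which is why a validation step after decoding is useful before the output is passed downstream.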

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 9e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
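As a rough illustration of what these settings imply, the sketch below computes the learning rate under a linear schedule. It assumes no warmup (none is listed), a total of 3000 optimizer steps (taken from the training results table: 10 epochs at 300 steps each), and a single device without gradient accumulation for the sample count. `linear_lr` is an illustrative helper, not a library function.

```python
BASE_LR = 9e-05
TRAIN_BATCH_SIZE = 4
STEPS_PER_EPOCH = 300   # from the training results table
NUM_EPOCHS = 10
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# ASSUMPTION: one device, no gradient accumulation.
samples_per_epoch = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

for step in (0, 1500, 3000):
    print(step, linear_lr(step))
print("approx. training samples per epoch:", samples_per_epoch)
```

Under these assumptions the rate starts at 9e-05, is halved by the end of epoch 5, and reaches zero at step 3000.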

Training results

Training Loss  Epoch  Step  Validation Loss  Mean Accuracy  F1
0.1161         1.0    300   0.5199           0.7558         0.6336
0.0957         2.0    600   0.5722           0.7535         0.6315
0.0420         3.0    900   0.6364           0.7699         0.6161
0.0364         4.0    1200  0.6706           0.7884         0.6190
0.0359         5.0    1500  0.6054           0.8083         0.6714
0.0207         6.0    1800  0.6145           0.8005         0.6839
0.0074         7.0    2100  0.7067           0.8065         0.7111
0.0017         8.0    2400  0.7292           0.8022         0.6886
0.0025         9.0    2700  0.7598           0.7889         0.6706
0.0004         10.0   3000  0.7759           0.7947         0.6824
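The per-epoch results can be compared programmatically. The sketch below copies the values reported above and finds the best epoch under each metric. It is only an illustration of checkpoint selection; the card does not state which criterion was actually used.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: int
    step: int
    val_loss: float
    mean_accuracy: float
    f1: float


# Values copied from the training results reported above.
ROWS = [
    EvalRow(0.1161, 1, 300, 0.5199, 0.7558, 0.6336),
    EvalRow(0.0957, 2, 600, 0.5722, 0.7535, 0.6315),
    EvalRow(0.0420, 3, 900, 0.6364, 0.7699, 0.6161),
    EvalRow(0.0364, 4, 1200, 0.6706, 0.7884, 0.6190),
    EvalRow(0.0359, 5, 1500, 0.6054, 0.8083, 0.6714),
    EvalRow(0.0207, 6, 1800, 0.6145, 0.8005, 0.6839),
    EvalRow(0.0074, 7, 2100, 0.7067, 0.8065, 0.7111),
    EvalRow(0.0017, 8, 2400, 0.7292, 0.8022, 0.6886),
    EvalRow(0.0025, 9, 2700, 0.7598, 0.7889, 0.6706),
    EvalRow(0.0004, 10, 3000, 0.7759, 0.7947, 0.6824),
]

best_f1 = max(ROWS, key=lambda r: r.f1)
best_acc = max(ROWS, key=lambda r: r.mean_accuracy)
best_val_loss = min(ROWS, key=lambda r: r.val_loss)

print("best F1:", best_f1.epoch, best_f1.f1)
print("best mean accuracy:", best_acc.epoch, best_acc.mean_accuracy)
print("lowest validation loss:", best_val_loss.epoch, best_val_loss.val_loss)
```

The three criteria disagree: F1 peaks at epoch 7, mean accuracy at epoch 5, and validation loss is lowest at epoch 1, while training loss keeps falling through epoch 10.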

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2