TomasFAV committed
Commit 07ca27c · verified · 1 Parent(s): 6d94ce6

Update README.md

Files changed (1): README.md (+85 −17)
README.md CHANGED
@@ -4,34 +4,99 @@ license: apache-2.0
base_model: google/pix2struct-docvqa-base
tags:
- generated_from_trainer
metrics:
- f1
model-index:
- - name: Pix2StructCzechInvoice
results: []
---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->

- # Pix2StructCzechInvoice

- This model is a fine-tuned version of [google/pix2struct-docvqa-base](https://huggingface.co/google/pix2struct-docvqa-base) on an unknown dataset.
It achieves the following results on the evaluation set:
- - Loss: 0.5022
- - F1: 0.5907

## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

## Training procedure

@@ -48,6 +113,8 @@ The following hyperparameters were used during training:
- num_epochs: 10
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | F1 |
@@ -63,10 +130,11 @@ The following hyperparameters were used during training:
| 0.1020 | 9.0 | 2700 | 0.4066 | 0.4294 |
| 0.0842 | 10.0 | 3000 | 0.5022 | 0.4665 |

- ### Framework versions

- - Transformers 5.0.0
- - Pytorch 2.10.0+cu128
- - Datasets 4.0.0
- - Tokenizers 0.22.2
 
base_model: google/pix2struct-docvqa-base
tags:
- generated_from_trainer
+ - invoice-processing
+ - information-extraction
+ - czech-language
+ - document-ai
+ - multimodal-model
+ - generative-model
+ - synthetic-data
metrics:
- f1
model-index:
+ - name: Pix2StructCzechInvoice-V0
results: []
---

+ # Pix2StructCzechInvoice (V0 Synthetic Templates Only)

+ This model is a fine-tuned version of [google/pix2struct-docvqa-base](https://huggingface.co/google/pix2struct-docvqa-base) for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set:
+ - Loss: 0.5022
+ - F1: 0.5907
+
+ ---

## Model description

+ Pix2StructCzechInvoice (V0) is a generative multimodal model designed for document understanding.
+
+ Unlike token classification models (e.g., BERT, LiLT, LayoutLMv3), this model:
+ - processes the entire document image
+ - generates structured outputs as text sequences
+
+ The model is trained to extract key invoice fields such as:
+ - supplier
+ - customer
+ - invoice number
+ - bank details
+ - totals
+ - dates
+
+ ---
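Assuming the fine-tuned weights are published under a repo id like `TomasFAV/Pix2StructCzechInvoice` (a guess based on this card, not a documented id), and that the target sequences use a simple `key: value | key: value` layout (also an assumption, since the card does not show the output format), inference could be sketched with the standard `transformers` Pix2Struct classes:

```python
from typing import Dict


def parse_prediction(sequence: str) -> Dict[str, str]:
    """Split a generated 'key: value | key: value' sequence into a dict.

    The '|' separator and 'key: value' layout are assumptions about the
    target format, which the card does not document.
    """
    fields = {}
    for part in sequence.split("|"):
        if ":" in part:
            key, value = part.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def extract_invoice_fields(image_path: str,
                           model_id: str = "TomasFAV/Pix2StructCzechInvoice") -> Dict[str, str]:
    """Run the fine-tuned model on one invoice image (repo id is a guess)."""
    # Imported lazily so the parsing helper above works without
    # transformers/PIL installed.
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)
    inputs = processor(images=Image.open(image_path), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return parse_prediction(processor.decode(output_ids[0], skip_special_tokens=True))
```

Parsing the generated sequence back into a dict keeps downstream consumers independent of the exact generation format.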
+
+ ## Training data
+
+ The dataset consists of:
+
+ - synthetically generated invoice images
+ - fixed template layouts
+ - corresponding target text sequences representing structured fields
+
+ Key properties:
+ - clean and consistent visual structure
+ - no OCR noise (end-to-end image input)
+ - controlled output formatting
+ - no real-world documents
+
+ This represents the **baseline dataset for generative multimodal models**.
+
+ ---
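Such image–text pairs need the structured fields flattened into a single target sequence on the text side. A minimal sketch of one possible serialization (the `key: value | …` layout and the field names are illustrative assumptions, not the card's documented format); sorting the keys gives the controlled output formatting the dataset description calls for:

```python
def serialize_fields(fields: dict) -> str:
    """Flatten a structured field dict into one target text sequence.

    Uses an assumed 'key: value | key: value' layout; deterministic key
    order keeps the output formatting consistent across examples.
    """
    return " | ".join(f"{key}: {value}" for key, value in sorted(fields.items()))


# Hypothetical annotation for one synthetic invoice image.
example = {
    "supplier": "ACME s.r.o.",
    "invoice number": "2024-0042",
    "total": "1210 CZK",
}
target = serialize_fields(example)
```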
+
+ ## Role in the pipeline
+
+ This model corresponds to:
+
+ **V0: synthetic template-based dataset only**
+
+ It is used to:
+ - establish a baseline for generative document models
+ - compare with:
+   - token classification approaches (BERT, LiLT)
+   - multimodal encoders (LayoutLMv3)
+ - evaluate the feasibility of end-to-end extraction
+
+ ---
+
+ ## Intended uses
+
+ - End-to-end invoice information extraction from images
+ - Document VQA-style tasks
+ - Research in generative document understanding
+ - Comparison with structured prediction approaches
+
+ ---
+
+ ## Limitations
+
+ - Trained only on synthetic data, so performance on real-world invoices is unverified
+ - Sensitive to output formatting inconsistencies
+ - Lower stability compared to token classification models
+ - Requires careful evaluation (exact string matching vs. structured, field-level metrics)
+ - Performance depends on generation quality
+
+ ---
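The evaluation caveat above can be made concrete: exact string matching over the whole generated sequence gives no credit for partially correct outputs, whereas comparing parsed fields yields a structured, field-level F1. A minimal sketch, assuming predictions and gold annotations are available as dicts:

```python
def field_f1(predicted: dict, gold: dict) -> float:
    """Field-level F1: a field counts as correct only when both the key
    and its whitespace-normalized value match the gold annotation."""
    if not predicted or not gold:
        return 0.0
    matches = sum(
        1 for key, value in predicted.items()
        if key in gold and gold[key].strip() == value.strip()
    )
    precision = matches / len(predicted)
    recall = matches / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two of three predicted fields match the gold fields.
gold = {"supplier": "ACME s.r.o.", "total": "1210 CZK", "date": "2024-01-31"}
pred = {"supplier": "ACME s.r.o.", "total": "1210 CZK", "iban": "CZ0000000000"}
score = field_f1(pred, gold)  # precision = recall = 2/3, so F1 ≈ 0.667
```

A sequence-level exact match would score this prediction 0, which is why the card flags metric choice as an evaluation concern.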
## Training procedure

- num_epochs: 10
- mixed_precision_training: Native AMP

+ ---
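The two visible settings can be expressed as a hedged `transformers` `TrainingArguments` fragment; the diff elides the remaining hyperparameters, so this sketch deliberately leaves everything else at library defaults rather than guessing:

```python
from transformers import TrainingArguments

# Config fragment only: mirrors the two hyperparameters shown in the card.
training_args = TrainingArguments(
    output_dir="pix2struct-czech-invoice",  # illustrative path, not from the card
    num_train_epochs=10,                    # num_epochs: 10
    fp16=True,                              # mixed_precision_training: Native AMP
)
```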
+
### Training results

| Training Loss | Epoch | Step | Validation Loss | F1 |
| 0.1020 | 9.0 | 2700 | 0.4066 | 0.4294 |
| 0.0842 | 10.0 | 3000 | 0.5022 | 0.4665 |

+ ---

+ ## Framework versions

+ - Transformers 5.0.0
+ - PyTorch 2.10.0+cu128
+ - Datasets 4.0.0
+ - Tokenizers 0.22.2