TomasFAV committed · Commit 2e2ad47 · verified · 1 parent: 7cff657

Update README.md

Files changed (1)
  1. README.md +97 -17
README.md CHANGED
@@ -1,35 +1,112 @@
  ---
  library_name: transformers
  tags:
  - generated_from_trainer
  metrics:
  - f1
  model-index:
- - name: Pix2StructCzechInvoiceV2
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # Pix2StructCzechInvoiceV2

- This model was trained from scratch on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2521
- - F1: 0.7311

  ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -46,6 +123,8 @@ The following hyperparameters were used during training:
  - num_epochs: 10
  - mixed_precision_training: Native AMP

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | F1 |
@@ -61,10 +140,11 @@ The following hyperparameters were used during training:
  | 0.0861 | 9.0 | 1035 | 0.3019 | 0.6931 |
  | 0.0860 | 10.0 | 1150 | 0.3167 | 0.7186 |


- ### Framework versions

- - Transformers 5.0.0
- - Pytorch 2.10.0+cu128
- - Datasets 4.0.0
- - Tokenizers 0.22.2
 
@@ -1,35 +1,112 @@
  ---
  library_name: transformers
+ license: apache-2.0
+ base_model: google/pix2struct-docvqa-base
  tags:
  - generated_from_trainer
+ - invoice-processing
+ - information-extraction
+ - czech-language
+ - document-ai
+ - multimodal-model
+ - generative-model
+ - synthetic-data
+ - hybrid-data
  metrics:
  - f1
  model-index:
+ - name: Pix2StructCzechInvoice-V2
  results: []
  ---

+ # Pix2StructCzechInvoice (V2 Synthetic + Random Layout + Real Layout Injection)

+ This model is a fine-tuned version of [google/pix2struct-docvqa-base](https://huggingface.co/google/pix2struct-docvqa-base) for structured information extraction from Czech invoices.

  It achieves the following results on the evaluation set:
+ - Loss: 0.2521
+ - F1: 0.7311
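
The card reports F1 without saying how it is computed. For reference only, here is a minimal sketch of one common choice for field extraction: micro-averaged F1 over exact-match (field, value) pairs. The function name `micro_f1`, the dict-per-document representation, the field keys, and the toy data are all assumptions for illustration, not the evaluation actually used for this model.

```python
from typing import Dict, List, Tuple


def micro_f1(
    predictions: List[Dict[str, str]],
    references: List[Dict[str, str]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) over exact-match (field, value) pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        pred_pairs = set(pred.items())
        ref_pairs = set(ref.items())
        tp += len(pred_pairs & ref_pairs)  # field present with the right value
        fp += len(pred_pairs - ref_pairs)  # predicted but wrong or spurious
        fn += len(ref_pairs - pred_pairs)  # expected but missing or wrong
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from the model's evaluation set.
refs = [{"invoice_number": "2024001", "total": "1210.00"}]
preds = [{"invoice_number": "2024001", "total": "1200.00"}]
precision, recall, f1 = micro_f1(preds, refs)
```

Under this definition a field with a wrong value counts as both a false positive and a false negative, so one wrong field out of two gives precision, recall, and F1 of 0.5 each.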
+
+ ---

  ## Model description

+ Pix2StructCzechInvoice (V2) represents an advanced stage of the generative document understanding pipeline.
+
+ The model:
+ - processes full document images
+ - generates structured outputs as text sequences
+
+ It is trained to extract key invoice fields:
+ - supplier
+ - customer
+ - invoice number
+ - bank details
+ - totals
+ - dates
+
+ This version introduces **real layout injection**, significantly improving visual realism and model generalization.
+
+ ---
+
+ ## Training data

+ The dataset consists of three components:
+
+ 1. **Synthetic template-based invoices**
+ 2. **Synthetic invoices with randomized layouts**
+ 3. **Hybrid invoices with real layouts and synthetic content**
+
+ ### Real layout injection
+
+ In the hybrid dataset:
+ - real invoice layouts are used as templates
+ - original content is replaced with synthetic data
+ - new content is rendered into realistic visual structures
+
+ This preserves:
+ - real-world layout complexity
+ - visual patterns and formatting
+ - document structure variability
+
+ while maintaining:
+ - full control over annotations
+ - consistent output format
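
The real layout injection steps added above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's data pipeline: `FieldBox`, `synth_value`, `inject`, the field names, the box coordinates, and the value generators are all hypothetical, and the rendering of text into the page image is left out.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldBox:
    """A field slot taken from a real invoice layout (pixel coordinates)."""
    name: str
    box: Tuple[int, int, int, int]  # left, top, right, bottom


def synth_value(field: str, rng: random.Random) -> str:
    """Generate a synthetic value for a field (placeholder generators)."""
    if field == "invoice_number":
        return f"{rng.randint(2020, 2025)}{rng.randint(0, 9999):04d}"
    if field == "total":
        return f"{rng.randint(100, 99999)},{rng.randint(0, 99):02d} CZK"
    return f"{field}-{rng.randint(0, 999):03d}"


def inject(
    layout: List[FieldBox], seed: int
) -> Tuple[List[Tuple[FieldBox, str]], Dict[str, str]]:
    """Fill every slot of a real layout with synthetic content.

    Returns the draw list (what to render where) and the annotation. The
    annotation is exact by construction because it is the same data that
    would be rendered into the layout.
    """
    rng = random.Random(seed)
    draw_list = [(slot, synth_value(slot.name, rng)) for slot in layout]
    annotation = {slot.name: text for slot, text in draw_list}
    return draw_list, annotation


# Hypothetical slots, standing in for boxes measured on a real invoice.
layout = [
    FieldBox("supplier", (40, 60, 400, 120)),
    FieldBox("invoice_number", (420, 60, 760, 90)),
    FieldBox("total", (520, 980, 760, 1020)),
]
draw_list, annotation = inject(layout, seed=0)
```

The point of the design is that the geometry comes from a real document while the label comes from the generator, which is how the layout stays realistic and the annotations stay fully controlled.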
+
+ ---

+ ## Role in the pipeline

+ This model corresponds to:

+ **V2 Synthetic + layout augmentation + real layout injection**
+
+ It is used to:
+ - reduce the domain gap between synthetic and real documents
+ - evaluate the effect of realistic layouts on generative models
+ - compare with:
+   - V0–V1 (synthetic-only training)
+   - V3 (real data fine-tuning)
+
+ ---
+
+ ## Intended uses
+
+ - End-to-end invoice extraction from images
+ - Document VQA-style tasks
+ - Research in generative document understanding
+ - Evaluation of hybrid training strategies
+
+ ---
+
+ ## Limitations
+
+ - Generated outputs may contain formatting errors
+ - Sensitive to decoding strategy and tokenization
+ - Still lacks full exposure to real linguistic variability
+ - Training remains less stable than classification-based models
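
Since the first limitation is that generated outputs may contain formatting errors, downstream code will usually want a tolerant parser. The card does not document the output format, so the sketch below assumes, purely for illustration, that the model emits a flat JSON object of string fields. The key names in `EXPECTED_FIELDS` and the function `parse_output` are hypothetical.

```python
import json
import re
from typing import Dict

# Hypothetical key names loosely mirroring the fields listed in the card.
EXPECTED_FIELDS = (
    "supplier", "customer", "invoice_number", "bank_details", "total", "date",
)

# Matches complete "key": "value" pairs, allowing escaped characters in values.
_PAIR = re.compile(r'"(\w+)"\s*:\s*"((?:[^"\\]|\\.)*)"')


def parse_output(text: str) -> Dict[str, str]:
    """Parse generated text into a field dict, tolerating malformed JSON."""
    try:
        data = json.loads(text)
        pairs = {str(k): str(v) for k, v in data.items()} if isinstance(data, dict) else {}
    except json.JSONDecodeError:
        # Fall back to salvaging whichever pairs were generated completely.
        pairs = dict(_PAIR.findall(text))
    return {k: v.strip() for k, v in pairs.items() if k in EXPECTED_FIELDS}


# A generation cut off mid-value, as can happen at the max-length limit.
truncated = '{"supplier": "ACME s.r.o.", "invoice_number": "20240017", "total": "12'
fields = parse_output(truncated)
```

Here the truncated `total` is dropped rather than returned as a partial value, which keeps a formatting error from turning into a silently wrong amount.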
+
+ ---

  ## Training procedure

@@ -46,6 +123,8 @@ The following hyperparameters were used during training:
  - num_epochs: 10
  - mixed_precision_training: Native AMP

+ ---
+
  ### Training results

  | Training Loss | Epoch | Step | Validation Loss | F1 |
@@ -61,10 +140,11 @@ The following hyperparameters were used during training:
  | 0.0861 | 9.0 | 1035 | 0.3019 | 0.6931 |
  | 0.0860 | 10.0 | 1150 | 0.3167 | 0.7186 |
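
Only the last two rows of the results table fall inside the hunks of this diff, but they are enough for a small sanity check. The sketch below parses them, derives the number of steps per epoch, and picks the better of the two by F1. The helper `parse_rows` and the column keys are illustrative names, not part of the Trainer output.

```python
from typing import Dict, List

# The two table rows visible in this diff, copied verbatim.
VISIBLE_ROWS = """\
| 0.0861 | 9.0 | 1035 | 0.3019 | 0.6931 |
| 0.0860 | 10.0 | 1150 | 0.3167 | 0.7186 |
"""
COLUMNS = ("train_loss", "epoch", "step", "val_loss", "f1")


def parse_rows(table: str) -> List[Dict[str, float]]:
    """Turn markdown table rows into dicts keyed by COLUMNS."""
    rows = []
    for line in table.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(dict(zip(COLUMNS, map(float, cells))))
    return rows


rows = parse_rows(VISIBLE_ROWS)
steps_per_epoch = {row["step"] / row["epoch"] for row in rows}
best_visible = max(rows, key=lambda row: row["f1"])
```

Both rows give 115 steps per epoch (1035 / 9 and 1150 / 10). Neither row matches the headline Loss 0.2521 and F1 0.7311, so those figures presumably come from an earlier epoch among the rows that this diff does not show.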

+ ---

+ ## Framework versions

+ - Transformers 5.0.0
+ - PyTorch 2.10.0+cu128
+ - Datasets 4.0.0
+ - Tokenizers 0.22.2