zuminghuang committed on
Commit 0b82703 · verified · 1 Parent(s): afaf1a3

Update README.md

Files changed (1):
  1. README.md (+98 -4)

README.md CHANGED
@@ -35,7 +35,101 @@ We are excited to release Infinity-Parser2-Pro, our latest flagship document und
 
 ## Quick Start
 
-### Installation
+### 1. Minimal "Hello World" (Native Transformers)
+
+If you are looking for a minimal script to parse a single image to Markdown using the native `transformers` library, here is a simple snippet:
+
+```python
+from PIL import Image
+import torch
+from transformers import AutoModelForImageTextToText, AutoProcessor
+from qwen_vl_utils import process_vision_info
+
+# Load the model and processor
+model = AutoModelForImageTextToText.from_pretrained(
+    "infly/Infinity-Parser2-Pro",
+    torch_dtype="float16",
+    device_map="auto",
+)
+processor = AutoProcessor.from_pretrained("infly/Infinity-Parser2-Pro")
+
+# Build the messages for the model
+pil_image = Image.open("demo_data/demo.png").convert("RGB")
+min_pixels = 2048       # 32 * 64
+max_pixels = 16777216   # 4096 * 4096
+prompt = """
+Please output the layout information from the PDF image, including each layout element's bbox, its category, and the corresponding text content within the bbox.
+1. Bbox format: [x1, y1, x2, y2]
+2. Layout Categories: The possible categories are ['header', 'title', 'text', 'figure', 'table', 'formula', 'figure_caption', 'table_caption', 'formula_caption', 'figure_footnote', 'table_footnote', 'page_footnote', 'footer'].
+3. Text Extraction & Formatting Rules:
+    - Figure: For the 'figure' category, the text field should be an empty string.
+    - Formula: Format its text as LaTeX.
+    - Table: Format its text as HTML.
+    - All Others (Text, Title, etc.): Format their text as Markdown.
+4. Constraints:
+    - The output text must be the original text from the image, with no translation.
+    - All layout elements must be sorted according to human reading order.
+5. Final Output: The entire output must be a single JSON object.
+"""
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": pil_image,
+                "min_pixels": min_pixels,
+                "max_pixels": max_pixels,
+            },
+            {"type": "text", "text": prompt},
+        ],
+    }
+]
+
+chat_template_kwargs = {"enable_thinking": False}
+
+text = processor.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True, **chat_template_kwargs
+)
+image_inputs, _ = process_vision_info(messages, image_patch_size=16)
+
+inputs = processor(
+    text=text,
+    images=image_inputs,
+    do_resize=False,
+    padding=True,
+    return_tensors="pt",
+)
+
+# Move all tensors to the same device as the model
+inputs = {
+    k: v.to(model.device) if isinstance(v, torch.Tensor) else v
+    for k, v in inputs.items()
+}
+
+# Generate the response
+generated_ids = model.generate(
+    **inputs,
+    max_new_tokens=32768,
+    temperature=0.0,
+    top_p=1.0,
+)
+
+# Strip input tokens, keeping only the newly generated response
+generated_ids_trimmed = [
+    out_ids[len(in_ids):]
+    for in_ids, out_ids in zip(inputs["input_ids"], generated_ids)
+]
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)
+print(output_text)
+```
+
+### 2. Advanced Pipeline (`infinity_parser2`)
+
+For bulk processing, advanced features, or an end-to-end PDF parsing pipeline, we recommend using our `infinity_parser2` wrapper.
 
 #### Pre-requisites
 
@@ -76,9 +170,9 @@ cd INF-MLLM/Infinity-Parser2
 pip install -e .
 ```
 
-### Usage
+#### Usage
 
-#### Command Line
+##### Command Line
 
 The `parser` command is the fastest way to get started.
 
@@ -109,7 +203,7 @@ parser demo_data/demo.png --task doc2md
 parser --help
 ```
 
-#### Python API
+##### Python API
 
 ```python
 # NOTE: The Infinity-Parser2 model will be automatically downloaded on the first run.
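The prompt in the added "Hello World" snippet instructs the model to return a single JSON object of layout elements, each with a bbox, a category, and text already sorted in reading order. As a minimal sketch of how such output might be post-processed into plain Markdown — note the key names `bbox`, `category`, and `text` are assumptions inferred from the prompt's wording, not a documented schema:

```python
import json

# Hypothetical sample of the parser's JSON output; the key names are
# assumptions inferred from the prompt, not a documented schema.
raw_output = json.dumps([
    {"bbox": [50, 40, 500, 80], "category": "title", "text": "# A Sample Paper"},
    {"bbox": [50, 100, 500, 300], "category": "text", "text": "Some body text."},
    {"bbox": [50, 320, 500, 400], "category": "formula", "text": "E = mc^2"},
    {"bbox": [50, 420, 500, 600], "category": "figure", "text": ""},
])

def layout_to_markdown(raw: str) -> str:
    """Assemble a Markdown page from the parsed layout elements.

    Elements are assumed to already be in human reading order, as the
    prompt requires. Figures carry empty text and are skipped; formula
    text (LaTeX) is wrapped in display-math fences.
    """
    blocks = []
    for el in json.loads(raw):
        if el["category"] == "figure" or not el["text"]:
            continue  # nothing to render for figures or empty elements
        if el["category"] == "formula":
            blocks.append(f"$$\n{el['text']}\n$$")
        else:
            blocks.append(el["text"])
    return "\n\n".join(blocks)

print(layout_to_markdown(raw_output))
```

Since tables arrive as HTML and formulas as LaTeX per the prompt's formatting rules, a fuller converter would branch on those categories as well; the sketch above only shows the overall shape of the traversal.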