vaheandonians committed on
Commit
dc60307
·
verified ·
1 Parent(s): 5b20b8b

Upload model from source account

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ Recognition/tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,1239 @@
1
+ ---
2
+ pipeline_tag: image-text-to-text
3
+ library_name: monkeyocr
4
+ language:
5
+ - zh
6
+ - en
7
+ tags:
8
+ - OCR
9
+ ---
10
+ <div align="center" xmlns="http://www.w3.org/1999/html">
11
+ <h1 align="center">
12
+ MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
13
+ </h1>
14
+
15
+ [![arXiv](https://img.shields.io/badge/Arxiv-MonkeyOCR-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2506.05218)
16
+ [![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/echo840/MonkeyOCR)
17
+ [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MonkeyOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MonkeyOCR/issues?q=is%3Aopen+is%3Aissue)
18
+ [![GitHub closed issues](https://img.shields.io/github/issues-closed/Yuliang-Liu/MonkeyOCR?color=success&label=Issues)](https://github.com/Yuliang-Liu/MonkeyOCR/issues?q=is%3Aissue+is%3Aclosed)
19
+ [![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/Yuliang-Liu/MonkeyOCR/blob/main/LICENSE.txt)
20
+ [![GitHub views](https://komarev.com/ghpvc/?username=Yuliang-Liu&repo=MonkeyOCR&color=brightgreen&label=Views)](https://github.com/Yuliang-Liu/MonkeyOCR)
21
+ </div>
22
+
23
+
24
+ > **MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm**<br>
25
+ > Zhang Li, Yuliang Liu, Qiang Liu, Zhiyin Ma, Ziyang Zhang, Shuo Zhang, Zidun Guo, Jiarui Zhang, Xinyu Wang, Xiang Bai <br>
26
+ [![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2506.05218)
27
+ [![Source_code](https://img.shields.io/badge/Code-Available-white)](README.md)
28
+ [![Model Weight](https://img.shields.io/badge/HuggingFace-gray)](https://huggingface.co/echo840/MonkeyOCR)
29
+ [![Model Weight](https://img.shields.io/badge/ModelScope-green)](https://modelscope.cn/models/l1731396519/MonkeyOCR)
30
+ [![Public Courses](https://img.shields.io/badge/Openbayes-yellow)](https://openbayes.com/console/public/tutorials/91ESrGvEvBq)
31
+ [![Demo](https://img.shields.io/badge/Demo-blue)](http://vlrlabmonkey.xyz:7685/)
32
+
33
+
34
+
35
+ ## Introduction
36
+ MonkeyOCR adopts a Structure-Recognition-Relation (SRR) triplet paradigm, which simplifies the multi-tool pipeline of modular approaches while avoiding the inefficiency of using large multimodal models for full-page document processing.
37
+
38
+ 1. MonkeyOCR-pro-1.2B surpasses MonkeyOCR-3B by 7.4% on Chinese documents.
39
+ 2. MonkeyOCR-pro-1.2B delivers an approximately 36% speed improvement over MonkeyOCR-pro-3B, with only an approximately 1.6% drop in performance.
40
+ 3. On olmOCR-Bench, MonkeyOCR-pro-1.2B outperforms Nanonets-OCR-3B by 7.3%.
41
+ 4. On OmniDocBench, MonkeyOCR-pro-3B achieves the best overall performance on both English and Chinese documents, outperforming even closed-source and extra-large open-source VLMs such as Gemini 2.0-Flash, Gemini 2.5-Pro, Qwen2.5-VL-72B, GPT-4o, and InternVL3-78B.
42
+
43
+ See detailed results below.
44
+
45
+ ### Comparing MonkeyOCR with closed-source and extra-large open-source VLMs.
46
+ <a href="https://zimgs.com/i/EKhkhY"><img src="https://v1.ax1x.com/2025/07/15/EKhkhY.png" alt="EKhkhY.png" border="0" /></a>
47
+
48
+ ## Inference Speed (Pages/s) on Different GPUs and [PDF](https://drive.google.com/drive/folders/1geumlJmVY7UUKdr8324sYZ0FHSAElh7m?usp=sharing) Page Counts
49
+
50
+ <table>
51
+ <thead>
52
+ <tr align='center'>
53
+ <th>Model</th>
54
+ <th>GPU</th>
55
+ <th>50 Pages</th>
56
+ <th>100 Pages</th>
57
+ <th>300 Pages</th>
58
+ <th>500 Pages</th>
59
+ <th>1000 Pages</th>
60
+ </tr>
61
+ </thead>
62
+ <tbody>
63
+ <tr align='center'>
64
+ <td rowspan='4'>MonkeyOCR-pro-3B</td>
65
+ <td>3090</td>
66
+ <td>0.492</td>
67
+ <td>0.484</td>
68
+ <td>0.497</td>
69
+ <td>0.492</td>
70
+ <td>0.496</td>
71
+ </tr>
72
+ <tr align='center'>
73
+ <td>A6000</td>
74
+ <td>0.585</td>
75
+ <td>0.587</td>
76
+ <td>0.609</td>
77
+ <td>0.598</td>
78
+ <td>0.608</td>
79
+ </tr>
80
+ <tr align='center'>
81
+ <td>H800</td>
82
+ <td>0.923</td>
83
+ <td>0.768</td>
84
+ <td>0.897</td>
85
+ <td>0.930</td>
86
+ <td>0.891</td>
87
+ </tr>
88
+ <tr align='center'>
89
+ <td>4090</td>
90
+ <td>0.972</td>
91
+ <td>0.969</td>
92
+ <td>1.006</td>
93
+ <td>0.986</td>
94
+ <td>1.006</td>
95
+ </tr>
96
+ <tr align='center'>
97
+ <td rowspan='4'>MonkeyOCR-pro-1.2B</td>
98
+ <td>3090</td>
99
+ <td>0.615</td>
100
+ <td>0.660</td>
101
+ <td>0.677</td>
102
+ <td>0.687</td>
103
+ <td>0.683</td>
104
+ </tr>
105
+ <tr align='center'>
106
+ <td>A6000</td>
107
+ <td>0.709</td>
108
+ <td>0.786</td>
109
+ <td>0.825</td>
110
+ <td>0.829</td>
111
+ <td>0.825</td>
112
+ </tr>
113
+ <tr align='center'>
114
+ <td>H800</td>
115
+ <td>0.965</td>
116
+ <td>1.082</td>
117
+ <td>1.101</td>
118
+ <td>1.145</td>
119
+ <td>1.015</td>
120
+ </tr>
121
+ <tr align='center'>
122
+ <td>4090</td>
123
+ <td>1.194</td>
124
+ <td>1.314</td>
125
+ <td>1.436</td>
126
+ <td>1.442</td>
127
+ <td>1.434</td>
128
+ </tr>
129
+ </tbody>
130
+ </table>
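The throughput figures above convert directly into rough wall-time estimates. A minimal sketch (the 0.986 pages/s figure is taken from the MonkeyOCR-pro-3B row for a 4090 at 500 pages):

```python
def estimate_seconds(num_pages: float, pages_per_second: float) -> float:
    """Convert a pages/s throughput into an estimated wall time in seconds."""
    return num_pages / pages_per_second

# A 500-page PDF at 0.986 pages/s takes roughly 507 seconds (~8.5 minutes).
print(round(estimate_seconds(500, 0.986)))
```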
131
+
132
+ ## VLM OCR Speed (Pages/s) on Different GPUs and [PDF](https://drive.google.com/drive/folders/1geumlJmVY7UUKdr8324sYZ0FHSAElh7m?usp=sharing) Page Counts
133
+
134
+ <table>
135
+ <thead>
136
+ <tr align='center'>
137
+ <th>Model</th>
138
+ <th>GPU</th>
139
+ <th>50 Pages</th>
140
+ <th>100 Pages</th>
141
+ <th>300 Pages</th>
142
+ <th>500 Pages</th>
143
+ <th>1000 Pages</th>
144
+ </tr>
145
+ </thead>
146
+ <tbody>
147
+ <tr align='center'>
148
+ <td rowspan='4'>MonkeyOCR-pro-3B</td>
149
+ <td>3090</td>
150
+ <td>0.705</td>
151
+ <td>0.680</td>
152
+ <td>0.711</td>
153
+ <td>0.700</td>
154
+ <td>0.724</td>
155
+ </tr>
156
+ <tr align='center'>
157
+ <td>A6000</td>
158
+ <td>0.885</td>
159
+ <td>0.860</td>
160
+ <td>0.915</td>
161
+ <td>0.892</td>
162
+ <td>0.934</td>
163
+ </tr>
164
+ <tr align='center'>
165
+ <td>H800</td>
166
+ <td>1.371</td>
167
+ <td>1.135</td>
168
+ <td>1.339</td>
169
+ <td>1.433</td>
170
+ <td>1.509</td>
171
+ </tr>
172
+ <tr align='center'>
173
+ <td>4090</td>
174
+ <td>1.321</td>
175
+ <td>1.300</td>
176
+ <td>1.384</td>
177
+ <td>1.343</td>
178
+ <td>1.410</td>
179
+ </tr>
180
+ <tr align='center'>
181
+ <td rowspan='4'>MonkeyOCR-pro-1.2B</td>
182
+ <td>3090</td>
183
+ <td>0.919</td>
184
+ <td>1.086</td>
185
+ <td>1.166</td>
186
+ <td>1.182</td>
187
+ <td>1.199</td>
188
+ </tr>
189
+ <tr align='center'>
190
+ <td>A6000</td>
191
+ <td>1.177</td>
192
+ <td>1.361</td>
193
+ <td>1.506</td>
194
+ <td>1.525</td>
195
+ <td>1.569</td>
196
+ </tr>
197
+ <tr align='center'>
198
+ <td>H800</td>
199
+ <td>1.466</td>
200
+ <td>1.719</td>
201
+ <td>1.763</td>
202
+ <td>1.875</td>
203
+ <td>1.650</td>
204
+ </tr>
205
+ <tr align='center'>
206
+ <td>4090</td>
207
+ <td>1.759</td>
208
+ <td>1.987</td>
209
+ <td>2.260</td>
210
+ <td>2.345</td>
211
+ <td>2.415</td>
212
+ </tr>
213
+ </tbody>
214
+ </table>
215
+
216
+
217
+ ## Supported Hardware
218
+ Due to the limited range of GPUs available to us, we cannot provide exhaustive hardware specifications. We've tested the model on GPUs such as the 3090, 4090, A6000, H800, A100, and even the 4060 with 8GB of VRAM (suitable for deploying the quantized 3B model and the 1.2B model). We are very grateful for feedback and contributions from the open-source community, who have also successfully run the model on [50-series GPUs](https://github.com/Yuliang-Liu/MonkeyOCR/issues/90), [H200](https://github.com/Yuliang-Liu/MonkeyOCR/issues/151), [L20](https://github.com/Yuliang-Liu/MonkeyOCR/issues/133), [V100](https://github.com/Yuliang-Liu/MonkeyOCR/issues/144), [2080 Ti](https://github.com/Yuliang-Liu/MonkeyOCR/pull/1) and [NPUs](https://github.com/Yuliang-Liu/MonkeyOCR/pull/226/files).
219
+
220
+
221
+ ## News
222
+ * ```2025.07.10 ``` 🚀 We release [MonkeyOCR-pro-1.2B](https://huggingface.co/echo840/MonkeyOCR-pro-1.2B) — a leaner and faster model that outperforms our previous 3B version in accuracy, speed, and efficiency.
223
+ * ```2025.06.12 ``` 🚀 Our model is trending on [Hugging Face](https://huggingface.co/models?sort=trending). Thanks for the love!
224
+ * ```2025.06.05 ``` 🚀 We release [MonkeyOCR](https://huggingface.co/echo840/MonkeyOCR), a document parsing model for English and Chinese.
225
+
226
+
227
+ # Quick Start
228
+ ## Local Installation
229
+ ### 1. Install MonkeyOCR
230
+ See the [installation guide](https://github.com/Yuliang-Liu/MonkeyOCR/blob/main/docs/install_cuda_pp.md#install-with-cuda-support) to set up your environment.
231
+ ### 2. Download Model Weights
232
+ Download our model from Hugging Face.
233
+ ```bash
234
+ pip install huggingface_hub
235
+
236
+ python tools/download_model.py -n MonkeyOCR-pro-3B # or MonkeyOCR
237
+ ```
238
+ You can also download our model from ModelScope.
239
+
240
+ ```bash
241
+ pip install modelscope
242
+
243
+ python tools/download_model.py -t modelscope -n MonkeyOCR-pro-3B # or MonkeyOCR
244
+ ```
245
+ ### 3. Inference
246
+ You can parse a file or a directory containing PDFs or images using the following commands:
247
+ ```bash
248
+ # Replace input_path with the path to a PDF or image or directory
249
+
250
+ # End-to-end parsing
251
+ python parse.py input_path
252
+
253
+ # Parse files in a directory, grouping them by a maximum total page count
254
+ python parse.py input_path -g 20
255
+
256
+ # Single-task recognition (outputs markdown only)
257
+ python parse.py input_path -t text/formula/table
258
+
259
+ # Parse PDFs in input_path and split results by pages
260
+ python parse.py input_path -s
261
+
262
+ # Specify output directory and model config file
263
+ python parse.py input_path -o ./output -c config.yaml
264
+ ```
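For scripted pipelines, the flags above can be assembled programmatically. A small helper sketch (a convenience wrapper, not part of MonkeyOCR itself; the flag names mirror the examples above):

```python
def build_parse_cmd(input_path, output=None, task=None, group=None, split=False):
    """Assemble a parse.py command line from the options documented above."""
    cmd = ["python", "parse.py", input_path]
    if output:
        cmd += ["-o", output]      # output directory
    if task:
        cmd += ["-t", task]        # one of "text", "formula", "table"
    if group:
        cmd += ["-g", str(group)]  # max total pages per group
    if split:
        cmd.append("-s")           # split results by page
    return cmd

print(build_parse_cmd("docs/", output="./output", group=20, split=True))
```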
265
+
266
+ <details>
267
+ <summary><b>More usage examples</b></summary>
268
+
269
+ ```bash
270
+ # Single file processing
271
+ python parse.py input.pdf # Parse single PDF file
272
+ python parse.py input.pdf -o ./output # Parse with custom output dir
273
+ python parse.py input.pdf -s # Parse PDF with page splitting
274
+ python parse.py image.jpg # Parse single image file
275
+
276
+ # Single task recognition
277
+ python parse.py image.jpg -t text # Text recognition from image
278
+ python parse.py image.jpg -t formula # Formula recognition from image
279
+ python parse.py image.jpg -t table # Table recognition from image
280
+ python parse.py document.pdf -t text # Text recognition from all PDF pages
281
+
282
+ # Folder processing (all files individually)
283
+ python parse.py /path/to/folder # Parse all files in folder
284
+ python parse.py /path/to/folder -s # Parse with page splitting
285
+ python parse.py /path/to/folder -t text # Single task recognition for all files
286
+
287
+ # Multi-file grouping (batch processing by page count)
288
+ python parse.py /path/to/folder -g 5 # Group files with max 5 total pages
289
+ python parse.py /path/to/folder -g 10 -s # Group files with page splitting
290
+ python parse.py /path/to/folder -g 8 -t text # Group files for single task recognition
291
+
292
+ # Advanced configurations
293
+ python parse.py input.pdf -c model_configs.yaml # Custom model configuration
294
+ python parse.py /path/to/folder -g 15 -s -o ./out # Group files, split pages, custom output
295
+ python parse.py input.pdf --pred-abandon # Enable predicting abandon elements
296
+ python parse.py /path/to/folder -g 10 -m # Group files and merge text blocks in output
297
+ ```
298
+
299
+ </details>
300
+
301
+ <details>
302
+ <summary><b>Output Results</b></summary>
303
+
304
+ MonkeyOCR mainly generates three types of output files:
305
+
306
+ 1. **Processed Markdown File** (`your.md`): The final parsed document content in markdown format, containing text, formulas, tables, and other structured elements.
307
+ 2. **Layout Results** (`your_layout.pdf`): The detected layout drawn on the original PDF.
308
+ 3. **Intermediate Block Results** (`your_middle.json`): A JSON file containing detailed information about all detected blocks, including:
309
+ - Block coordinates and positions
310
+ - Block content and type information
311
+ - Relationship information between blocks
312
+
313
+ These files provide both the final formatted output and detailed intermediate results for further analysis or processing.
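The intermediate JSON can be loaded with the standard library for downstream processing. A minimal sketch; the exact schema is not documented here, so the `blocks`, `type`, and `content` fields below are illustrative assumptions:

```python
import json

# Illustrative: extract text-block contents from a `your_middle.json`-style file.
sample = '{"blocks": [{"type": "text", "content": "Hello"}, {"type": "table", "content": "<table></table>"}]}'
middle = json.loads(sample)
texts = [b["content"] for b in middle["blocks"] if b["type"] == "text"]
print(texts)
```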
314
+
315
+ </details>
316
+
317
+ ### 4. Gradio Demo
318
+ ```bash
319
+ python demo/demo_gradio.py
320
+ ```
321
+ Once the demo is running, you can access it at http://localhost:7860.
322
+
323
+ ### 5. FastAPI
324
+ You can start the MonkeyOCR FastAPI service with the following command:
325
+ ```bash
326
+ uvicorn api.main:app --port 8000
327
+ ```
328
+ Once the API service is running, you can access the API documentation at http://localhost:8000/docs to explore available endpoints.
329
+ > [!TIP]
330
+ > To improve API concurrency performance, consider configuring the inference backend as `lmdeploy_queue` or `vllm_queue`.
331
+
332
+ ## Docker Deployment
333
+
334
+ 1. Navigate to the `docker` directory:
335
+
336
+ ```bash
337
+ cd docker
338
+ ```
339
+
340
+ 2. **Prerequisite:** Ensure NVIDIA GPU support is available in Docker (via `nvidia-docker2`).
341
+ If GPU support is not enabled, run the following to set up the environment:
342
+
343
+ ```bash
344
+ bash env.sh
345
+ ```
346
+
347
+ 3. Build the Docker image:
348
+
349
+ ```bash
350
+ docker compose build monkeyocr
351
+ ```
352
+
353
+ > [!IMPORTANT]
354
+ >
355
+ > If your GPU is from the 20/30/40-series, V100, L20/L40 or similar, please build the patched Docker image for LMDeploy compatibility:
356
+ >
357
+ > ```bash
358
+ > docker compose build monkeyocr-fix
359
+ > ```
360
+ >
361
+ > Otherwise, you may encounter the following error: `triton.runtime.errors.OutOfResources: out of resource: shared memory`
362
+
363
+ 4. Run the container with the Gradio demo (accessible on port 7860):
364
+
365
+ ```bash
366
+ docker compose up monkeyocr-demo
367
+ ```
368
+
369
+ Alternatively, start an interactive development environment:
370
+
371
+ ```bash
372
+ docker compose run --rm monkeyocr-dev
373
+ ```
374
+
375
+ 5. Run the FastAPI service (accessible on port 7861):
376
+ ```bash
377
+ docker compose up monkeyocr-api
378
+ ```
379
+ Once the API service is running, you can access the API documentation at http://localhost:7861/docs to explore available endpoints.
380
+
381
+ ## Windows Support
382
+
383
+ See the [windows support guide](docs/windows_support.md) for details.
384
+
385
+ ## Quantization
386
+
387
+ This model can be quantized using AWQ. Follow the instructions in the [quantization guide](docs/Quantization.md).
388
+
389
+ ## Benchmark Results
390
+
391
+ Here are the evaluation results of our model on OmniDocBench. MonkeyOCR-3B uses DocLayoutYOLO as the structure detection model, while MonkeyOCR-3B* uses our own trained structure detection model, which improves performance on Chinese documents.
392
+
393
+ ### 1. The end-to-end evaluation results of different tasks.
394
+
395
+ <table>
396
+ <thead>
397
+ <tr>
398
+ <th rowspan="2"><strong>Model<br>Type</strong></th>
399
+ <th rowspan="2"><strong>Methods</strong></th>
400
+ <th colspan="2"><strong>Overall<sup>Edit</sup>↓</strong></th>
401
+ <th colspan="2"><strong>Text<sup>Edit</sup>↓</strong></th>
402
+ <th colspan="2"><strong>Formula<sup>Edit</sup>↓</strong></th>
403
+ <th colspan="2"><strong>Table<sup>TEDS</sup>↑</strong></th>
404
+ <th colspan="2"><strong>Table<sup>Edit</sup>↓</strong></th>
405
+ <th colspan="2"><strong>Read Order<sup>Edit</sup>↓</strong></th>
406
+ </tr>
407
+ <tr>
408
+ <th><em>EN</em></th>
409
+ <th><em>ZH</em></th>
410
+ <th><em>EN</em></th>
411
+ <th><em>ZH</em></th>
412
+ <th><em>EN</em></th>
413
+ <th><em>ZH</em></th>
414
+ <th><em>EN</em></th>
415
+ <th><em>ZH</em></th>
416
+ <th><em>EN</em></th>
417
+ <th><em>ZH</em></th>
418
+ <th><em>EN</em></th>
419
+ <th><em>ZH</em></th>
420
+ </tr>
421
+ </thead>
422
+ <tbody>
423
+ <tr>
424
+ <td rowspan="8"><strong>Pipeline<br>Tools</strong></td>
425
+ <td>MinerU</td>
426
+ <td>0.150</td>
427
+ <td>0.357</td>
428
+ <td>0.061</td>
429
+ <td>0.215</td>
430
+ <td>0.278</td>
431
+ <td>0.577</td>
432
+ <td>78.6</td>
433
+ <td>62.1</td>
434
+ <td>0.180</td>
435
+ <td>0.344</td>
436
+ <td>0.079</td>
437
+ <td>0.292</td>
438
+ </tr>
439
+ <tr>
440
+ <td>Marker</td>
441
+ <td>0.336</td>
442
+ <td>0.556</td>
443
+ <td>0.080</td>
444
+ <td>0.315</td>
445
+ <td>0.530</td>
446
+ <td>0.883</td>
447
+ <td>67.6</td>
448
+ <td>49.2</td>
449
+ <td>0.619</td>
450
+ <td>0.685</td>
451
+ <td>0.114</td>
452
+ <td>0.340</td>
453
+ </tr>
454
+ <tr>
455
+ <td>Mathpix</td>
456
+ <td>0.191</td>
457
+ <td>0.365</td>
458
+ <td>0.105</td>
459
+ <td>0.384</td>
460
+ <td>0.306</td>
461
+ <td><strong>0.454</strong></td>
462
+ <td>77.0</td>
463
+ <td>67.1</td>
464
+ <td>0.243</td>
465
+ <td>0.320</td>
466
+ <td>0.108</td>
467
+ <td>0.304</td>
468
+ </tr>
469
+ <tr>
470
+ <td>Docling</td>
471
+ <td>0.589</td>
472
+ <td>0.909</td>
473
+ <td>0.416</td>
474
+ <td>0.987</td>
475
+ <td>0.999</td>
476
+ <td>1</td>
477
+ <td>61.3</td>
478
+ <td>25.0</td>
479
+ <td>0.627</td>
480
+ <td>0.810</td>
481
+ <td>0.313</td>
482
+ <td>0.837</td>
483
+ </tr>
484
+ <tr>
485
+ <td>Pix2Text</td>
486
+ <td>0.320</td>
487
+ <td>0.528</td>
488
+ <td>0.138</td>
489
+ <td>0.356</td>
490
+ <td>0.276</td>
491
+ <td>0.611</td>
492
+ <td>73.6</td>
493
+ <td>66.2</td>
494
+ <td>0.584</td>
495
+ <td>0.645</td>
496
+ <td>0.281</td>
497
+ <td>0.499</td>
498
+ </tr>
499
+ <tr>
500
+ <td>Unstructured</td>
501
+ <td>0.586</td>
502
+ <td>0.716</td>
503
+ <td>0.198</td>
504
+ <td>0.481</td>
505
+ <td>0.999</td>
506
+ <td>1</td>
507
+ <td>0</td>
508
+ <td>0.06</td>
509
+ <td>1</td>
510
+ <td>0.998</td>
511
+ <td>0.145</td>
512
+ <td>0.387</td>
513
+ </tr>
514
+ <tr>
515
+ <td>OpenParse</td>
516
+ <td>0.646</td>
517
+ <td>0.814</td>
518
+ <td>0.681</td>
519
+ <td>0.974</td>
520
+ <td>0.996</td>
521
+ <td>1</td>
522
+ <td>64.8</td>
523
+ <td>27.5</td>
524
+ <td>0.284</td>
525
+ <td>0.639</td>
526
+ <td>0.595</td>
527
+ <td>0.641</td>
528
+ </tr>
529
+ <tr>
530
+ <td>PP-StructureV3</td>
531
+ <td>0.145</td>
532
+ <td><strong>0.206</strong></td>
533
+ <td>0.058</td>
534
+ <td><strong>0.088</strong></td>
535
+ <td>0.295</td>
536
+ <td>0.535</td>
537
+ <td>-</td>
538
+ <td>-</td>
539
+ <td>0.159</td>
540
+ <td><strong>0.109</strong></td>
541
+ <td><strong>0.069</strong></td>
542
+ <td><strong>0.091</strong></td>
543
+ </tr>
544
+ <tr>
545
+ <td rowspan="8"><strong>Expert<br>VLMs</strong></td>
546
+ <td>GOT-OCR</td>
547
+ <td>0.287</td>
548
+ <td>0.411</td>
549
+ <td>0.189</td>
550
+ <td>0.315</td>
551
+ <td>0.360</td>
552
+ <td>0.528</td>
553
+ <td>53.2</td>
554
+ <td>47.2</td>
555
+ <td>0.459</td>
556
+ <td>0.520</td>
557
+ <td>0.141</td>
558
+ <td>0.280</td>
559
+ </tr>
560
+ <tr>
561
+ <td>Nougat</td>
562
+ <td>0.452</td>
563
+ <td>0.973</td>
564
+ <td>0.365</td>
565
+ <td>0.998</td>
566
+ <td>0.488</td>
567
+ <td>0.941</td>
568
+ <td>39.9</td>
569
+ <td>0</td>
570
+ <td>0.572</td>
571
+ <td>1.000</td>
572
+ <td>0.382</td>
573
+ <td>0.954</td>
574
+ </tr>
575
+ <tr>
576
+ <td>Mistral OCR</td>
577
+ <td>0.268</td>
578
+ <td>0.439</td>
579
+ <td>0.072</td>
580
+ <td>0.325</td>
581
+ <td>0.318</td>
582
+ <td>0.495</td>
583
+ <td>75.8</td>
584
+ <td>63.6</td>
585
+ <td>0.600</td>
586
+ <td>0.650</td>
587
+ <td>0.083</td>
588
+ <td>0.284</td>
589
+ </tr>
590
+ <tr>
591
+ <td>OLMOCR-sglang</td>
592
+ <td>0.326</td>
593
+ <td>0.469</td>
594
+ <td>0.097</td>
595
+ <td>0.293</td>
596
+ <td>0.455</td>
597
+ <td>0.655</td>
598
+ <td>68.1</td>
599
+ <td>61.3</td>
600
+ <td>0.608</td><td>0.652</td>
601
+ <td>0.145</td>
602
+ <td>0.277</td>
603
+ </tr>
604
+ <tr>
605
+ <td>SmolDocling-256M</td>
606
+ <td>0.493</td>
607
+ <td>0.816</td>
608
+ <td>0.262</td>
609
+ <td>0.838</td>
610
+ <td>0.753</td>
611
+ <td>0.997</td>
612
+ <td>44.9</td>
613
+ <td>16.5</td>
614
+ <td>0.729</td>
615
+ <td>0.907</td>
616
+ <td>0.227</td>
617
+ <td>0.522</td>
618
+ </tr>
619
+ <tr>
620
+ <td>Dolphin</td>
621
+ <td>0.206</td>
622
+ <td>0.306</td>
623
+ <td>0.107</td>
624
+ <td>0.197</td>
625
+ <td>0.447</td>
626
+ <td>0.580</td>
627
+ <td>77.3</td>
628
+ <td>67.2</td>
629
+ <td>0.180</td>
630
+ <td>0.285</td>
631
+ <td>0.091</td>
632
+ <td>0.162</td>
633
+ </tr>
634
+ <tr>
635
+ <td>MinerU 2</td>
636
+ <td>0.139</td>
637
+ <td>0.240</td>
638
+ <td><strong>0.047</strong></td>
639
+ <td>0.109</td>
640
+ <td>0.297</td>
641
+ <td>0.536</td>
642
+ <td><strong>82.5</strong></td>
643
+ <td>79.0</td>
644
+ <td>0.141</td>
645
+ <td>0.195</td>
646
+ <td><strong>0.069</strong></td>
647
+ <td>0.118</td>
648
+ </tr>
649
+ <tr>
650
+ <td>OCRFlux</td>
651
+
652
+ <td>0.195</td>
653
+ <td>0.281</td>
654
+ <td>0.064</td>
655
+ <td>0.183</td>
656
+ <td>0.379</td>
657
+ <td>0.613</td>
658
+ <td>71.6</td>
659
+ <td>81.3</td>
660
+ <td>0.253</td>
661
+ <td>0.139</td>
662
+ <td>0.086</td>
663
+ <td>0.187</td>
664
+
665
+
666
+ </tr>
667
+ <tr>
668
+ <td rowspan="3"><strong>General<br>VLMs</strong></td>
669
+ <td>GPT4o</td>
670
+ <td>0.233</td>
671
+ <td>0.399</td>
672
+ <td>0.144</td>
673
+ <td>0.409</td>
674
+ <td>0.425</td>
675
+ <td>0.606</td>
676
+ <td>72.0</td>
677
+ <td>62.9</td>
678
+ <td>0.234</td>
679
+ <td>0.329</td>
680
+ <td>0.128</td>
681
+ <td>0.251</td>
682
+ </tr>
683
+ <tr>
684
+ <td>Qwen2.5-VL-7B</td>
685
+ <td>0.312</td>
686
+ <td>0.406</td>
687
+ <td>0.157</td>
688
+ <td>0.228</td>
689
+ <td>0.351</td>
690
+ <td>0.574</td>
691
+ <td>76.4</td>
692
+ <td>72.2</td>
693
+ <td>0.588</td>
694
+ <td>0.619</td>
695
+ <td>0.149</td>
696
+ <td>0.203</td>
697
+ </tr>
698
+ <tr>
699
+ <td>InternVL3-8B</td>
700
+ <td>0.314</td>
701
+ <td>0.383</td>
702
+ <td>0.134</td>
703
+ <td>0.218</td>
704
+ <td>0.417</td>
705
+ <td>0.563</td>
706
+ <td>66.1</td>
707
+ <td>73.1</td>
708
+ <td>0.586</td>
709
+ <td>0.564</td>
710
+ <td>0.118</td>
711
+ <td>0.186</td>
712
+ </tr>
713
+ <tr>
714
+ <td rowspan="4"><strong>Mix</strong></td>
715
+ <td><strong>MonkeyOCR-3B <a href="https://huggingface.co/echo840/MonkeyOCR/blob/main/Structure/doclayout_yolo_docstructbench_imgsz1280_2501.pt">[Weight]</a></strong></td>
716
+ <td>0.140</td>
717
+ <td>0.297</td>
718
+ <td>0.058</td>
719
+ <td>0.185</td>
720
+ <td>0.238</td>
721
+ <td>0.506</td>
722
+ <td>80.2</td>
723
+ <td>77.7</td>
724
+ <td>0.170</td>
725
+ <td>0.253</td>
726
+ <td>0.093</td>
727
+ <td>0.244</td>
728
+ </tr>
729
+ <tr>
730
+ <td><strong>MonkeyOCR-3B* <a href="https://huggingface.co/echo840/MonkeyOCR/blob/main/Structure/layout_zh.pt">[Weight]</a></strong></td>
731
+ <td>0.154</td>
732
+ <td>0.277</td>
733
+ <td>0.073</td>
734
+ <td>0.134</td>
735
+ <td>0.255</td>
736
+ <td>0.529</td>
737
+ <td>78.2</td>
738
+ <td>76.2</td>
739
+ <td>0.182</td>
740
+ <td>0.262</td>
741
+ <td>0.105</td>
742
+ <td>0.183</td>
743
+ </tr>
744
+ <tr>
745
+ <td><strong>MonkeyOCR-pro-3B <a href="https://huggingface.co/echo840/MonkeyOCR-pro-3B">[Weight]</a></strong></td>
746
+ <td><strong>0.138</strong></td>
747
+ <td><strong>0.206</strong></td>
748
+ <td>0.067</td>
749
+ <td>0.107</td>
750
+ <td><strong>0.246</strong></td>
751
+ <td><strong>0.421</strong></td>
752
+ <td>81.5</td>
753
+ <td><strong>87.5</strong></td>
754
+ <td><strong>0.139</strong></td>
755
+ <td>0.111</td>
756
+ <td>0.100</td>
757
+ <td>0.185</td>
758
+ </tr>
759
+ <tr>
760
+ <td><strong>MonkeyOCR-pro-1.2B <a href="https://huggingface.co/echo840/MonkeyOCR-pro-1.2B">[Weight]</a></strong></td>
761
+ <td>0.153</td>
762
+ <td>0.223</td>
763
+ <td>0.066</td>
764
+ <td>0.123</td>
765
+ <td>0.272</td>
766
+ <td>0.449</td>
767
+ <td>76.5</td>
768
+ <td>83.7</td>
769
+ <td>0.176</td>
770
+ <td>0.131</td>
771
+ <td>0.097</td>
772
+ <td>0.187</td>
773
+ </tr>
774
+ </tbody>
775
+ </table>
776
+
777
+
778
+ ### 2. The end-to-end text recognition performance across 9 PDF page types.
779
+
780
+ <table>
781
+ <thead>
782
+ <tr>
783
+ <th><strong>Model<br>Type</strong></th>
784
+ <th><strong>Models</strong></th>
785
+ <th><strong>Book</strong></th>
786
+ <th><strong>Slides</strong></th>
787
+ <th><strong>Financial<br>Report</strong></th>
788
+ <th><strong>Textbook</strong></th>
789
+ <th><strong>Exam<br>Paper</strong></th>
790
+ <th><strong>Magazine</strong></th>
791
+ <th><strong>Academic<br>Papers</strong></th>
792
+ <th><strong>Notes</strong></th>
793
+ <th><strong>Newspaper</strong></th>
794
+ <th><strong>Overall</strong></th>
795
+ </tr>
796
+ </thead>
797
+ <tbody>
798
+ <tr>
799
+ <td rowspan="3"><strong>Pipeline<br>Tools</strong></td>
800
+ <td>MinerU</td>
801
+ <td>0.055</td>
802
+ <td>0.124</td>
803
+ <td><u>0.033</u></td>
804
+ <td>0.102</td>
805
+ <td>0.159</td>
806
+ <td><strong>0.072</strong></td>
807
+ <td><u>0.025</u></td>
808
+ <td>0.984</td>
809
+ <td>0.171</td>
810
+ <td>0.206</td>
811
+ </tr>
812
+ <tr>
813
+ <td>Marker</td>
814
+ <td>0.074</td>
815
+ <td>0.340</td>
816
+ <td>0.089</td>
817
+ <td>0.319</td>
818
+ <td>0.452</td>
819
+ <td>0.153</td>
820
+ <td>0.059</td>
821
+ <td>0.651</td>
822
+ <td>0.192</td>
823
+ <td>0.274</td>
824
+ </tr>
825
+ <tr>
826
+ <td>Mathpix</td>
827
+ <td>0.131</td>
828
+ <td>0.220</td>
829
+ <td>0.202</td>
830
+ <td>0.216</td>
831
+ <td>0.278</td>
832
+ <td>0.147</td>
833
+ <td>0.091</td>
834
+ <td>0.634</td>
835
+ <td>0.690</td>
836
+ <td>0.300</td>
837
+ </tr>
838
+ <tr>
839
+ <td rowspan="4"><strong>Expert<br>VLMs</strong></td>
840
+ <td>GOT-OCR</td>
841
+ <td>0.111</td>
842
+ <td>0.222</td>
843
+ <td>0.067</td>
844
+ <td>0.132</td>
845
+ <td>0.204</td>
846
+ <td>0.198</td>
847
+ <td>0.179</td>
848
+ <td>0.388</td>
849
+ <td>0.771</td>
850
+ <td>0.267</td>
851
+ </tr>
852
+ <tr>
853
+ <td>Nougat</td>
854
+ <td>0.734</td>
855
+ <td>0.958</td>
856
+ <td>1.000</td>
857
+ <td>0.820</td>
858
+ <td>0.930</td>
859
+ <td>0.830</td>
860
+ <td>0.214</td>
861
+ <td>0.991</td>
862
+ <td>0.871</td>
863
+ <td>0.806</td>
864
+ </tr>
865
+ <tr>
866
+ <td>Dolphin</td>
867
+ <td>0.091</td>
868
+ <td>0.131</td>
869
+ <td>0.057</td>
870
+ <td>0.146</td>
871
+ <td>0.231</td>
872
+ <td>0.121</td>
873
+ <td>0.074</td>
874
+ <td>0.363</td>
875
+ <td>0.307</td>
876
+ <td>0.177</td>
877
+ </tr>
878
+ <tr>
879
+ <td>OCRFlux</td>
880
+ <td>0.068</td>
881
+ <td>0.125</td>
882
+ <td>0.092</td>
883
+ <td>0.102</td>
884
+ <td>0.119</td>
885
+ <td>0.083</td>
886
+ <td>0.047</td>
887
+ <td>0.223</td>
888
+ <td>0.536</td>
889
+ <td>0.149</td>
890
+ </tr>
891
+ <tr>
892
+ <td rowspan="3"><strong>General<br>VLMs</strong></td>
893
+ <td>GPT4o</td>
894
+ <td>0.157</td>
895
+ <td>0.163</td>
896
+ <td>0.348</td>
897
+ <td>0.187</td>
898
+ <td>0.281</td>
899
+ <td>0.173</td>
900
+ <td>0.146</td>
901
+ <td>0.607</td>
902
+ <td>0.751</td>
903
+ <td>0.316</td>
904
+ </tr>
905
+ <tr>
906
+ <td>Qwen2.5-VL-7B</td>
907
+ <td>0.148</td>
908
+ <td><strong>0.053</strong></td>
909
+ <td>0.111</td>
910
+ <td>0.137</td>
911
+ <td>0.189</td>
912
+ <td>0.117</td>
913
+ <td>0.134</td>
914
+ <td>0.204</td>
915
+ <td>0.706</td>
916
+ <td>0.205</td>
917
+ </tr>
918
+ <tr>
919
+ <td>InternVL3-8B</td>
920
+ <td>0.163</td>
921
+ <td><u>0.056</u></td>
922
+ <td>0.107</td>
923
+ <td>0.109</td>
924
+ <td>0.129</td>
925
+ <td>0.100</td>
926
+ <td>0.159</td>
927
+ <td><strong>0.150</strong></td>
928
+ <td>0.681</td>
929
+ <td>0.188</td>
930
+ </tr>
931
+ <tr>
932
+ <td rowspan="4"><strong>Mix</strong></td>
933
+ <td><strong>MonkeyOCR-3B <a href="https://huggingface.co/echo840/MonkeyOCR/blob/main/Structure/doclayout_yolo_docstructbench_imgsz1280_2501.pt">[Weight]</a></strong></td>
934
+ <td><strong>0.046</strong></td>
935
+ <td>0.120</td>
936
+ <td><strong>0.024</strong></td>
937
+ <td>0.100</td>
938
+ <td>0.129</td>
939
+ <td>0.086</td>
940
+ <td><strong>0.024</strong></td>
941
+ <td>0.643</td>
942
+ <td><u>0.131</u></td>
943
+ <td>0.155</td>
944
+ </tr>
945
+ <tr>
946
+ <td><strong>MonkeyOCR-3B* <a href="https://huggingface.co/echo840/MonkeyOCR/blob/main/Structure/layout_zh.pt">[Weight]</a></strong></td>
947
+ <td><u>0.054</u></td>
948
+ <td>0.203</td>
949
+ <td>0.038</td>
950
+ <td>0.112</td>
951
+ <td>0.138</td>
952
+ <td>0.111</td>
953
+ <td>0.032</td>
954
+ <td>0.194</td>
955
+ <td>0.136</td>
956
+ <td>0.120</td>
957
+ </tr>
958
+ <tr>
959
+ <td><strong>MonkeyOCR-pro-3B <a href="https://huggingface.co/echo840/MonkeyOCR-pro-3B">[Weight]</a></strong></td>
960
+ <td>0.084</td>
961
+ <td>0.129</td>
962
+ <td>0.060</td>
963
+ <td><strong>0.090</strong></td>
964
+ <td><strong>0.107</strong></td>
965
+ <td><u>0.073</u></td>
966
+ <td>0.050</td>
967
+ <td><u>0.171</u></td>
968
+ <td><strong>0.107</strong></td>
969
+ <td><strong>0.100</strong></td>
970
+ </tr>
971
+ <tr>
972
+ <td><strong>MonkeyOCR-pro-1.2B <a href="https://huggingface.co/echo840/MonkeyOCR-pro-1.2B">[Weight]</a></strong></td>
973
+ <td>0.087</td>
974
+ <td>0.142</td>
975
+ <td>0.059</td>
976
+ <td><u>0.093</u></td>
977
+ <td><u>0.115</u></td>
978
+ <td>0.085</td>
979
+ <td>0.045</td>
980
+ <td>0.226</td>
981
+ <td>0.122</td>
982
+ <td><u>0.112</u></td>
983
+ </tr>
984
+ </tbody>
985
+ </table>
986
+
987
+ ### 3. The evaluation results on olmOCR-bench.
988
+
989
+ <table>
990
+ <thead>
991
+ <tr>
992
+ <th>Model</th>
993
+ <th>ArXiv</th>
994
+ <th>Old Scans<br>Math</th>
995
+ <th>Tables</th>
996
+ <th>Old Scans</th>
997
+ <th>Headers and<br>Footers</th>
998
+ <th>Multi<br>column</th>
999
+ <th>Long Tiny<br>Text</th>
1000
+ <th>Base</th>
1001
+ <th>Overall</th>
1002
+ </tr>
1003
+ </thead>
1004
+ <tbody>
1005
+ <tr>
1006
+ <td>GOT OCR</td>
1007
+ <td>52.7</td>
1008
+ <td>52.0</td>
1009
+ <td>0.2</td>
1010
+ <td>22.1</td>
1011
+ <td>93.6</td>
1012
+ <td>42.0</td>
1013
+ <td>29.9</td>
1014
+ <td>94.0</td>
1015
+ <td>48.3 ± 1.1</td>
1016
+ </tr>
1017
+ <tr>
1018
+ <td>Marker</td>
1019
+ <td>76.0</td>
1020
+ <td>57.9</td>
1021
+ <td>57.6</td>
1022
+ <td>27.8</td>
1023
+ <td>84.9</td>
1024
+ <td>72.9</td>
1025
+ <td>84.6</td>
1026
+ <td><strong>99.1</strong></td>
1027
+ <td>70.1 ± 1.1</td>
1028
+ </tr>
1029
+ <tr>
1030
+ <td>MinerU</td>
1031
+ <td>75.4</td>
1032
+ <td>47.4</td>
1033
+ <td>60.9</td>
1034
+ <td>17.3</td>
1035
+ <td><strong>96.6</strong></td>
1036
+ <td>59.0</td>
1037
+ <td>39.1</td>
1038
+ <td>96.6</td>
1039
+ <td>61.5 ± 1.1</td>
1040
+ </tr>
1041
+ <tr>
1042
+ <td>Mistral OCR</td>
1043
+ <td>77.2</td>
1044
+ <td>67.5</td>
1045
+ <td>60.6</td>
1046
+ <td>29.3</td>
1047
+ <td>93.6</td>
1048
+ <td>71.3</td>
1049
+ <td>77.1</td>
1050
+ <td>99.4</td>
1051
+ <td>72.0 ± 1.1</td>
1052
+ </tr>
1053
+ <tr>
1054
+ <td>Nanonets OCR</td>
1055
+ <td>67.0</td>
1056
+ <td>68.6</td>
1057
+ <td><strong>77.7</strong></td>
1058
+ <td>39.5</td>
1059
+ <td>40.7</td>
1060
+ <td>69.9</td>
1061
+ <td>53.4</td>
1062
+ <td>99.3</td>
1063
+ <td>64.5 ± 1.1</td>
1064
+ </tr>
1065
+ <tr>
1066
+ <td>GPT-4o<br>(No Anchor)</td>
1067
+ <td>51.5</td>
1068
+ <td><strong>75.5</strong></td>
1069
+ <td>69.1</td>
1070
+ <td>40.9</td>
1071
+ <td>94.2</td>
1072
+ <td>68.9</td>
1073
+ <td>54.1</td>
1074
+ <td>96.7</td>
1075
+ <td>68.9 ± 1.1</td>
1076
+ </tr>
1077
+ <tr>
1078
+ <td>GPT-4o<br>(Anchored)</td>
1079
+ <td>53.5</td>
1080
+ <td>74.5</td>
1081
+ <td>70.0</td>
1082
+ <td>40.7</td>
1083
+ <td>93.8</td>
1084
+ <td>69.3</td>
1085
+ <td>60.6</td>
1086
+ <td>96.8</td>
1087
+ <td>69.9 ± 1.1</td>
1088
+ </tr>
1089
+ <tr>
1090
+ <td>Gemini Flash 2<br>(No Anchor)</td>
1091
+ <td>32.1</td>
1092
+ <td>56.3</td>
1093
+ <td>61.4</td>
1094
+ <td>27.8</td>
1095
+ <td>48.0</td>
1096
+ <td>58.7</td>
1097
+ <td><strong>84.4</strong></td>
1098
+ <td>94.0</td>
1099
+ <td>57.8 ± 1.1</td>
1100
+ </tr>
1101
+ <tr>
1102
+ <td>Gemini Flash 2<br>(Anchored)</td>
1103
+ <td>54.5</td>
1104
+ <td>56.1</td>
1105
+ <td>72.1</td>
1106
+ <td>34.2</td>
1107
+ <td>64.7</td>
1108
+ <td>61.5</td>
1109
+ <td>71.5</td>
1110
+ <td>95.6</td>
1111
+ <td>63.8 ± 1.2</td>
1112
+ </tr>
1113
+ <tr>
1114
+ <td>Qwen 2 VL<br>(No Anchor)</td>
1115
+ <td>19.7</td>
1116
+ <td>31.7</td>
1117
+ <td>24.2</td>
1118
+ <td>17.1</td>
1119
+ <td>88.9</td>
1120
+ <td>8.3</td>
1121
+ <td>6.8</td>
1122
+ <td>55.5</td>
1123
+ <td>31.5 ± 0.9</td>
1124
+ </tr>
1125
+ <tr>
1126
+ <td>Qwen 2.5 VL<br>(No Anchor)</td>
1127
+ <td>63.1</td>
1128
+ <td>65.7</td>
1129
+ <td>67.3</td>
1130
+ <td>38.6</td>
1131
+ <td>73.6</td>
1132
+ <td>68.3</td>
1133
+ <td>49.1</td>
1134
+ <td>98.3</td>
1135
+ <td>65.5 ± 1.2</td>
1136
+ </tr>
1137
+ <tr>
1138
+ <td>olmOCR v0.1.75<br>(No Anchor)</td>
1139
+ <td>71.5</td>
1140
+ <td>71.4</td>
1141
+ <td>71.4</td>
1142
+ <td><strong>42.8</strong></td>
1143
+ <td>94.1</td>
1144
+ <td>77.7</td>
1145
+ <td>71.0</td>
1146
+ <td>97.8</td>
1147
+ <td>74.7 ± 1.1</td>
1148
+ </tr>
1149
+ <tr>
1150
+ <td>olmOCR v0.1.75<br>(Anchored)</td>
1151
+ <td>74.9</td>
1152
+ <td>71.2</td>
1153
+ <td>71.0</td>
1154
+ <td>42.2</td>
1155
+ <td>94.5</td>
1156
+ <td><strong>78.3</strong></td>
1157
+ <td>73.3</td>
1158
+ <td>98.3</td>
1159
+ <td>75.5 ± 1.0</td>
1160
+ </tr>
1161
+ <tr>
1162
+ <td>MonkeyOCR-pro-3B <a href="https://huggingface.co/echo840/MonkeyOCR-pro-3B">[Weight]</a></td>
1163
+ <td><strong>83.8</strong></td>
1164
+ <td>68.8</td>
1165
+ <td>74.6</td>
1166
+ <td>36.1</td>
1167
+ <td>91.2</td>
1168
+ <td>76.6</td>
1169
+ <td>80.1</td>
1170
+ <td>95.3</td>
1171
+ <td><strong>75.8 ± 1.0</strong></td>
1172
+ </tr>
1173
+ <tr>
1174
+ <td>MonkeyOCR-pro-1.2B <a href="https://huggingface.co/echo840/MonkeyOCR-pro-1.2B">[Weight]</a></td>
1175
+ <td>80.5</td>
1176
+ <td>62.9</td>
1177
+ <td>71.1</td>
1178
+ <td>32.9</td>
1179
+ <td>92.2</td>
1180
+ <td>68.3</td>
1181
+ <td>74.0</td>
1182
+ <td>92.6</td>
1183
+ <td>71.8 ± 1.1</td>
1184
+ </tr>
1185
+ </tbody>
1186
+ </table>
1187
+
1188
+ ## Visualization Demo
1189
+
1190
+ Get a quick hands-on experience with our demo: http://vlrlabmonkey.xyz:7685 (the latest model is available for selection).
1191
+
1192
+ > Our demo is simple and easy to use:
1193
+ >
1194
+ > 1. Upload a PDF or image.
1195
+ > 2. Click “Parse (解析)” to let the model perform structure detection, content recognition, and relationship prediction on the input document. The final output will be a markdown-formatted version of the document.
1196
+ > 3. Select a prompt and click “Test by prompt” to let the model perform content recognition on the image based on the selected prompt.
1197
+
1198
+
1199
+
1200
+
1201
+ ### Example for formula document
1202
+ <img src="https://v1.ax1x.com/2025/06/10/7jVLgB.jpg" alt="7jVLgB.jpg" border="0" />
1203
+
1204
+ ### Example for table document
1205
+ <img src="https://v1.ax1x.com/2025/06/11/7jcOaa.png" alt="7jcOaa.png" border="0" />
1206
+
1207
+ ### Example for newspaper
1208
+ <img src="https://v1.ax1x.com/2025/06/11/7jcP5V.png" alt="7jcP5V.png" border="0" />
1209
+
1210
+ ### Example for financial report
1211
+ <img src="https://v1.ax1x.com/2025/06/11/7jc10I.png" alt="7jc10I.png" border="0" />
1212
+ <img src="https://v1.ax1x.com/2025/06/11/7jcRCL.png" alt="7jcRCL.png" border="0" />
1213
+
1214
+ ## Citing MonkeyOCR
1215
+
1216
+ If you wish to refer to the baseline results published here, please use the following BibTeX entry:
1217
+
1218
+ ```BibTeX
1219
+ @misc{li2025monkeyocrdocumentparsingstructurerecognitionrelation,
1220
+ title={MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm},
1221
+ author={Zhang Li and Yuliang Liu and Qiang Liu and Zhiyin Ma and Ziyang Zhang and Shuo Zhang and Zidun Guo and Jiarui Zhang and Xinyu Wang and Xiang Bai},
1222
+ year={2025},
1223
+ eprint={2506.05218},
1224
+ archivePrefix={arXiv},
1225
+ primaryClass={cs.CV},
1226
+ url={https://arxiv.org/abs/2506.05218},
1227
+ }
1228
+ ```
1229
+
1230
+
1231
+
1232
+ ## Acknowledgments
1233
+ We would like to thank [MinerU](https://github.com/opendatalab/MinerU), [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO), [PyMuPDF](https://github.com/pymupdf/PyMuPDF), [layoutreader](https://github.com/ppaanngggg/layoutreader), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), [LMDeploy](https://github.com/InternLM/lmdeploy), [PP-StructureV3](https://github.com/PaddlePaddle/PaddleOCR), [PP-DocLayout_plus-L](https://huggingface.co/PaddlePaddle/PP-DocLayout_plus-L), and [InternVL3](https://github.com/OpenGVLab/InternVL) for providing base code and models, as well as for their contributions to this field. We also thank [M6Doc](https://github.com/HCIILAB/M6Doc), [DocLayNet](https://github.com/DS4SD/DocLayNet), [CDLA](https://github.com/buptlihang/CDLA), [D4LA](https://github.com/AlibabaResearch/AdvancedLiterateMachinery), [DocGenome](https://github.com/Alpha-Innovator/DocGenome), [PubTabNet](https://github.com/ibm-aur-nlp/PubTabNet), and [UniMER-1M](https://github.com/opendatalab/UniMERNet) for providing valuable datasets. Finally, we thank everyone who contributed to this open-source effort.
1234
+
1235
+ ## Limitation
1236
+ Currently, MonkeyOCR does not yet fully support photographed text, handwritten content, Traditional Chinese characters, or multilingual text. We plan to add support for these in future public releases. Additionally, our model is deployed on a single GPU, so if too many users upload files at the same time, issues like “This application is currently busy” may occur. The processing time shown on the demo page does not reflect computation time alone; it also includes result uploading and other overhead, so it may be longer during periods of high traffic. The inference speeds of MonkeyOCR, MinerU, and Qwen2.5-VL-7B were measured on an H800 GPU.
1237
+
1238
+ ## Copyright
1239
+ Please don’t hesitate to share your valuable feedback; it is a key motivation that drives us to continuously improve our framework. Note: our model is intended for academic research and non-commercial use only. If you are interested in a faster (smaller) or stronger model, please contact us at xbai@hust.edu.cn or ylliu@hust.edu.cn.
Recognition/added_tokens.json ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
Recognition/chat_template.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
3
+ }
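The `chat_template` above is a Jinja2 template. As a minimal sketch (assuming the `jinja2` package, which `transformers` itself uses for chat templates), the snippet below renders an image-plus-text user turn with the template string copied from `Recognition/chat_template.json`, showing the prompt string the model actually sees; in practice the processor applies this template for you via `apply_chat_template`.

```python
import jinja2

# Template string copied from Recognition/chat_template.json (split across
# adjacent string literals for readability; the content is unchanged).
CHAT_TEMPLATE = (
    "{% set image_count = namespace(value=0) %}"
    "{% set video_count = namespace(value=0) %}"
    "{% for message in messages %}"
    "{% if loop.first and message['role'] != 'system' %}"
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "{% endif %}"
    "<|im_start|>{{ message['role'] }}\n"
    "{% if message['content'] is string %}"
    "{{ message['content'] }}<|im_end|>\n"
    "{% else %}"
    "{% for content in message['content'] %}"
    "{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}"
    "{% set image_count.value = image_count.value + 1 %}"
    "{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}"
    "<|vision_start|><|image_pad|><|vision_end|>"
    "{% elif content['type'] == 'video' or 'video' in content %}"
    "{% set video_count.value = video_count.value + 1 %}"
    "{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}"
    "<|vision_start|><|video_pad|><|vision_end|>"
    "{% elif 'text' in content %}{{ content['text'] }}{% endif %}"
    "{% endfor %}<|im_end|>\n"
    "{% endif %}{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

# A hypothetical single-turn request: one page image plus an instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Parse this page."},
        ],
    }
]

# Render the prompt; <|image_pad|> is later expanded into vision tokens
# by the processor, so here it appears as a single placeholder.
prompt = jinja2.Template(CHAT_TEMPLATE).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)
```

Note how the default system turn is injected automatically when the first message is not a system message, and `add_generation_prompt=True` appends the open `<|im_start|>assistant` turn for the model to complete.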
Recognition/config.json ADDED
@@ -0,0 +1,51 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "MonkeyOCR-1.2B-0709",
3
+ "architectures": [
4
+ "Qwen2_5_VLForConditionalGeneration"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 151643,
8
+ "eos_token_id": 151645,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 2048,
11
+ "image_token_id": 151655,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 11008,
14
+ "max_position_embeddings": 128000,
15
+ "max_window_layers": 70,
16
+ "model_type": "qwen2_5_vl",
17
+ "num_attention_heads": 16,
18
+ "num_hidden_layers": 12,
19
+ "num_key_value_heads": 2,
20
+ "rms_norm_eps": 1e-06,
21
+ "rope_scaling": {
22
+ "mrope_section": [
23
+ 16,
24
+ 24,
25
+ 24
26
+ ],
27
+ "rope_type": "default",
28
+ "type": "default"
29
+ },
30
+ "rope_theta": 1000000.0,
31
+ "sliding_window": 32768,
32
+ "tie_word_embeddings": true,
33
+ "torch_dtype": "bfloat16",
34
+ "transformers_version": "4.50.0.dev0",
35
+ "use_cache": false,
36
+ "use_sliding_window": false,
37
+ "video_token_id": 151656,
38
+ "vision_config": {
39
+ "hidden_size": 1280,
40
+ "in_chans": 3,
41
+ "model_type": "qwen2_5_vl",
42
+ "out_hidden_size": 2048,
43
+ "spatial_patch_size": 14,
44
+ "tokens_per_second": 2,
45
+ "torch_dtype": "bfloat16"
46
+ },
47
+ "vision_end_token_id": 151653,
48
+ "vision_start_token_id": 151652,
49
+ "vision_token_id": 151654,
50
+ "vocab_size": 151936
51
+ }
Recognition/generation_config.json ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token_id": 151643,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 151645,
6
+ 151643
7
+ ],
8
+ "pad_token_id": 151643,
9
+ "repetition_penalty": 1.05,
10
+ "temperature": 0.1,
11
+ "top_k": 1,
12
+ "top_p": 0.001,
13
+ "transformers_version": "4.50.0.dev0"
14
+ }
Recognition/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
Recognition/model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bf7e21986c63a3646ca393ba36d2ea7706b7c8632c3d2de15f6e6569b671b39f
3
+ size 3809609344
Recognition/preprocessor_config.json ADDED
@@ -0,0 +1,29 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "do_convert_rgb": true,
3
+ "do_normalize": true,
4
+ "do_rescale": true,
5
+ "do_resize": true,
6
+ "image_mean": [
7
+ 0.48145466,
8
+ 0.4578275,
9
+ 0.40821073
10
+ ],
11
+ "image_processor_type": "Qwen2VLImageProcessor",
12
+ "image_std": [
13
+ 0.26862954,
14
+ 0.26130258,
15
+ 0.27577711
16
+ ],
17
+ "max_pixels": 12845056,
18
+ "merge_size": 2,
19
+ "min_pixels": 3136,
20
+ "patch_size": 14,
21
+ "processor_class": "Qwen2_5_VLProcessor",
22
+ "resample": 3,
23
+ "rescale_factor": 0.00392156862745098,
24
+ "size": {
25
+ "longest_edge": 12845056,
26
+ "shortest_edge": 3136
27
+ },
28
+ "temporal_patch_size": 2
29
+ }
Recognition/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
Recognition/tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
3
+ size 11421896
Recognition/tokenizer_config.json ADDED
@@ -0,0 +1,210 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content 
}}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
199
+ "clean_up_tokenization_spaces": false,
200
+ "eos_token": "<|im_end|>",
201
+ "errors": "replace",
202
+ "extra_special_tokens": {},
203
+ "model_max_length": 8196,
204
+ "pad_token": "<|endoftext|>",
205
+ "padding_side": "right",
206
+ "processor_class": "Qwen2_5_VLProcessor",
207
+ "split_special_tokens": false,
208
+ "tokenizer_class": "Qwen2Tokenizer",
209
+ "unk_token": null
210
+ }
Recognition/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
Relation/config.json ADDED
@@ -0,0 +1,1063 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "LayoutLMv3ForTokenClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "bos_token_id": 0,
7
+ "classifier_dropout": null,
8
+ "coordinate_size": 171,
9
+ "eos_token_id": 2,
10
+ "has_relative_attention_bias": true,
11
+ "has_spatial_attention_bias": true,
12
+ "hidden_act": "gelu",
13
+ "hidden_dropout_prob": 0.1,
14
+ "hidden_size": 1024,
15
+ "id2label": {
16
+ "0": "LABEL_0",
17
+ "1": "LABEL_1",
18
+ "2": "LABEL_2",
19
+ "3": "LABEL_3",
20
+ "4": "LABEL_4",
21
+ "5": "LABEL_5",
22
+ "6": "LABEL_6",
23
+ "7": "LABEL_7",
24
+ "8": "LABEL_8",
25
+ "9": "LABEL_9",
26
+ "10": "LABEL_10",
27
+ "11": "LABEL_11",
28
+ "12": "LABEL_12",
29
+ "13": "LABEL_13",
30
+ "14": "LABEL_14",
31
+ "15": "LABEL_15",
32
+ "16": "LABEL_16",
33
+ "17": "LABEL_17",
34
+ "18": "LABEL_18",
35
+ "19": "LABEL_19",
36
+ "20": "LABEL_20",
37
+ "21": "LABEL_21",
38
+ "22": "LABEL_22",
39
+ "23": "LABEL_23",
40
+ "24": "LABEL_24",
41
+ "25": "LABEL_25",
42
+ "26": "LABEL_26",
43
+ "27": "LABEL_27",
44
+ "28": "LABEL_28",
45
+ "29": "LABEL_29",
46
+ "30": "LABEL_30",
47
+ "31": "LABEL_31",
48
+ "32": "LABEL_32",
49
+ "33": "LABEL_33",
50
+ "34": "LABEL_34",
51
+ "35": "LABEL_35",
52
+ "36": "LABEL_36",
53
+ "37": "LABEL_37",
54
+ "38": "LABEL_38",
55
+ "39": "LABEL_39",
56
+ "40": "LABEL_40",
57
+ "41": "LABEL_41",
58
+ "42": "LABEL_42",
59
+ "43": "LABEL_43",
60
+ "44": "LABEL_44",
61
+ "45": "LABEL_45",
62
+ "46": "LABEL_46",
63
+ "47": "LABEL_47",
64
+ "48": "LABEL_48",
65
+ "49": "LABEL_49",
66
+ "50": "LABEL_50",
67
+ "51": "LABEL_51",
68
+ "52": "LABEL_52",
69
+ "53": "LABEL_53",
70
+ "54": "LABEL_54",
71
+ "55": "LABEL_55",
72
+ "56": "LABEL_56",
73
+ "57": "LABEL_57",
74
+ "58": "LABEL_58",
75
+ "59": "LABEL_59",
76
+ "60": "LABEL_60",
77
+ "61": "LABEL_61",
78
+ "62": "LABEL_62",
79
+ "63": "LABEL_63",
80
+ "64": "LABEL_64",
81
+ "65": "LABEL_65",
82
+ "66": "LABEL_66",
83
+ "67": "LABEL_67",
84
+ "68": "LABEL_68",
85
+ "69": "LABEL_69",
86
+ "70": "LABEL_70",
87
+ "71": "LABEL_71",
88
+ "72": "LABEL_72",
89
+ "73": "LABEL_73",
90
+ "74": "LABEL_74",
91
+ "75": "LABEL_75",
92
+ "76": "LABEL_76",
93
+ "77": "LABEL_77",
94
+ "78": "LABEL_78",
95
+ "79": "LABEL_79",
96
+ "80": "LABEL_80",
97
+ "81": "LABEL_81",
98
+ "82": "LABEL_82",
99
+ "83": "LABEL_83",
100
+ "84": "LABEL_84",
101
+ "85": "LABEL_85",
102
+ "86": "LABEL_86",
103
+ "87": "LABEL_87",
104
+ "88": "LABEL_88",
105
+ "89": "LABEL_89",
106
+ "90": "LABEL_90",
107
+ "91": "LABEL_91",
108
+ "92": "LABEL_92",
109
+ "93": "LABEL_93",
110
+ "94": "LABEL_94",
111
+ "95": "LABEL_95",
112
+ "96": "LABEL_96",
113
+ "97": "LABEL_97",
114
+ "98": "LABEL_98",
115
+ "99": "LABEL_99",
116
+ "100": "LABEL_100",
117
+ "101": "LABEL_101",
118
+ "102": "LABEL_102",
119
+ "103": "LABEL_103",
120
+ "104": "LABEL_104",
121
+ "105": "LABEL_105",
122
+ "106": "LABEL_106",
123
+ "107": "LABEL_107",
124
+ "108": "LABEL_108",
125
+ "109": "LABEL_109",
126
+ "110": "LABEL_110",
127
+ "111": "LABEL_111",
128
+ "112": "LABEL_112",
129
+ "113": "LABEL_113",
130
+ "114": "LABEL_114",
131
+ "115": "LABEL_115",
132
+ "116": "LABEL_116",
133
+ "117": "LABEL_117",
134
+ "118": "LABEL_118",
135
+ "119": "LABEL_119",
136
+ "120": "LABEL_120",
137
+ "121": "LABEL_121",
138
+ "122": "LABEL_122",
139
+ "123": "LABEL_123",
140
+ "124": "LABEL_124",
141
+ "125": "LABEL_125",
142
+ "126": "LABEL_126",
143
+ "127": "LABEL_127",
144
+ "128": "LABEL_128",
145
+ "129": "LABEL_129",
146
+ "130": "LABEL_130",
147
+ "131": "LABEL_131",
148
+ "132": "LABEL_132",
149
+ "133": "LABEL_133",
150
+ "134": "LABEL_134",
151
+ "135": "LABEL_135",
152
+ "136": "LABEL_136",
153
+ "137": "LABEL_137",
154
+ "138": "LABEL_138",
155
+ "139": "LABEL_139",
156
+ "140": "LABEL_140",
157
+ "141": "LABEL_141",
158
+ "142": "LABEL_142",
159
+ "143": "LABEL_143",
160
+ "144": "LABEL_144",
161
+ "145": "LABEL_145",
162
+ "146": "LABEL_146",
163
+ "147": "LABEL_147",
164
+ "148": "LABEL_148",
165
+ "149": "LABEL_149",
166
+ "150": "LABEL_150",
167
+ "151": "LABEL_151",
168
+ "152": "LABEL_152",
169
+ "153": "LABEL_153",
170
+ "154": "LABEL_154",
171
+ "155": "LABEL_155",
172
+ "156": "LABEL_156",
173
+ "157": "LABEL_157",
174
+ "158": "LABEL_158",
175
+ "159": "LABEL_159",
176
+ "160": "LABEL_160",
177
+ "161": "LABEL_161",
178
+ "162": "LABEL_162",
179
+ "163": "LABEL_163",
180
+ "164": "LABEL_164",
181
+ "165": "LABEL_165",
182
+ "166": "LABEL_166",
183
+ "167": "LABEL_167",
184
+ "168": "LABEL_168",
185
+ "169": "LABEL_169",
186
+ "170": "LABEL_170",
187
+ "171": "LABEL_171",
188
+ "172": "LABEL_172",
189
+ "173": "LABEL_173",
190
+ "174": "LABEL_174",
191
+ "175": "LABEL_175",
192
+ "176": "LABEL_176",
193
+ "177": "LABEL_177",
194
+ "178": "LABEL_178",
195
+ "179": "LABEL_179",
196
+ "180": "LABEL_180",
197
+ "181": "LABEL_181",
198
+ "182": "LABEL_182",
199
+ "183": "LABEL_183",
200
+ "184": "LABEL_184",
201
+ "185": "LABEL_185",
202
+ "186": "LABEL_186",
203
+ "187": "LABEL_187",
204
+ "188": "LABEL_188",
205
+ "189": "LABEL_189",
206
+ "190": "LABEL_190",
207
+ "191": "LABEL_191",
208
+ "192": "LABEL_192",
209
+ "193": "LABEL_193",
210
+ "194": "LABEL_194",
211
+ "195": "LABEL_195",
212
+ "196": "LABEL_196",
213
+ "197": "LABEL_197",
214
+ "198": "LABEL_198",
215
+ "199": "LABEL_199",
216
+ "200": "LABEL_200",
217
+ "201": "LABEL_201",
218
+ "202": "LABEL_202",
219
+ "203": "LABEL_203",
220
+ "204": "LABEL_204",
221
+ "205": "LABEL_205",
222
+ "206": "LABEL_206",
223
+ "207": "LABEL_207",
+ "208": "LABEL_208",
+ "209": "LABEL_209",
+ "210": "LABEL_210",
+ "211": "LABEL_211",
+ "212": "LABEL_212",
+ "213": "LABEL_213",
+ "214": "LABEL_214",
+ "215": "LABEL_215",
+ "216": "LABEL_216",
+ "217": "LABEL_217",
+ "218": "LABEL_218",
+ "219": "LABEL_219",
+ "220": "LABEL_220",
+ "221": "LABEL_221",
+ "222": "LABEL_222",
+ "223": "LABEL_223",
+ "224": "LABEL_224",
+ "225": "LABEL_225",
+ "226": "LABEL_226",
+ "227": "LABEL_227",
+ "228": "LABEL_228",
+ "229": "LABEL_229",
+ "230": "LABEL_230",
+ "231": "LABEL_231",
+ "232": "LABEL_232",
+ "233": "LABEL_233",
+ "234": "LABEL_234",
+ "235": "LABEL_235",
+ "236": "LABEL_236",
+ "237": "LABEL_237",
+ "238": "LABEL_238",
+ "239": "LABEL_239",
+ "240": "LABEL_240",
+ "241": "LABEL_241",
+ "242": "LABEL_242",
+ "243": "LABEL_243",
+ "244": "LABEL_244",
+ "245": "LABEL_245",
+ "246": "LABEL_246",
+ "247": "LABEL_247",
+ "248": "LABEL_248",
+ "249": "LABEL_249",
+ "250": "LABEL_250",
+ "251": "LABEL_251",
+ "252": "LABEL_252",
+ "253": "LABEL_253",
+ "254": "LABEL_254",
+ "255": "LABEL_255",
+ "256": "LABEL_256",
+ "257": "LABEL_257",
+ "258": "LABEL_258",
+ "259": "LABEL_259",
+ "260": "LABEL_260",
+ "261": "LABEL_261",
+ "262": "LABEL_262",
+ "263": "LABEL_263",
+ "264": "LABEL_264",
+ "265": "LABEL_265",
+ "266": "LABEL_266",
+ "267": "LABEL_267",
+ "268": "LABEL_268",
+ "269": "LABEL_269",
+ "270": "LABEL_270",
+ "271": "LABEL_271",
+ "272": "LABEL_272",
+ "273": "LABEL_273",
+ "274": "LABEL_274",
+ "275": "LABEL_275",
+ "276": "LABEL_276",
+ "277": "LABEL_277",
+ "278": "LABEL_278",
+ "279": "LABEL_279",
+ "280": "LABEL_280",
+ "281": "LABEL_281",
+ "282": "LABEL_282",
+ "283": "LABEL_283",
+ "284": "LABEL_284",
+ "285": "LABEL_285",
+ "286": "LABEL_286",
+ "287": "LABEL_287",
+ "288": "LABEL_288",
+ "289": "LABEL_289",
+ "290": "LABEL_290",
+ "291": "LABEL_291",
+ "292": "LABEL_292",
+ "293": "LABEL_293",
+ "294": "LABEL_294",
+ "295": "LABEL_295",
+ "296": "LABEL_296",
+ "297": "LABEL_297",
+ "298": "LABEL_298",
+ "299": "LABEL_299",
+ "300": "LABEL_300",
+ "301": "LABEL_301",
+ "302": "LABEL_302",
+ "303": "LABEL_303",
+ "304": "LABEL_304",
+ "305": "LABEL_305",
+ "306": "LABEL_306",
+ "307": "LABEL_307",
+ "308": "LABEL_308",
+ "309": "LABEL_309",
+ "310": "LABEL_310",
+ "311": "LABEL_311",
+ "312": "LABEL_312",
+ "313": "LABEL_313",
+ "314": "LABEL_314",
+ "315": "LABEL_315",
+ "316": "LABEL_316",
+ "317": "LABEL_317",
+ "318": "LABEL_318",
+ "319": "LABEL_319",
+ "320": "LABEL_320",
+ "321": "LABEL_321",
+ "322": "LABEL_322",
+ "323": "LABEL_323",
+ "324": "LABEL_324",
+ "325": "LABEL_325",
+ "326": "LABEL_326",
+ "327": "LABEL_327",
+ "328": "LABEL_328",
+ "329": "LABEL_329",
+ "330": "LABEL_330",
+ "331": "LABEL_331",
+ "332": "LABEL_332",
+ "333": "LABEL_333",
+ "334": "LABEL_334",
+ "335": "LABEL_335",
+ "336": "LABEL_336",
+ "337": "LABEL_337",
+ "338": "LABEL_338",
+ "339": "LABEL_339",
+ "340": "LABEL_340",
+ "341": "LABEL_341",
+ "342": "LABEL_342",
+ "343": "LABEL_343",
+ "344": "LABEL_344",
+ "345": "LABEL_345",
+ "346": "LABEL_346",
+ "347": "LABEL_347",
+ "348": "LABEL_348",
+ "349": "LABEL_349",
+ "350": "LABEL_350",
+ "351": "LABEL_351",
+ "352": "LABEL_352",
+ "353": "LABEL_353",
+ "354": "LABEL_354",
+ "355": "LABEL_355",
+ "356": "LABEL_356",
+ "357": "LABEL_357",
+ "358": "LABEL_358",
+ "359": "LABEL_359",
+ "360": "LABEL_360",
+ "361": "LABEL_361",
+ "362": "LABEL_362",
+ "363": "LABEL_363",
+ "364": "LABEL_364",
+ "365": "LABEL_365",
+ "366": "LABEL_366",
+ "367": "LABEL_367",
+ "368": "LABEL_368",
+ "369": "LABEL_369",
+ "370": "LABEL_370",
+ "371": "LABEL_371",
+ "372": "LABEL_372",
+ "373": "LABEL_373",
+ "374": "LABEL_374",
+ "375": "LABEL_375",
+ "376": "LABEL_376",
+ "377": "LABEL_377",
+ "378": "LABEL_378",
+ "379": "LABEL_379",
+ "380": "LABEL_380",
+ "381": "LABEL_381",
+ "382": "LABEL_382",
+ "383": "LABEL_383",
+ "384": "LABEL_384",
+ "385": "LABEL_385",
+ "386": "LABEL_386",
+ "387": "LABEL_387",
+ "388": "LABEL_388",
+ "389": "LABEL_389",
+ "390": "LABEL_390",
+ "391": "LABEL_391",
+ "392": "LABEL_392",
+ "393": "LABEL_393",
+ "394": "LABEL_394",
+ "395": "LABEL_395",
+ "396": "LABEL_396",
+ "397": "LABEL_397",
+ "398": "LABEL_398",
+ "399": "LABEL_399",
+ "400": "LABEL_400",
+ "401": "LABEL_401",
+ "402": "LABEL_402",
+ "403": "LABEL_403",
+ "404": "LABEL_404",
+ "405": "LABEL_405",
+ "406": "LABEL_406",
+ "407": "LABEL_407",
+ "408": "LABEL_408",
+ "409": "LABEL_409",
+ "410": "LABEL_410",
+ "411": "LABEL_411",
+ "412": "LABEL_412",
+ "413": "LABEL_413",
+ "414": "LABEL_414",
+ "415": "LABEL_415",
+ "416": "LABEL_416",
+ "417": "LABEL_417",
+ "418": "LABEL_418",
+ "419": "LABEL_419",
+ "420": "LABEL_420",
+ "421": "LABEL_421",
+ "422": "LABEL_422",
+ "423": "LABEL_423",
+ "424": "LABEL_424",
+ "425": "LABEL_425",
+ "426": "LABEL_426",
+ "427": "LABEL_427",
+ "428": "LABEL_428",
+ "429": "LABEL_429",
+ "430": "LABEL_430",
+ "431": "LABEL_431",
+ "432": "LABEL_432",
+ "433": "LABEL_433",
+ "434": "LABEL_434",
+ "435": "LABEL_435",
+ "436": "LABEL_436",
+ "437": "LABEL_437",
+ "438": "LABEL_438",
+ "439": "LABEL_439",
+ "440": "LABEL_440",
+ "441": "LABEL_441",
+ "442": "LABEL_442",
+ "443": "LABEL_443",
+ "444": "LABEL_444",
+ "445": "LABEL_445",
+ "446": "LABEL_446",
+ "447": "LABEL_447",
+ "448": "LABEL_448",
+ "449": "LABEL_449",
+ "450": "LABEL_450",
+ "451": "LABEL_451",
+ "452": "LABEL_452",
+ "453": "LABEL_453",
+ "454": "LABEL_454",
+ "455": "LABEL_455",
+ "456": "LABEL_456",
+ "457": "LABEL_457",
+ "458": "LABEL_458",
+ "459": "LABEL_459",
+ "460": "LABEL_460",
+ "461": "LABEL_461",
+ "462": "LABEL_462",
+ "463": "LABEL_463",
+ "464": "LABEL_464",
+ "465": "LABEL_465",
+ "466": "LABEL_466",
+ "467": "LABEL_467",
+ "468": "LABEL_468",
+ "469": "LABEL_469",
+ "470": "LABEL_470",
+ "471": "LABEL_471",
+ "472": "LABEL_472",
+ "473": "LABEL_473",
+ "474": "LABEL_474",
+ "475": "LABEL_475",
+ "476": "LABEL_476",
+ "477": "LABEL_477",
+ "478": "LABEL_478",
+ "479": "LABEL_479",
+ "480": "LABEL_480",
+ "481": "LABEL_481",
+ "482": "LABEL_482",
+ "483": "LABEL_483",
+ "484": "LABEL_484",
+ "485": "LABEL_485",
+ "486": "LABEL_486",
+ "487": "LABEL_487",
+ "488": "LABEL_488",
+ "489": "LABEL_489",
+ "490": "LABEL_490",
+ "491": "LABEL_491",
+ "492": "LABEL_492",
+ "493": "LABEL_493",
+ "494": "LABEL_494",
+ "495": "LABEL_495",
+ "496": "LABEL_496",
+ "497": "LABEL_497",
+ "498": "LABEL_498",
+ "499": "LABEL_499",
+ "500": "LABEL_500",
+ "501": "LABEL_501",
+ "502": "LABEL_502",
+ "503": "LABEL_503",
+ "504": "LABEL_504",
+ "505": "LABEL_505",
+ "506": "LABEL_506",
+ "507": "LABEL_507",
+ "508": "LABEL_508",
+ "509": "LABEL_509"
+ },
+ "initializer_range": 0.02,
+ "input_size": 224,
+ "intermediate_size": 4096,
+ "label2id": {
+ "LABEL_0": 0,
+ "LABEL_1": 1,
+ "LABEL_10": 10,
+ "LABEL_100": 100,
+ "LABEL_101": 101,
+ "LABEL_102": 102,
+ "LABEL_103": 103,
+ "LABEL_104": 104,
+ "LABEL_105": 105,
+ "LABEL_106": 106,
+ "LABEL_107": 107,
+ "LABEL_108": 108,
+ "LABEL_109": 109,
+ "LABEL_11": 11,
+ "LABEL_110": 110,
+ "LABEL_111": 111,
+ "LABEL_112": 112,
+ "LABEL_113": 113,
+ "LABEL_114": 114,
+ "LABEL_115": 115,
+ "LABEL_116": 116,
+ "LABEL_117": 117,
+ "LABEL_118": 118,
+ "LABEL_119": 119,
+ "LABEL_12": 12,
+ "LABEL_120": 120,
+ "LABEL_121": 121,
+ "LABEL_122": 122,
+ "LABEL_123": 123,
+ "LABEL_124": 124,
+ "LABEL_125": 125,
+ "LABEL_126": 126,
+ "LABEL_127": 127,
+ "LABEL_128": 128,
+ "LABEL_129": 129,
+ "LABEL_13": 13,
+ "LABEL_130": 130,
+ "LABEL_131": 131,
+ "LABEL_132": 132,
+ "LABEL_133": 133,
+ "LABEL_134": 134,
+ "LABEL_135": 135,
+ "LABEL_136": 136,
+ "LABEL_137": 137,
+ "LABEL_138": 138,
+ "LABEL_139": 139,
+ "LABEL_14": 14,
+ "LABEL_140": 140,
+ "LABEL_141": 141,
+ "LABEL_142": 142,
+ "LABEL_143": 143,
+ "LABEL_144": 144,
+ "LABEL_145": 145,
+ "LABEL_146": 146,
+ "LABEL_147": 147,
+ "LABEL_148": 148,
+ "LABEL_149": 149,
+ "LABEL_15": 15,
+ "LABEL_150": 150,
+ "LABEL_151": 151,
+ "LABEL_152": 152,
+ "LABEL_153": 153,
+ "LABEL_154": 154,
+ "LABEL_155": 155,
+ "LABEL_156": 156,
+ "LABEL_157": 157,
+ "LABEL_158": 158,
+ "LABEL_159": 159,
+ "LABEL_16": 16,
+ "LABEL_160": 160,
+ "LABEL_161": 161,
+ "LABEL_162": 162,
+ "LABEL_163": 163,
+ "LABEL_164": 164,
+ "LABEL_165": 165,
+ "LABEL_166": 166,
+ "LABEL_167": 167,
+ "LABEL_168": 168,
+ "LABEL_169": 169,
+ "LABEL_17": 17,
+ "LABEL_170": 170,
+ "LABEL_171": 171,
+ "LABEL_172": 172,
+ "LABEL_173": 173,
+ "LABEL_174": 174,
+ "LABEL_175": 175,
+ "LABEL_176": 176,
+ "LABEL_177": 177,
+ "LABEL_178": 178,
+ "LABEL_179": 179,
+ "LABEL_18": 18,
+ "LABEL_180": 180,
+ "LABEL_181": 181,
+ "LABEL_182": 182,
+ "LABEL_183": 183,
+ "LABEL_184": 184,
+ "LABEL_185": 185,
+ "LABEL_186": 186,
+ "LABEL_187": 187,
+ "LABEL_188": 188,
+ "LABEL_189": 189,
+ "LABEL_19": 19,
+ "LABEL_190": 190,
+ "LABEL_191": 191,
+ "LABEL_192": 192,
+ "LABEL_193": 193,
+ "LABEL_194": 194,
+ "LABEL_195": 195,
+ "LABEL_196": 196,
+ "LABEL_197": 197,
+ "LABEL_198": 198,
+ "LABEL_199": 199,
+ "LABEL_2": 2,
+ "LABEL_20": 20,
+ "LABEL_200": 200,
+ "LABEL_201": 201,
+ "LABEL_202": 202,
+ "LABEL_203": 203,
+ "LABEL_204": 204,
+ "LABEL_205": 205,
+ "LABEL_206": 206,
+ "LABEL_207": 207,
+ "LABEL_208": 208,
+ "LABEL_209": 209,
+ "LABEL_21": 21,
+ "LABEL_210": 210,
+ "LABEL_211": 211,
+ "LABEL_212": 212,
+ "LABEL_213": 213,
+ "LABEL_214": 214,
+ "LABEL_215": 215,
+ "LABEL_216": 216,
+ "LABEL_217": 217,
+ "LABEL_218": 218,
+ "LABEL_219": 219,
+ "LABEL_22": 22,
+ "LABEL_220": 220,
+ "LABEL_221": 221,
+ "LABEL_222": 222,
+ "LABEL_223": 223,
+ "LABEL_224": 224,
+ "LABEL_225": 225,
+ "LABEL_226": 226,
+ "LABEL_227": 227,
+ "LABEL_228": 228,
+ "LABEL_229": 229,
+ "LABEL_23": 23,
+ "LABEL_230": 230,
+ "LABEL_231": 231,
+ "LABEL_232": 232,
+ "LABEL_233": 233,
+ "LABEL_234": 234,
+ "LABEL_235": 235,
+ "LABEL_236": 236,
+ "LABEL_237": 237,
+ "LABEL_238": 238,
+ "LABEL_239": 239,
+ "LABEL_24": 24,
+ "LABEL_240": 240,
+ "LABEL_241": 241,
+ "LABEL_242": 242,
+ "LABEL_243": 243,
+ "LABEL_244": 244,
+ "LABEL_245": 245,
+ "LABEL_246": 246,
+ "LABEL_247": 247,
+ "LABEL_248": 248,
+ "LABEL_249": 249,
+ "LABEL_25": 25,
+ "LABEL_250": 250,
+ "LABEL_251": 251,
+ "LABEL_252": 252,
+ "LABEL_253": 253,
+ "LABEL_254": 254,
+ "LABEL_255": 255,
+ "LABEL_256": 256,
+ "LABEL_257": 257,
+ "LABEL_258": 258,
+ "LABEL_259": 259,
+ "LABEL_26": 26,
+ "LABEL_260": 260,
+ "LABEL_261": 261,
+ "LABEL_262": 262,
+ "LABEL_263": 263,
+ "LABEL_264": 264,
+ "LABEL_265": 265,
+ "LABEL_266": 266,
+ "LABEL_267": 267,
+ "LABEL_268": 268,
+ "LABEL_269": 269,
+ "LABEL_27": 27,
+ "LABEL_270": 270,
+ "LABEL_271": 271,
+ "LABEL_272": 272,
+ "LABEL_273": 273,
+ "LABEL_274": 274,
+ "LABEL_275": 275,
+ "LABEL_276": 276,
+ "LABEL_277": 277,
+ "LABEL_278": 278,
+ "LABEL_279": 279,
+ "LABEL_28": 28,
+ "LABEL_280": 280,
+ "LABEL_281": 281,
+ "LABEL_282": 282,
+ "LABEL_283": 283,
+ "LABEL_284": 284,
+ "LABEL_285": 285,
+ "LABEL_286": 286,
+ "LABEL_287": 287,
+ "LABEL_288": 288,
+ "LABEL_289": 289,
+ "LABEL_29": 29,
+ "LABEL_290": 290,
+ "LABEL_291": 291,
+ "LABEL_292": 292,
+ "LABEL_293": 293,
+ "LABEL_294": 294,
+ "LABEL_295": 295,
+ "LABEL_296": 296,
+ "LABEL_297": 297,
+ "LABEL_298": 298,
+ "LABEL_299": 299,
+ "LABEL_3": 3,
+ "LABEL_30": 30,
+ "LABEL_300": 300,
+ "LABEL_301": 301,
+ "LABEL_302": 302,
+ "LABEL_303": 303,
+ "LABEL_304": 304,
+ "LABEL_305": 305,
+ "LABEL_306": 306,
+ "LABEL_307": 307,
+ "LABEL_308": 308,
+ "LABEL_309": 309,
+ "LABEL_31": 31,
+ "LABEL_310": 310,
+ "LABEL_311": 311,
+ "LABEL_312": 312,
+ "LABEL_313": 313,
+ "LABEL_314": 314,
+ "LABEL_315": 315,
+ "LABEL_316": 316,
+ "LABEL_317": 317,
+ "LABEL_318": 318,
+ "LABEL_319": 319,
+ "LABEL_32": 32,
+ "LABEL_320": 320,
+ "LABEL_321": 321,
+ "LABEL_322": 322,
+ "LABEL_323": 323,
+ "LABEL_324": 324,
+ "LABEL_325": 325,
+ "LABEL_326": 326,
+ "LABEL_327": 327,
+ "LABEL_328": 328,
+ "LABEL_329": 329,
+ "LABEL_33": 33,
+ "LABEL_330": 330,
+ "LABEL_331": 331,
+ "LABEL_332": 332,
+ "LABEL_333": 333,
+ "LABEL_334": 334,
+ "LABEL_335": 335,
+ "LABEL_336": 336,
+ "LABEL_337": 337,
+ "LABEL_338": 338,
+ "LABEL_339": 339,
+ "LABEL_34": 34,
+ "LABEL_340": 340,
+ "LABEL_341": 341,
+ "LABEL_342": 342,
+ "LABEL_343": 343,
+ "LABEL_344": 344,
+ "LABEL_345": 345,
+ "LABEL_346": 346,
+ "LABEL_347": 347,
+ "LABEL_348": 348,
+ "LABEL_349": 349,
+ "LABEL_35": 35,
+ "LABEL_350": 350,
+ "LABEL_351": 351,
+ "LABEL_352": 352,
+ "LABEL_353": 353,
+ "LABEL_354": 354,
+ "LABEL_355": 355,
+ "LABEL_356": 356,
+ "LABEL_357": 357,
+ "LABEL_358": 358,
+ "LABEL_359": 359,
+ "LABEL_36": 36,
+ "LABEL_360": 360,
+ "LABEL_361": 361,
+ "LABEL_362": 362,
+ "LABEL_363": 363,
+ "LABEL_364": 364,
+ "LABEL_365": 365,
+ "LABEL_366": 366,
+ "LABEL_367": 367,
+ "LABEL_368": 368,
+ "LABEL_369": 369,
+ "LABEL_37": 37,
+ "LABEL_370": 370,
+ "LABEL_371": 371,
+ "LABEL_372": 372,
+ "LABEL_373": 373,
+ "LABEL_374": 374,
+ "LABEL_375": 375,
+ "LABEL_376": 376,
+ "LABEL_377": 377,
+ "LABEL_378": 378,
+ "LABEL_379": 379,
+ "LABEL_38": 38,
+ "LABEL_380": 380,
+ "LABEL_381": 381,
+ "LABEL_382": 382,
+ "LABEL_383": 383,
+ "LABEL_384": 384,
+ "LABEL_385": 385,
+ "LABEL_386": 386,
+ "LABEL_387": 387,
+ "LABEL_388": 388,
+ "LABEL_389": 389,
+ "LABEL_39": 39,
+ "LABEL_390": 390,
+ "LABEL_391": 391,
+ "LABEL_392": 392,
+ "LABEL_393": 393,
+ "LABEL_394": 394,
+ "LABEL_395": 395,
+ "LABEL_396": 396,
+ "LABEL_397": 397,
+ "LABEL_398": 398,
+ "LABEL_399": 399,
+ "LABEL_4": 4,
+ "LABEL_40": 40,
+ "LABEL_400": 400,
+ "LABEL_401": 401,
+ "LABEL_402": 402,
+ "LABEL_403": 403,
+ "LABEL_404": 404,
+ "LABEL_405": 405,
+ "LABEL_406": 406,
+ "LABEL_407": 407,
+ "LABEL_408": 408,
+ "LABEL_409": 409,
+ "LABEL_41": 41,
+ "LABEL_410": 410,
+ "LABEL_411": 411,
+ "LABEL_412": 412,
+ "LABEL_413": 413,
+ "LABEL_414": 414,
+ "LABEL_415": 415,
+ "LABEL_416": 416,
+ "LABEL_417": 417,
+ "LABEL_418": 418,
+ "LABEL_419": 419,
+ "LABEL_42": 42,
+ "LABEL_420": 420,
+ "LABEL_421": 421,
+ "LABEL_422": 422,
+ "LABEL_423": 423,
+ "LABEL_424": 424,
+ "LABEL_425": 425,
+ "LABEL_426": 426,
+ "LABEL_427": 427,
+ "LABEL_428": 428,
+ "LABEL_429": 429,
+ "LABEL_43": 43,
+ "LABEL_430": 430,
+ "LABEL_431": 431,
+ "LABEL_432": 432,
+ "LABEL_433": 433,
+ "LABEL_434": 434,
+ "LABEL_435": 435,
+ "LABEL_436": 436,
+ "LABEL_437": 437,
+ "LABEL_438": 438,
+ "LABEL_439": 439,
+ "LABEL_44": 44,
+ "LABEL_440": 440,
+ "LABEL_441": 441,
+ "LABEL_442": 442,
+ "LABEL_443": 443,
+ "LABEL_444": 444,
+ "LABEL_445": 445,
+ "LABEL_446": 446,
+ "LABEL_447": 447,
+ "LABEL_448": 448,
+ "LABEL_449": 449,
+ "LABEL_45": 45,
+ "LABEL_450": 450,
+ "LABEL_451": 451,
+ "LABEL_452": 452,
+ "LABEL_453": 453,
+ "LABEL_454": 454,
+ "LABEL_455": 455,
+ "LABEL_456": 456,
+ "LABEL_457": 457,
+ "LABEL_458": 458,
+ "LABEL_459": 459,
+ "LABEL_46": 46,
+ "LABEL_460": 460,
+ "LABEL_461": 461,
+ "LABEL_462": 462,
+ "LABEL_463": 463,
+ "LABEL_464": 464,
+ "LABEL_465": 465,
+ "LABEL_466": 466,
+ "LABEL_467": 467,
+ "LABEL_468": 468,
+ "LABEL_469": 469,
+ "LABEL_47": 47,
+ "LABEL_470": 470,
+ "LABEL_471": 471,
+ "LABEL_472": 472,
+ "LABEL_473": 473,
+ "LABEL_474": 474,
+ "LABEL_475": 475,
+ "LABEL_476": 476,
+ "LABEL_477": 477,
+ "LABEL_478": 478,
+ "LABEL_479": 479,
+ "LABEL_48": 48,
+ "LABEL_480": 480,
+ "LABEL_481": 481,
+ "LABEL_482": 482,
+ "LABEL_483": 483,
+ "LABEL_484": 484,
+ "LABEL_485": 485,
+ "LABEL_486": 486,
+ "LABEL_487": 487,
+ "LABEL_488": 488,
+ "LABEL_489": 489,
+ "LABEL_49": 49,
+ "LABEL_490": 490,
+ "LABEL_491": 491,
+ "LABEL_492": 492,
+ "LABEL_493": 493,
+ "LABEL_494": 494,
+ "LABEL_495": 495,
+ "LABEL_496": 496,
+ "LABEL_497": 497,
+ "LABEL_498": 498,
+ "LABEL_499": 499,
+ "LABEL_5": 5,
+ "LABEL_50": 50,
+ "LABEL_500": 500,
+ "LABEL_501": 501,
+ "LABEL_502": 502,
+ "LABEL_503": 503,
+ "LABEL_504": 504,
+ "LABEL_505": 505,
+ "LABEL_506": 506,
+ "LABEL_507": 507,
+ "LABEL_508": 508,
+ "LABEL_509": 509,
+ "LABEL_51": 51,
+ "LABEL_52": 52,
+ "LABEL_53": 53,
+ "LABEL_54": 54,
+ "LABEL_55": 55,
+ "LABEL_56": 56,
+ "LABEL_57": 57,
+ "LABEL_58": 58,
+ "LABEL_59": 59,
+ "LABEL_6": 6,
+ "LABEL_60": 60,
+ "LABEL_61": 61,
+ "LABEL_62": 62,
+ "LABEL_63": 63,
+ "LABEL_64": 64,
+ "LABEL_65": 65,
+ "LABEL_66": 66,
+ "LABEL_67": 67,
+ "LABEL_68": 68,
+ "LABEL_69": 69,
+ "LABEL_7": 7,
+ "LABEL_70": 70,
+ "LABEL_71": 71,
+ "LABEL_72": 72,
+ "LABEL_73": 73,
+ "LABEL_74": 74,
+ "LABEL_75": 75,
+ "LABEL_76": 76,
+ "LABEL_77": 77,
+ "LABEL_78": 78,
+ "LABEL_79": 79,
+ "LABEL_8": 8,
+ "LABEL_80": 80,
+ "LABEL_81": 81,
+ "LABEL_82": 82,
+ "LABEL_83": 83,
+ "LABEL_84": 84,
+ "LABEL_85": 85,
+ "LABEL_86": 86,
+ "LABEL_87": 87,
+ "LABEL_88": 88,
+ "LABEL_89": 89,
+ "LABEL_9": 9,
+ "LABEL_90": 90,
+ "LABEL_91": 91,
+ "LABEL_92": 92,
+ "LABEL_93": 93,
+ "LABEL_94": 94,
+ "LABEL_95": 95,
+ "LABEL_96": 96,
+ "LABEL_97": 97,
+ "LABEL_98": 98,
+ "LABEL_99": 99
+ },
+ "layer_norm_eps": 1e-05,
+ "max_2d_position_embeddings": 1024,
+ "max_position_embeddings": 514,
+ "max_rel_2d_pos": 256,
+ "max_rel_pos": 128,
+ "model_type": "layoutlmv3",
+ "num_attention_heads": 16,
+ "num_channels": 3,
+ "num_hidden_layers": 24,
+ "pad_token_id": 1,
+ "patch_size": 16,
+ "rel_2d_pos_bins": 64,
+ "rel_pos_bins": 32,
+ "second_input_size": 112,
+ "shape_size": 170,
+ "text_embed": true,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.50.0",
+ "type_vocab_size": 1,
+ "visual_embed": false,
+ "vocab_size": 50265
+ }
Relation/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78a7ca1cb2ba8162b2672641f9d94ebde8b953fdf35c9417c0c8383e82751265
+ size 713217212
Structure/doclayout_yolo_docstructbench_imgsz1280_2501.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b152460888dc30be6db7f5dfab28bde3dcc999e5202f46187a764a1699c80be
+ size 39772550
Structure/layout_zh.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5acc32e5087ebb2601cf1221c7bdba960c086e1e4b009b15ce8b21c8e935fe3
+ size 40654210