	---
	license: apache-2.0
	language:
	- en
	- fr
	- de
	- es
	- it
	- pt
	- nl
	- pl
	- cs
	- sk
	- hr
	- bs
	- sr
	- sl
	- da
	- "no"
	- sv
	- is
	- et
	- lt
	- hu
	- sq
	- cy
	- ga
	- tr
	- id
	- ms
	- af
	- sw
	- tl
	- uz
	- la
	- ru
	- bg
	- uk
	- be
	- ko
	- zh
	- ja
	- th
	- el
	- hi
	- mr
	- ne
	- sa
	- ar
	- ur
	- fa
	- ta
	- te
	tags:
	- ocr
	- optical-character-recognition
	- text-detection
	- text-recognition
	- paddleocr
	- onnx
	- computer-vision
	- document-ai
	library_name: onnx
	pipeline_tag: image-to-text
	---

	# PP-OCR ONNX Models

	Multilingual OCR models from PaddleOCR, converted to ONNX format for production deployment.

	Use as a complete pipeline: Integrate with [monkt.com](https://monkt.com) for end-to-end document processing.

	Source: [PaddlePaddle PP-OCRv5 Collection](https://huggingface.co/collections/PaddlePaddle/pp-ocrv5-684a5356aef5b4b1d7b85e4b)
	Format: ONNX (optimized for inference)
	License: Apache 2.0

	---

	## Overview

	16 models covering 48+ languages:
	- 11 PP-OCRv5 models (latest, highest accuracy)
	- 5 PP-OCRv3 models (legacy, additional language support)

	---

	## Quick Start

	### Download from HuggingFace

	```bash
	pip install huggingface_hub rapidocr-onnxruntime
	```

	<details>
	<summary><b>Download specific language models</b></summary>

	```python
	from huggingface_hub import hf_hub_download

	# Download English models
	det_path = hf_hub_download("monkt/paddleocr-onnx", "detection/v5/det.onnx")
	rec_path = hf_hub_download("monkt/paddleocr-onnx", "languages/english/rec.onnx")
	dict_path = hf_hub_download("monkt/paddleocr-onnx", "languages/english/dict.txt")

	# Use with RapidOCR
	from rapidocr_onnxruntime import RapidOCR
	ocr = RapidOCR(det_model_path=det_path, rec_model_path=rec_path, rec_keys_path=dict_path)
	result, elapsed = ocr("document.jpg")
	```

	</details>

	<details>
	<summary><b>Download entire language folder</b></summary>

	```python
	from huggingface_hub import snapshot_download

	# Download all French/German/Spanish (Latin) models
	snapshot_download("monkt/paddleocr-onnx", allow_patterns=["detection/v5/", "languages/latin/"])

	# Download Arabic models (v3)
	snapshot_download("monkt/paddleocr-onnx", allow_patterns=["detection/v3/", "languages/arabic/"])
	```

	</details>
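
Before downloading, it can help to preview which files a set of `allow_patterns` would select. The sketch below applies shell-style matching with Python's `fnmatch` to a hypothetical file listing whose paths are taken from the repository structure in this README. `select_files` is an illustrative helper, not part of `huggingface_hub`, and it only approximates the library's own filtering; the patterns are written with an explicit trailing `*`.

```python
from fnmatch import fnmatch

# Hypothetical listing; in practice it could come from huggingface_hub.list_repo_files.
REPO_FILES = [
    "detection/v5/det.onnx",
    "detection/v5/config.json",
    "detection/v3/det.onnx",
    "languages/latin/rec.onnx",
    "languages/latin/dict.txt",
    "languages/arabic/rec.onnx",
]


def select_files(files, allow_patterns):
    """Return the files that match at least one shell-style pattern."""
    return [f for f in files if any(fnmatch(f, p) for p in allow_patterns)]


latin_files = select_files(REPO_FILES, ["detection/v5/*", "languages/latin/*"])
print(latin_files)
```

The same patterns can then be passed to `snapshot_download` once the selection looks right.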

	<details>
	<summary><b>Clone entire repository</b></summary>

	```bash
	git clone https://huggingface.co/monkt/paddleocr-onnx
	cd paddleocr-onnx
	```

	</details>

	### Basic Usage

	```python
	from rapidocr_onnxruntime import RapidOCR

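	# Paths are relative to the root of the cloned repository
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/english/rec.onnx",
	    rec_keys_path="languages/english/dict.txt",
	)

	result, elapsed = ocr("document.jpg")
	# result is None when no text is found; each entry is [box, text, score]
	for box, text, score in result or []:
	    print(text)  # Extracted text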
	```
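
The result returned above is a list of detections. As a minimal sketch, assuming each entry has the form `[box, text, score]` with `box` given as four `[x, y]` corner points (the layout used by `rapidocr_onnxruntime` 1.x; adjust the unpacking if your installed version differs), the hypothetical helper below drops low-confidence lines and orders the rest top to bottom. The `min_score` threshold of 0.5 is an arbitrary illustrative default.

```python
def extract_lines(result, min_score=0.5):
    """Return recognized text lines sorted top-to-bottom, then left-to-right.

    Assumes each entry is [box, text, score]; `result` may be None when
    nothing is detected.
    """
    kept = []
    for box, text, score in result or []:
        if float(score) < min_score:
            continue  # skip low-confidence recognitions
        top = min(point[1] for point in box)
        left = min(point[0] for point in box)
        kept.append((top, left, text))
    kept.sort()
    return [text for _, _, text in kept]


# Synthetic detections standing in for real OCR output
sample = [
    [[[10, 50], [90, 50], [90, 70], [10, 70]], "second line", 0.93],
    [[[10, 10], [90, 10], [90, 30], [10, 30]], "first line", 0.97],
    [[[10, 90], [90, 90], [90, 110], [10, 110]], "n0isy", 0.21],
]
print(extract_lines(sample))
```

Sorting by the top edge is only a rough reading order; multi-column layouts would need a more careful grouping.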

	---

	## Available Models

	### PP-OCRv5 Recognition Models

	| Language Group | Path | Languages | Accuracy | Size |
	|----------------|------|-----------|----------|------|
	| English | `languages/english/` | English | 85.25% | 7.5 MB |
	| Latin | `languages/latin/` | French, German, Spanish, Italian, Portuguese, + 27 more | 84.7% | 7.5 MB |
	| East Slavic | `languages/eslav/` | Russian, Bulgarian, Ukrainian, Belarusian | 81.6% | 7.5 MB |
	| Korean | `languages/korean/` | Korean | 88.0% | 13 MB |
	| Chinese/Japanese | `languages/chinese/` | Chinese, Japanese | - | 81 MB |
	| Thai | `languages/thai/` | Thai | 82.68% | 7.5 MB |
	| Greek | `languages/greek/` | Greek | 89.28% | 7.4 MB |

	### PP-OCRv3 Recognition Models (Legacy)

	| Language Group | Path | Languages | Version | Size |
	|----------------|------|-----------|---------|------|
	| Devanagari | `languages/hindi/` | Hindi, Marathi, Nepali, Sanskrit | v3 | 8.6 MB |
	| Arabic | `languages/arabic/` | Arabic, Urdu, Persian/Farsi | v3 | 8.6 MB |
	| Tamil | `languages/tamil/` | Tamil | v3 | 8.6 MB |
	| Telugu | `languages/telugu/` | Telugu | v3 | 8.6 MB |

	### Detection Models

	| Model | Path | Version | Size |
	|-------|------|---------|------|
	| PP-OCRv5 Detection | `detection/v5/det.onnx` | v5 | 84 MB |
	| PP-OCRv3 Detection | `detection/v3/det.onnx` | v3 | 2.3 MB |

	Note: Use v5 detection with v5 recognition models. Use v3 detection with v3 recognition models.

	### Preprocessing Models (Optional)

	| Model | Path | Purpose | Accuracy | Size |
	|-------|------|---------|----------|------|
	| Document Orientation | `preprocessing/doc-orientation/` | Corrects rotated documents (0°, 90°, 180°, 270°) | 99.06% | 6.5 MB |
	| Text Line Orientation | `preprocessing/textline-orientation/` | Corrects upside-down text (0°, 180°) | 98.85% | 6.5 MB |
	| Document Unwarping | `preprocessing/doc-unwarping/` | Fixes curved/warped documents | - | 30 MB |

	---

	## Language Support

	### PP-OCRv5 Languages (40+)

	Latin Script (32 languages): English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Czech, Slovak, Croatian, Bosnian, Serbian, Slovenian, Danish, Norwegian, Swedish, Icelandic, Estonian, Lithuanian, Hungarian, Albanian, Welsh, Irish, Turkish, Indonesian, Malay, Afrikaans, Swahili, Tagalog, Uzbek, Latin

	Cyrillic: Russian, Bulgarian, Ukrainian, Belarusian

	East Asian: Chinese (Simplified, Traditional), Japanese (Hiragana, Katakana, Kanji), Korean

	Southeast Asian: Thai

	Other: Greek

	### PP-OCRv3 Languages (9)

	South Asian: Hindi, Marathi, Nepali, Sanskrit, Tamil, Telugu

	Middle Eastern: Arabic, Urdu, Persian/Farsi

	---

	## Usage Examples

	<details>
	<summary><b>PP-OCRv5 Models (English, Latin, East Asian, etc.)</b></summary>

	```python
	from rapidocr_onnxruntime import RapidOCR

	# English
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/english/rec.onnx",
	    rec_keys_path="languages/english/dict.txt",
	)

	# French, German, Spanish, etc. (32 languages)
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/latin/rec.onnx",
	    rec_keys_path="languages/latin/dict.txt",
	)

	# Russian, Bulgarian, Ukrainian, Belarusian
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/eslav/rec.onnx",
	    rec_keys_path="languages/eslav/dict.txt",
	)

	# Korean
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/korean/rec.onnx",
	    rec_keys_path="languages/korean/dict.txt",
	)

	# Chinese/Japanese
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/chinese/rec.onnx",
	    rec_keys_path="languages/chinese/dict.txt",
	)

	# Thai
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/thai/rec.onnx",
	    rec_keys_path="languages/thai/dict.txt",
	)

	# Greek
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/greek/rec.onnx",
	    rec_keys_path="languages/greek/dict.txt",
	)
	```

	</details>

	<details>
	<summary><b>PP-OCRv3 Models (Hindi, Arabic, Tamil, Telugu)</b></summary>

	```python
	from rapidocr_onnxruntime import RapidOCR

	# Hindi, Marathi, Nepali, Sanskrit
	ocr = RapidOCR(
	    det_model_path="detection/v3/det.onnx",
	    rec_model_path="languages/hindi/rec.onnx",
	    rec_keys_path="languages/hindi/dict.txt",
	)

	# Arabic, Urdu, Persian/Farsi
	ocr = RapidOCR(
	    det_model_path="detection/v3/det.onnx",
	    rec_model_path="languages/arabic/rec.onnx",
	    rec_keys_path="languages/arabic/dict.txt",
	)

	# Tamil
	ocr = RapidOCR(
	    det_model_path="detection/v3/det.onnx",
	    rec_model_path="languages/tamil/rec.onnx",
	    rec_keys_path="languages/tamil/dict.txt",
	)

	# Telugu
	ocr = RapidOCR(
	    det_model_path="detection/v3/det.onnx",
	    rec_model_path="languages/telugu/rec.onnx",
	    rec_keys_path="languages/telugu/dict.txt",
	)
	```

	</details>

	---

	## Full Pipeline with Preprocessing

	<details>
	<summary><b>Optional preprocessing for rotated/distorted documents</b></summary>

	Preprocessing models improve accuracy on rotated or distorted documents:

	```python
	from rapidocr_onnxruntime import RapidOCR

	# Pipeline with text-line orientation classification.
	# Argument names follow rapidocr_onnxruntime 1.3.x (cls_model_path, use_cls);
	# check them against your installed version.
	ocr = RapidOCR(
	    det_model_path="detection/v5/det.onnx",
	    rec_model_path="languages/english/rec.onnx",
	    rec_keys_path="languages/english/dict.txt",
	    # Optional text-line orientation classifier
	    cls_model_path="preprocessing/textline-orientation/PP-LCNet_x1_0_textline_ori.onnx",
	)

	result, elapsed = ocr("rotated_document.jpg", use_cls=True)
	```

	When to use preprocessing:
	- Document Orientation (`doc-orientation/`): Scanned documents with unknown rotation (0°/90°/180°/270°)
	- Text Line Orientation (`textline-orientation/`): Upside-down text lines (0°/180°)
	- Document Unwarping (`doc-unwarping/`): Curved pages, warped documents, camera photos

	Performance impact: +10-30% accuracy on distorted images, minimal speed overhead.

	</details>

	---

	## Repository Structure

	```
	.
	├── detection/
	│   ├── v5/
	│   │   ├── det.onnx          # 84 MB - PP-OCRv5 detection
	│   │   └── config.json
	│   └── v3/
	│       ├── det.onnx          # 2.3 MB - PP-OCRv3 detection
	│       └── config.json
	│
	├── languages/
	│   ├── english/
	│   │   ├── rec.onnx          # 7.5 MB
	│   │   ├── dict.txt
	│   │   └── config.json
	│   ├── latin/                # 32 languages
	│   ├── eslav/                # Russian, Bulgarian, Ukrainian, Belarusian
	│   ├── korean/
	│   ├── chinese/              # Chinese, Japanese
	│   ├── thai/
	│   ├── greek/
	│   ├── hindi/                # Hindi, Marathi, Nepali, Sanskrit (v3)
	│   ├── arabic/               # Arabic, Urdu, Persian (v3)
	│   ├── tamil/                # Tamil (v3)
	│   └── telugu/               # Telugu (v3)
	│
	└── preprocessing/
	    ├── doc-orientation/
	    ├── textline-orientation/
	    └── doc-unwarping/
	```

	---

	## Model Selection

	| Document Language | Model Path |
	|-------------------|------------|
	| English | `languages/english/` |
	| French, German, Spanish, Italian, Portuguese | `languages/latin/` |
	| Russian, Bulgarian, Ukrainian, Belarusian | `languages/eslav/` |
	| Korean | `languages/korean/` |
	| Chinese, Japanese | `languages/chinese/` |
	| Thai | `languages/thai/` |
	| Greek | `languages/greek/` |
	| Hindi, Marathi, Nepali, Sanskrit | `languages/hindi/` + `detection/v3/` |
	| Arabic, Urdu, Persian/Farsi | `languages/arabic/` + `detection/v3/` |
	| Tamil | `languages/tamil/` + `detection/v3/` |
	| Telugu | `languages/telugu/` + `detection/v3/` |
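
The selection table can be expressed as a small lookup that also enforces the rule of pairing each recognition model with the detection model of the same version. This is a sketch: the folder and file names come from this README, while `model_paths` and the two group sets are illustrative names rather than part of any library.

```python
# Language-group folders as listed in this README, split by model version
V5_GROUPS = {"english", "latin", "eslav", "korean", "chinese", "thai", "greek"}
V3_GROUPS = {"hindi", "arabic", "tamil", "telugu"}


def model_paths(group):
    """Return (detection, recognition, dictionary) paths for a language group."""
    if group in V5_GROUPS:
        version = "v5"
    elif group in V3_GROUPS:
        version = "v3"
    else:
        raise ValueError(f"Unknown language group: {group!r}")
    return (
        f"detection/{version}/det.onnx",
        f"languages/{group}/rec.onnx",
        f"languages/{group}/dict.txt",
    )


print(model_paths("arabic"))
```

The returned tuple can be unpacked into `det_model_path`, `rec_model_path`, and `rec_keys_path` when constructing `RapidOCR`.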

	---

	## Technical Specifications

	- Framework: PaddleOCR → ONNX
	- ONNX Opset: 11
	- Precision: FP32
	- Input Format: RGB images (dynamic size)
	- Inference: CPU/GPU via onnxruntime
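
For the CPU/GPU choice, one possible approach when driving the models with `onnxruntime` directly is to ask which execution providers are available and prefer CUDA when it is present. The sketch below assumes the standard provider names `CUDAExecutionProvider` and `CPUExecutionProvider`; `pick_providers` is an illustrative helper, and the code falls back to CPU when `onnxruntime` is not installed.

```python
def pick_providers(available):
    """Prefer CUDA when present and keep the CPU provider as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed; assume a CPU-only environment
    available = ["CPUExecutionProvider"]

providers = pick_providers(available)
print(providers)

# A session could then be created with, for example:
# session = ort.InferenceSession("detection/v5/det.onnx", providers=providers)
```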

	### Detection Model
	- Input: `(batch, 3, height, width)` - dynamic
	- Output: Text bounding boxes
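
Although the detection input is dynamic, PaddleOCR's reference preprocessing resizes images so that both sides are multiples of 32. The sketch below computes such a target size. The `limit_side_len` default of 960 and the function name are assumptions for illustration, not values read from this repository's `config.json`.

```python
def detection_input_size(height, width, limit_side_len=960):
    """Scale (height, width) so the longer side fits the limit,
    then round both sides to the nearest multiple of 32 (minimum 32)."""
    ratio = 1.0
    if max(height, width) > limit_side_len:
        ratio = limit_side_len / max(height, width)
    new_h = max(int(round(height * ratio / 32)) * 32, 32)
    new_w = max(int(round(width * ratio / 32)) * 32, 32)
    return new_h, new_w


print(detection_input_size(1080, 1920))
```

Wrappers such as RapidOCR perform this step internally, so it only matters when feeding `det.onnx` directly.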

	### Recognition Model
	- Input: `(batch, 3, 32, width)` - height fixed at 32px
	- Output: CTC logits → decoded with dictionary
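
The CTC decoding step can be sketched as greedy decoding: take the highest-scoring class at each time step, collapse consecutive repeats, and drop the blank. This sketch assumes the blank token is class index 0 and that dictionary entry *i* maps to class index *i + 1*, which is the usual PaddleOCR convention but is not confirmed here for every model in this repository. `ctc_greedy_decode` is an illustrative name.

```python
def ctc_greedy_decode(logits, charset, blank=0):
    """Greedy CTC decoding.

    logits:  list of time steps, each a list of per-class scores
    charset: characters from dict.txt, one per line, in file order
    """
    # Best class index at every time step
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]

    chars = []
    previous = blank
    for index in best:
        # Emit a character only if it is not blank and differs from the previous step
        if index != blank and index != previous:
            chars.append(charset[index - 1])
        previous = index
    return "".join(chars)


# Toy example: 3 classes (blank, "a", "b") over 6 time steps
charset = ["a", "b"]
logits = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a again (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a (new character after a blank)
    [0.2, 0.1, 0.7],    # b
    [0.9, 0.05, 0.05],  # blank
]
print(ctc_greedy_decode(logits, charset))
```

A real charset may need extra entries (for example a trailing space character) if the model emits more classes than `dict.txt` has lines.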

	---

	## Performance

	### Accuracy (PP-OCRv5)

	| Model | Accuracy | Dataset |
	|-------|----------|---------|
	| Greek | 89.28% | 2,799 images |
	| Korean | 88.0% | 5,007 images |
	| English | 85.25% | 6,530 images |
	| Latin | 84.7% | 3,111 images |
	| Thai | 82.68% | 4,261 images |
	| East Slavic | 81.6% | 7,031 images |

	---

	## FAQ

	Q: Which version should I use?
	A: Use PP-OCRv5 models for best accuracy. Use PP-OCRv3 only for languages not available in v5: Hindi, Marathi, Nepali, Sanskrit, Tamil, Telugu, Arabic, Urdu, and Persian/Farsi.

	Q: Can I mix v5 and v3 models?
	A: No. Use `detection/v5/det.onnx` with v5 recognition models, and `detection/v3/det.onnx` with v3 recognition models.

	Q: GPU acceleration?
	A: Install `onnxruntime-gpu` instead of `onnxruntime`. Inference can be up to roughly 10x faster, depending on hardware, image size, and batch size.

	Q: Commercial use?
	A: Yes. Apache 2.0 license allows commercial use.

	---

	## Credits

	- Original Models: [PaddlePaddle Team](https://github.com/PaddlePaddle/PaddleOCR)
	- Conversion: [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX)
	- Source: [PP-OCRv5 Collection](https://huggingface.co/collections/PaddlePaddle/pp-ocrv5-684a5356aef5b4b1d7b85e4b)

	---

	## Links

	- [PaddleOCR GitHub](https://github.com/PaddlePaddle/PaddleOCR)
	- [PaddleOCR Documentation](https://paddlepaddle.github.io/PaddleOCR/)
	- [ONNX Runtime](https://onnxruntime.ai/)
	- [monkt.com](https://monkt.com) - Document processing pipeline

	---

	License: Apache 2.0