Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
AI & ML interests
Deep Learning Framework
Papers
GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
PP-StructureV3 is a SOTA document parsing solution on OmniDocBench, supporting the conversion of PDFs and document images to Markdown and JSON.
Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 15.9k • 1.54k
PaddleOCR-VL Online Demo
📈 230 • Convert documents and images into structured text and Markdown
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 113
PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese.