MonOCR: Production-Ready Mon Language OCR
MonOCR is an Optical Character Recognition (OCR) model engineered for the Mon language (mnw). Optimized for performance and accuracy, it accurately recognizes Mon characters from documents, digital texts, and scene images.
This repository serves as the official distribution point for MonOCR model weights in deployment-ready formats.
Software Development Kits
Unified SDKs are available for seamless integration into existing applications. These SDKs handle model caching, image preprocessing, and inference out-of-the-box.
| SDK | Platform | Registry |
|---|---|---|
| monocr-onnx | Python | PyPI |
| monocr | Node.js | npm |
| monocr-go | Go | GitHub |
Model Checkpoints
| Format | Path | Intended Use Case |
|---|---|---|
| ONNX | onnx/monocr.onnx |
Standard deployments (Server/Desktop). |
| TFLite (int8) | tflite/monocr.tflite |
Extreme edge/mobile (Low latency, minimized size). |
| TFLite (fp16) | tflite/float16.tflite |
High-efficiency mobile GPU acceleration. |
| TFLite (fp32) | tflite/float32.tflite |
High-precision mobile inference. |
| PyTorch | pytorch/monocr.ckpt |
Training, fine-tuning, and research. |
Technical Specification
- Core Architecture: ResNet-18 Backbone + 2-layer BiLSTM + Linear CTC Head.
- Input Tensors: Grayscale (1-channel), 64px Height, Variable Width.
- Image Preprocessing: Aspect-ratio preserving resize to 64px height, followed by
[0, 1]pixel normalization. - Decoding Strategy: Connectionist Temporal Classification (CTC) Greedy Decoding.
- Vocabulary: 224 Mon characters, punctuation, and formatting symbols (see
charset.txt).
Integration Guidelines
For developers building custom drivers:
- Refer to
charset.txtfor the index-to-character mapping (Index 0 is reserved for<blank>). - Ensure input images are high-contrast and properly scaled to 64px height.
- ONNX models use dynamic axes for width to support varying word lengths without padding.
License
All model weights and metadata are provided under the MIT License.
Citation
@software{monocr2026,
author = {Janakh},
title = {MonOCR: Production-Ready OCR for Mon Language},
year = {2026},
url = {https://huggingface.co/janakhpon/monocr}
}
- Downloads last month
- 21