TachiwinOCR

for the Indigenous Languages of Mexico

16-bit (BF16) precision

This is a PaddleOCR-VL fine-tune specialized in the 68 Indigenous languages of Mexico and their diverse character and glyph repertoires, a world first for technology access and linguistic rights.

Inference

You can perform inference using the PaddleOCR pipeline or the transformers library.

Option A: Using PaddleOCR (Easy Pipeline)

from paddleocr import PaddleOCRVL

# Load the fine-tuned model
pipeline = PaddleOCRVL(
    vl_rec_model_name="PaddleOCR-VL-0.9B",
    vl_rec_model_dir="path/to/tachiwin_model",  # local directory containing the downloaded Tachiwin weights
)

# Predict on an image
output = pipeline.predict("test.png")

# Print each result and export it to JSON and Markdown
for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
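
To transcribe a whole folder of scans, the same pipeline can be reused across files. A minimal sketch, assuming the downloaded weights sit in ./tachiwin_model and the page images in ./scans (both paths are placeholders):

from pathlib import Path
from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL(
    vl_rec_model_name="PaddleOCR-VL-0.9B",
    vl_rec_model_dir="./tachiwin_model",  # assumed local path to the downloaded weights
)

for image_file in sorted(Path("scans").glob("*.png")):
    for res in pipeline.predict(str(image_file)):
        res.save_to_markdown(save_path="output")  # one Markdown file per page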

Option B: Using Transformers (Advanced Control)

from PIL import Image
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# ---- Settings ----
model_path = "tachiwin/PaddleOCR-VL-Tachiwin-BF16"
image_path = "test.png"
# ------------------

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

image = Image.open(image_path).convert("RGB")

# Load the fine-tuned model in bfloat16; trust_remote_code enables the custom PaddleOCR-VL model code
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
).to(DEVICE).eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Build a chat-style prompt: the page image followed by the "OCR:" instruction
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "OCR:"},
    ]}
]

# Apply the chat template, tokenize, and move the tensors to the target device
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(DEVICE)

# Generate the transcription and decode the output tokens
outputs = model.generate(**inputs, max_new_tokens=1024, min_new_tokens=1)
generated_text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(generated_text)
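
Depending on the tokenizer configuration, the decoded string can include the prompt as well as the transcription. A common transformers pattern (an assumption here, not something documented for this specific checkpoint) is to decode only the newly generated tokens:

new_tokens = outputs[:, inputs["input_ids"].shape[1]:]  # drop the prompt tokens
ocr_text = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
print(ocr_text)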

📊 Benchmark Results

Tachiwin-OCR was evaluated against the base PaddleOCR-VL model using a diverse subset of Indigenous language samples. The fine-tuning results demonstrate significant improvements in both character and word recognition accuracy.

Summary Metrics

| Metric | Base Model (Raw) | Tachiwin-OCR (Fine-tuned) | Improvement |
| --- | --- | --- | --- |
| Character Error Rate (CER) | 7.59% | 6.80% | 10.4% relative reduction |
| Word Error Rate (WER) | 25.17% | 17.36% | 7.81 pp absolute reduction |
| OCR Accuracy (1 - CER) | 92.41% | 93.20% | +0.79 pp absolute |
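
For reference, CER and WER are edit-distance metrics. The sketch below is not the evaluation script used for these numbers; it simply shows one way to compute both metrics in plain Python against your own ground-truth transcriptions:

def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (x != y),   # substitution (free if the items match)
            ))
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)

reference = "tachiwin ocr example"   # ground-truth transcription (toy string)
hypothesis = "tachiwln ocr example"  # model output with one character error
print(f"CER: {cer(reference, hypothesis):.2%}  WER: {wer(reference, hypothesis):.2%}")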

Detailed Comparison (Sample)

A subset of the evaluation results across languages; tonal languages benefit the most from this fine-tuning:

| Language | Raw CER | FT CER | Raw WER | FT WER | CER Reduction |
| --- | --- | --- | --- | --- | --- |
| stp (Tepehuán) | 10.95% | 0.00% | 43.55% | 0.00% | 10.95 pp |
| maz (Central Mazahua) | 3.29% | 0.41% | 9.09% | 0.00% | 2.88 pp |
| chj (Ojitlán Chinantec) | 16.97% | 2.21% | 52.78% | 9.72% | 14.76 pp |
| maa (Tecóatl Mazatec) | 86.70% | 8.49% | 105.08% | 10.17% | 78.21 pp |

Key Findings

  • High Accuracy Gains: For tonal languages such as Tepehuán (stp) and Mazatec (maa), fine-tuning cut the character error rate from 10.95% to 0.00% and from 86.70% to 8.49%, respectively.
  • Robustness: The model shows high resilience against synthetic distortions implemented during the data generation phase.
  • Word-Level Performance: The reduction in Word Error Rate (from 25.17% to 17.36%) highlights the model's improved ability to contextualize character sequences specific to these language families.

Tachiwin (from the Totonac word for "language") is dedicated to bridging the digital divide for the Indigenous languages of Mexico through AI technology.

  • Developed by: Tachiwin
  • License: apache-2.0
  • Fine-tuned from model: PaddlePaddle/PaddleOCR-VL