UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters

[Paper coming soon] [ModelScope Demo] [Hugging Face Demo] [Local Demo]

This model card describes UniRec-0.1B-1217, which recognizes tables in addition to text and mathematical formulas.

Usage with transformers

# Download the model
huggingface-cli download topdu/unirec_0_1b --local-dir ./unirec_0_1b
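
The same download can also be done from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `ensure_model_dir` is a hypothetical helper name, not part of this repository. It skips the download when the target directory already contains files, which also means a partial download would be mistaken for a complete one.

```python
from pathlib import Path


def ensure_model_dir(repo_id: str = "topdu/unirec_0_1b",
                     local_dir: str = "./unirec_0_1b") -> Path:
    """Return the local model directory, downloading it only if missing or empty.

    Hypothetical helper for illustration; not part of the UniRec repository.
    """
    target = Path(local_dir)
    if target.is_dir() and any(target.iterdir()):
        # Assumption: a non-empty directory is a finished earlier download.
        return target
    # Imported lazily so the check above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Keeping the directory name `unirec_0_1b` and running the script from its parent directory matters here, since the usage code imports `unirec_0_1b.modeling_unirec` as a local package.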

import torch
from transformers import AutoTokenizer
from unirec_0_1b.modeling_unirec import UniRecForConditionalGeneration
from unirec_0_1b.processing_unirec import UniRecImageProcessor, clean_special_tokens
from PIL import Image

tokenizer = AutoTokenizer.from_pretrained("./unirec_0_1b")
model = UniRecForConditionalGeneration.from_pretrained("./unirec_0_1b", device_map='cuda:0', torch_dtype=torch.bfloat16)
model.eval()
model = torch.compile(model, mode="reduce-overhead", dynamic=True)  # adapt to dynamic input sizes
processor = UniRecImageProcessor.from_pretrained("./unirec_0_1b")

image_path = 'path_to_image.jpg'
img = Image.open(image_path)
data_img = processor(img, return_tensors="pt")
pixel_values = data_img['pixel_values'][0]

inputs = {
    # Keep the first three channels, add a batch dimension, and match the model's device and dtype
    'pixel_values': pixel_values[:3, :, :].unsqueeze(0).to('cuda:0').bfloat16(),
    'input_ids': None,
    'attention_mask': None
}
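# Greedy decoding: a single beam and no sampling, up to 2048 tokens
with torch.no_grad():
    preds = model.generate(**inputs, max_length=2048, num_beams=1, do_sample=False, use_cache=True)
# Decode with special tokens kept, then strip them with the repository's helper
res = tokenizer.batch_decode(preds, skip_special_tokens=False)
pred_text = clean_special_tokens(res[0])
print(pred_text)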
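
To run the steps above over a folder of images, a small driver loop can help. This is a sketch under assumptions: `recognize_folder` and its `recognize_fn` argument are hypothetical names, and `recognize_fn` stands for a function you write yourself that wraps the open, process, generate, and decode steps shown above for a single image path.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def recognize_folder(folder: str,
                     recognize_fn: Callable[[Path], str],
                     extensions: Iterable[str] = (".jpg", ".jpeg", ".png")) -> Dict[str, str]:
    """Apply recognize_fn to each image file in folder.

    Returns a dict mapping file name to recognized text, in sorted path order.
    Hypothetical helper for illustration; not part of the UniRec repository.
    """
    allowed = {ext.lower() for ext in extensions}
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and files whose suffix is not an allowed image type.
        if path.is_file() and path.suffix.lower() in allowed:
            results[path.name] = recognize_fn(path)
    return results
```

Passing the recognition step as a callable keeps the model, tokenizer, and processor loaded once outside the loop, so the one-time `torch.compile` cost is not paid per image.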

Citation

If you find our method useful for your research, please cite:
