Felladrin committed on Nov 5

Commit cdc13b0 (verified) · 1 parent: e77bc16

Upload folder using huggingface_hub

Files changed (26)

README.md +129 -0
config.json +175 -0
generation_config.json +9 -0
merges.txt +0 -0
onnx/decoder_model.onnx +3 -0
onnx/decoder_model_bnb4.onnx +3 -0
onnx/decoder_model_fp16.onnx +3 -0
onnx/decoder_model_int8.onnx +3 -0
onnx/decoder_model_q4.onnx +3 -0
onnx/decoder_model_q4f16.onnx +3 -0
onnx/decoder_model_quantized.onnx +3 -0
onnx/decoder_model_uint8.onnx +3 -0
onnx/encoder_model.onnx +3 -0
onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_fp16.onnx +3 -0
onnx/encoder_model_int8.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_quantized.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0
preprocessor_config.json +24 -0
quantize_config.json +18 -0
special_tokens_map.json +51 -0
tokenizer.json +0 -0
tokenizer_config.json +63 -0
vocab.json +0 -0

README.md ADDED

	@@ -0,0 +1,129 @@

+---
+library_name: transformers.js
+tags:
+- PyTorch
+- LaTeX
+- Math OCR
+- Handwritten Math
+metrics:
+- cer
+base_model:
+- tjoab/latex_finetuned
+pipeline_tag: image-to-text
+---
+# latex_finetuned (ONNX)
+This is an ONNX version of [tjoab/latex_finetuned](https://huggingface.co/tjoab/latex_finetuned). It was automatically converted and uploaded using [this Hugging Face Space](https://huggingface.co/spaces/onnx-community/convert-to-onnx).
+## Usage with Transformers.js
+See the pipeline documentation for `image-to-text`: https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.ImageToTextPipeline
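
The pipeline documentation covers the JavaScript side. For readers who want to run the exported files from Python instead, here is a minimal greedy-decoding sketch built on `onnxruntime`. It assumes the input/output names that Hugging Face Optimum exports typically use (`pixel_values` for the encoder; `input_ids` and `encoder_hidden_states` for the decoder, with logits as the decoder's first output). Check them with `session.get_inputs()` and `session.get_outputs()` before relying on it. The token id 2 for both `decoder_start_token_id` and `eos_token_id` comes from `generation_config.json`. The helper names `greedy_decode` and `make_onnx_next_token` are hypothetical.

```python
from typing import Callable, List


def greedy_decode(next_token: Callable[[List[int]], int],
                  start_id: int = 2, eos_id: int = 2,
                  max_length: int = 128) -> List[int]:
    """Greedy autoregressive decoding.

    `next_token` receives the ids generated so far (beginning with
    `start_id`) and returns the id of the most likely next token.
    Returns the generated ids without the start token and without EOS.
    """
    ids = [start_id]
    for _ in range(max_length):
        token = next_token(ids)
        if token == eos_id:
            break
        ids.append(token)
    return ids[1:]


def make_onnx_next_token(encoder_path: str, decoder_path: str, pixel_values):
    """Build a `next_token` callable backed by the ONNX encoder and decoder.

    ASSUMPTIONS: batch size 1, Optimum-style input names, and logits as the
    decoder's first output. Verify with session.get_inputs()/get_outputs().
    """
    import numpy as np
    import onnxruntime as ort

    providers = ["CPUExecutionProvider"]
    encoder = ort.InferenceSession(encoder_path, providers=providers)
    decoder = ort.InferenceSession(decoder_path, providers=providers)

    # The encoder runs once per image; its output is reused at every step.
    hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]

    def next_token(ids: List[int]) -> int:
        # No KV cache here, so the whole prefix is re-fed at each step.
        input_ids = np.array([ids], dtype=np.int64)
        logits = decoder.run(None, {"input_ids": input_ids,
                                    "encoder_hidden_states": hidden_states})[0]
        return int(logits[0, -1].argmax())  # most likely token at the last position

    return next_token
```

For example, with `pixel_values = processor.image_processor(images=[image], return_tensors="np").pixel_values` (using the processor from the PyTorch example), `greedy_decode(make_onnx_next_token("onnx/encoder_model.onnx", "onnx/decoder_model.onnx", pixel_values))` returns token ids that `processor.tokenizer.decode(ids, skip_special_tokens=True)` should turn into LaTeX. Re-feeding the full prefix costs more per step than a cached decoder, but it keeps the sketch independent of past-key-value inputs.
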
+---
+# TrOCR-LaTeX (fine-tuned on math handwriting)
+Take your handwritten math and turn it into clean LaTeX code.
+This is a fine-tuned version of [`microsoft/trocr-base-handwritten`](https://huggingface.co/microsoft/trocr-base-handwritten),
+a transformer-based optical character recognition model, adapted to work with handwritten math images and structured math syntax.
+## Data
+Fine-tuned on Google's [`MathWriting`](https://github.com/google-research/google-research/tree/master/mathwriting) dataset, which contains over 500,000 digital inks of handwritten mathematical expressions obtained through either manual labelling or programmatic generation.
+## Intended use & limitations
+You can use this model for OCR on a **single** math expression.
+Performance degrades on very long expressions because of image preprocessing: every input is resized to 384×384 (see `preprocessor_config.json`), so wide images get squashed. Inputs with roughly a 3:2 aspect ratio seem to work best.
+- To work around this limitation, split the image into sub-images with an expression chunking scheme and process each one separately.
+- To process **multiple** expressions, first chunk the image so that each piece contains a single expression.
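
One possible chunking scheme, sketched here under our own assumptions rather than taken from the author: cut the image at ink-free columns so that no symbol is sliced in half, keeping each chunk at most 1.5 times its height wide to match the 3:2 ratio mentioned above. The function `split_columns` and its one-boolean-per-column input are hypothetical; how "ink" is detected is left to the caller.

```python
from typing import List, Sequence, Tuple


def split_columns(ink: Sequence[bool], max_width: int) -> List[Tuple[int, int]]:
    """Split column indices [0, len(ink)) into (start, end) spans.

    Each span is at most `max_width` columns wide. The cut goes at the
    rightmost ink-free column that keeps the span within that limit, so
    symbols are not sliced in half; if a window has no blank column, the
    span is hard-cut at `max_width`.
    """
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    spans = []
    start, n = 0, len(ink)
    while start < n:
        limit = min(start + max_width, n)
        if limit == n:  # the remainder fits in one chunk
            spans.append((start, n))
            break
        # Search columns limit, limit-1, ..., start+1 for a blank one.
        cut = next((c for c in range(limit, start, -1) if not ink[c]), limit)
        spans.append((start, cut))
        start = cut
    return spans
```

With Pillow, `gray = image.convert("L")`, `ink = [any(gray.getpixel((x, y)) < 128 for y in range(gray.height)) for x in range(gray.width)]` and `max_width = int(gray.height * 1.5)` supply the inputs, and each span becomes `image.crop((start, 0, end, image.height))`. The threshold of 128 is an arbitrary choice. Concatenating per-chunk predictions can still produce broken LaTeX when a construct such as a long `\frac` straddles a cut.
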
+## How to use (PyTorch)
+```python
+from transformers import TrOCRProcessor, VisionEncoderDecoderModel
+from PIL import Image
+# Helper function: open a JPEG or PNG file and return an RGB image.
+# Transparent PNG backgrounds are flattened onto white so the strokes stay visible.
+def open_PIL_image(image_path: str) -> Image.Image:
+    image = Image.open(image_path)
+    if image_path.split('.')[-1].lower() == 'png':
+        image = image.convert('RGBA')
+        background = Image.new('RGBA', image.size, 'white')
+        image = Image.alpha_composite(background, image)
+    return image.convert('RGB')
+# Load model and processor from Hugging Face
+processor = TrOCRProcessor.from_pretrained('tjoab/latex_finetuned')
+model = VisionEncoderDecoderModel.from_pretrained('tjoab/latex_finetuned')
+# Load all images as a batch (placeholder paths: replace with your own files)
+paths = ['expression_1.png', 'expression_2.jpg']
+images = [open_PIL_image(path) for path in paths]
+# Preprocess the images
+preproc_image = processor.image_processor(images=images, return_tensors="pt").pixel_values
+# Generate and decode the tokens
+# NOTE: the default max_length is only 20, which often truncates predictions if not set
+pred_ids = model.generate(preproc_image, max_length=128)
+latex_preds = processor.batch_decode(pred_ids, skip_special_tokens=True)
+print(latex_preds)
+```
+## Training Details
+- Mini-batch size: 8
+- Optimizer: Adam
+- LR Scheduler: cosine
+- **`fp16` mixed precision**
+  - Trained using automatic mixed precision (AMP) with `torch.cuda.amp` for reduced memory usage.
+- **Gradient accumulation**
+  - Used to simulate a larger effective batch size while keeping per-step memory consumption low.
+  - Optimizer steps occurred every 8 mini-batches.
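
With mini-batches of 8 and an optimizer step every 8 mini-batches, the effective batch size is 8 × 8 = 64. The toy sketch below uses a scalar least-squares model in plain Python, not the author's PyTorch training loop, to show why each mini-batch contribution is divided by the number of accumulation steps: with equal-sized mini-batches, the accumulated gradient then equals the gradient of one 64-sample batch. `grad_mse`, `accumulated_grad` and `train_step` are illustrative names, and the update is plain SGD rather than Adam.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, mini_batches: List[Batch]) -> float:
    """Sum per-mini-batch gradients, each scaled by 1/accumulation_steps.

    This mirrors dividing each mini-batch loss by the number of
    accumulation steps before calling backward() in a framework loop.
    """
    steps = len(mini_batches)
    total = 0.0
    for batch in mini_batches:
        total += grad_mse(w, batch) / steps
    return total


def train_step(w: float, mini_batches: List[Batch], lr: float = 0.01) -> float:
    """One optimizer step (plain SGD) taken only after all mini-batches."""
    return w - lr * accumulated_grad(w, mini_batches)
```

Only one weight update happens per group of mini-batches, so peak memory is set by the mini-batch of 8 while the gradient behaves like a batch of 64.
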
+## Evaluation
+Performance was evaluated using Character Error Rate (CER) defined as:
+`CER = (Substitutions + Insertions + Deletions) / Total Characters in Ground Truth`
+- #### ✅ Why CER?
+  - Math expressions are structurally sensitive. Changing or swapping even a single character can completely change the meaning.
+    - `x^2` vs. `x_2`
+    - `\frac{a}{b}` vs. `\frac{b}{a}`
+  - CER penalizes even small syntax errors.
+- **Evaluation yielded a CER of 14.9%.**
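
A minimal CER implementation following the formula above: the numerator is the Levenshtein edit distance (the minimum number of substitutions, insertions, and deletions) and the denominator is the length of the ground-truth string. This is a plain-Python sketch, not the evaluation script behind the reported 14.9%; any whitespace handling or LaTeX normalization applied before scoring is omitted and could change the numbers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character Error Rate = edit distance / number of ground-truth characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(prediction, reference) / len(reference)
```

On the examples above, `x_2` against `x^2` scores 1/3 (one substitution), and `\frac{b}{a}` against `\frac{a}{b}` scores 2/11. Over a test set, summing edits and reference lengths across samples before dividing is a common convention, but the README does not say which aggregation was used.
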
+## BibTeX and Citation
+The original TrOCR model was introduced in this paper:
+[TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Li et al.
+You can find the source code in [their repository](https://github.com/microsoft/unilm/tree/master/trocr).
+```bibtex
+@misc{li2021trocr,
+      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
+      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
+      year={2021},
+      eprint={2109.10282},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```

config.json ADDED

	@@ -0,0 +1,175 @@

+{
+  "_attn_implementation_autoset": true,
+  "_name_or_path": "tjoab/latex_finetuned",
+  "architectures": [
+    "VisionEncoderDecoderModel"
+  ],
+  "decoder": {
+    "_attn_implementation_autoset": false,
+    "_name_or_path": "",
+    "activation_dropout": 0.0,
+    "activation_function": "gelu",
+    "add_cross_attention": true,
+    "architectures": null,
+    "attention_dropout": 0.0,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 0,
+    "chunk_size_feed_forward": 0,
+    "classifier_dropout": 0.0,
+    "cross_attention_hidden_size": 768,
+    "d_model": 1024,
+    "decoder_attention_heads": 16,
+    "decoder_ffn_dim": 4096,
+    "decoder_layerdrop": 0.0,
+    "decoder_layers": 12,
+    "decoder_start_token_id": 2,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.1,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "init_std": 0.02,
+    "is_decoder": true,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layernorm_embedding": true,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "max_position_embeddings": 512,
+    "min_length": 0,
+    "model_type": "trocr",
+    "no_repeat_ngram_size": 0,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 1,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "scale_embedding": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "float32",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "use_cache": false,
+    "use_learned_position_embeddings": true,
+    "vocab_size": 50265
+  },
+  "decoder_start_token_id": 0,
+  "encoder": {
+    "_attn_implementation_autoset": false,
+    "_name_or_path": "",
+    "add_cross_attention": false,
+    "architectures": null,
+    "attention_probs_dropout_prob": 0.0,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": null,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "encoder_stride": 16,
+    "eos_token_id": null,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_act": "gelu",
+    "hidden_dropout_prob": 0.0,
+    "hidden_size": 768,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "image_size": 384,
+    "initializer_range": 0.02,
+    "intermediate_size": 3072,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-12,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "min_length": 0,
+    "model_type": "vit",
+    "no_repeat_ngram_size": 0,
+    "num_attention_heads": 12,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_channels": 3,
+    "num_hidden_layers": 12,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": null,
+    "patch_size": 16,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "qkv_bias": false,
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": "float32",
+    "torchscript": false,
+    "typical_p": 1.0,
+    "use_bfloat16": false
+  },
+  "is_encoder_decoder": true,
+  "model_type": "vision-encoder-decoder",
+  "pad_token_id": 1,
+  "processor_class": "TrOCRProcessor",
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.49.0",
+  "use_cache": false
+}
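
The `encoder` block describes a ViT with `image_size` 384 and `patch_size` 16, and the decoder's `cross_attention_hidden_size` (768) matches the encoder's `hidden_size`. The sketch below derives the encoder output shape from these fields, which is the shape the decoder consumes as encoder hidden states (useful when wiring the ONNX files by hand). It assumes the standard ViT embedding with one `[CLS]` token prepended to the patch sequence; `encoder_output_shape` is a hypothetical helper.

```python
def encoder_output_shape(config: dict, batch_size: int = 1) -> tuple:
    """(batch, sequence_length, hidden_size) of the ViT encoder output.

    Assumes a standard ViT: the image is cut into non-overlapping
    patch_size x patch_size patches, plus one prepended [CLS] token.
    """
    enc = config["encoder"]
    patches_per_side = enc["image_size"] // enc["patch_size"]
    seq_len = patches_per_side ** 2 + 1  # +1 for the [CLS] token
    # The decoder's cross-attention must accept vectors of the encoder's width.
    if config["decoder"]["cross_attention_hidden_size"] != enc["hidden_size"]:
        raise ValueError("decoder cross-attention width does not match encoder")
    return (batch_size, seq_len, enc["hidden_size"])
```

Loading this file with `json.load` and passing it in gives `(1, 577, 768)`: 384 / 16 = 24 patches per side, 24² = 576 patches, plus one `[CLS]` token.
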

generation_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "decoder_start_token_id": 2,
+  "eos_token_id": 2,
+  "pad_token_id": 1,
+  "transformers_version": "4.49.0",
+  "use_cache": false
+}

merges.txt ADDED

The diff for this file is too large to render.

onnx/decoder_model.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:697461342091573d56e8dd0d3414e52e01549a44a10801176efbc6e27b56f630
+size 1195478750

onnx/decoder_model_bnb4.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f79a283c91944253b398064f05633b8212998feac7903877c1db62ce86f0c7f8
+size 348131618

onnx/decoder_model_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ede0057f4f909941e38f764140c44f7965061cce15e52f3353513251024e53cc
+size 597999543

onnx/decoder_model_int8.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:212218ae0e30939fd2a9ab9bc90a6c456f1fcd2c7dd122f89e7973f77673ce3c
+size 300116959

onnx/decoder_model_q4.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5f7af5cfa624ffab4b3b8a7fe6cc80890576fdb1e38f8109fdf225437737fc08
+size 363537318

onnx/decoder_model_q4f16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2aa25f6a674738a243a2b9eb6c552e308876dca1be5322b673acb371505e9a72
+size 243664494

onnx/decoder_model_quantized.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:212218ae0e30939fd2a9ab9bc90a6c456f1fcd2c7dd122f89e7973f77673ce3c
+size 300116959

onnx/decoder_model_uint8.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:155f1ad0766c631397c35256db57b490241083261f5e3f8fd94751d11e459f71
+size 300117018

onnx/encoder_model.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c254c606233a8cb5523ffb16eb92074713b79c9b52e77d1579cdcfaef172927e
+size 344426599

onnx/encoder_model_bnb4.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:80b653054a6715c4ab4b5140a3a4d4df9f72cd25c3be8c734388ccd205ce6e79
+size 52474921

onnx/encoder_model_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:25ea979038a4d38017c9d5fbbd347870d0462d0b4554672425816b9e3513c004
+size 172301001

onnx/encoder_model_int8.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4357ff953e4b731c94c65ab1d7e043c6cdb3d06e681c7cdc2db863c0a2a0096d
+size 87942932

onnx/encoder_model_q4.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:252058d59b4be0c0d180e0272c18478203e0b1327ff54d289ba97694da482edb
+size 57782809

onnx/encoder_model_q4f16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8ee5bbcf028dee9db5b3b225395092810e2cb7477b8751f353b748d10173c742
+size 50218204

onnx/encoder_model_quantized.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d169a795b252e04fc60083268b9b4dab3f6ffea262a26ec99643d9cca0f54a2
+size 87942969

onnx/encoder_model_uint8.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4d169a795b252e04fc60083268b9b4dab3f6ffea262a26ec99643d9cca0f54a2
+size 87942969
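
Each `.onnx` entry above is a Git LFS pointer rather than the model itself: a `version` line, the SHA-256 `oid` of the real file, and its `size` in bytes. The sketch below parses such a pointer and checks a downloaded file against it; `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helpers. The pointers also show that `encoder_model_quantized.onnx` and `encoder_model_uint8.onnx` share the same oid and size, as do `decoder_model_quantized.onnx` and `decoder_model_int8.onnx`, so each of those pairs has the same content.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_against_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's size and SHA-256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first: skips hashing a truncated download
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-hundred-MB models need little memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]
```
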

preprocessor_config.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "do_convert_rgb": null,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_processor_type": "ViTImageProcessor",
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "processor_class": "TrOCRProcessor",
+  "resample": 2,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 384,
+    "width": 384
+  }
+}
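
With these settings, `ViTImageProcessor` resizes every image to 384 × 384 (`resample: 2` is Pillow's bilinear filter), multiplies by `rescale_factor` (1/255), and then normalizes each channel with mean 0.5 and std 0.5, which maps 8-bit values onto [-1, 1]. The per-pixel arithmetic is sketched below; `normalize_pixel` is an illustrative name, and the sketch leaves out resizing, so it does not replace the real processor.

```python
RESCALE_FACTOR = 0.00392156862745098  # 1/255, from preprocessor_config.json
IMAGE_MEAN = (0.5, 0.5, 0.5)
IMAGE_STD = (0.5, 0.5, 0.5)


def normalize_pixel(rgb: tuple) -> tuple:
    """Rescale an 8-bit (R, G, B) pixel to [0, 1], then standardize per channel."""
    return tuple(
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )
```

A white page reaches the model as values near 1.0 and black ink as -1.0, which is presumably why the PyTorch helper in the README flattens transparent PNG backgrounds onto white before preprocessing.
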

quantize_config.json ADDED

	@@ -0,0 +1,18 @@

+{
+    "modes": [
+        "fp16",
+        "q8",
+        "int8",
+        "uint8",
+        "q4",
+        "q4f16",
+        "bnb4"
+    ],
+    "per_channel": false,
+    "reduce_range": false,
+    "block_size": null,
+    "is_symmetric": true,
+    "accuracy_level": null,
+    "quant_type": 1,
+    "op_block_list": null
+}
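
The `modes` list lines up with the ONNX files in this commit: each mode yields one encoder file and one decoder file with a matching suffix, except `q8`, whose files are named `*_quantized.onnx` (matching the Transformers.js `dtype` naming convention), and full precision, which has no suffix. The sketch below reproduces that mapping. The dictionary is inferred from the file list, not read from the conversion tool, and `onnx_path` is a hypothetical helper.

```python
# Suffix per quantization mode, inferred from this commit's file names.
DTYPE_SUFFIX = {
    "fp32": "",         # full precision, e.g. onnx/encoder_model.onnx
    "fp16": "_fp16",
    "q8": "_quantized",
    "int8": "_int8",
    "uint8": "_uint8",
    "q4": "_q4",
    "q4f16": "_q4f16",
    "bnb4": "_bnb4",
}


def onnx_path(component: str, dtype: str = "fp32") -> str:
    """Repository path of the ONNX file for `component` ('encoder' or 'decoder')."""
    if component not in ("encoder", "decoder"):
        raise ValueError(f"unknown component: {component}")
    return f"onnx/{component}_model{DTYPE_SUFFIX[dtype]}.onnx"
```

The LFS pointers give a sense of the trade-off: `onnx_path("encoder", "q4")` is about 58 MB, while the fp32 encoder is about 344 MB.
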

special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50264": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "mask_token": "<mask>",
+  "max_length": null,
+  "model_max_length": 512,
+  "pad_to_multiple_of": null,
+  "pad_token": "<pad>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "processor_class": "TrOCRProcessor",
+  "sep_token": "</s>",
+  "tokenizer_class": "RobertaTokenizer",
+  "trim_offsets": true,
+  "unk_token": "<unk>"
+}
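
The token ids in `generation_config.json` (`bos_token_id` 0, `pad_token_id` 1, `eos_token_id` and `decoder_start_token_id` 2) can be cross-checked against `added_tokens_decoder` above: 0 is `<s>`, 1 is `<pad>`, and 2 is `</s>`, so generation starts from `</s>` and stops at the next `</s>`. The top-level `decoder_start_token_id` in `config.json` is 0, while `generation_config.json` and the nested `decoder` block use 2; `generate()` should take its value from `generation_config.json`. A small stdlib check follows; `special_token_names` is a hypothetical helper.

```python
def special_token_names(generation_config: dict, tokenizer_config: dict) -> dict:
    """Map each *_token_id in generation_config to its token string."""
    id_to_token = {int(i): entry["content"]
                   for i, entry in tokenizer_config["added_tokens_decoder"].items()}
    return {key: id_to_token.get(value)
            for key, value in generation_config.items()
            if key.endswith("token_id")}
```

Passing the two files after loading them with `json.load` should yield `{"bos_token_id": "<s>", "decoder_start_token_id": "</s>", "eos_token_id": "</s>", "pad_token_id": "<pad>"}`.
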

vocab.json ADDED

The diff for this file is too large to render.