layoutlmv3-receipt-invoice

LayoutLMv3 model initialized for receipt and invoice field extraction.

Model Status

โš ๏ธ This is an initialized base model - not yet fine-tuned on custom data.

  • Base Model: microsoft/layoutlmv3-base
  • Status: Ready for deployment and fine-tuning
  • Custom Labels: Configured for receipt/invoice field extraction

Intended Use

This model is configured to extract the following fields from receipts and invoices:

Supported Fields

[ "O", "B-MerchantName", "I-MerchantName", "B-MerchantAddress", "I-MerchantAddress", "B-TransactionDate", "I-TransactionDate", "B-Currency", "I-Currency", "B-Total", "I-Total", "B-TotalTax", "I-TotalTax", "B-InvoiceNumber", "I-InvoiceNumber", "B-Subtotal", "I-Subtotal", "B-LineItems", "I-LineItems" ]
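The 19 labels above are what a token-classification head needs as `id2label` / `label2id` mappings. The sketch below builds them from the list. It assumes ids follow list order; the repository's `config.json` is the authoritative source for the actual ids.

```python
# Label list copied from the "Supported Fields" section.
LABELS = [
    "O",
    "B-MerchantName", "I-MerchantName",
    "B-MerchantAddress", "I-MerchantAddress",
    "B-TransactionDate", "I-TransactionDate",
    "B-Currency", "I-Currency",
    "B-Total", "I-Total",
    "B-TotalTax", "I-TotalTax",
    "B-InvoiceNumber", "I-InvoiceNumber",
    "B-Subtotal", "I-Subtotal",
    "B-LineItems", "I-LineItems",
]

# Assumption: ids follow list order (check config.json in the repository).
id2label = {i: label for i, label in enumerate(LABELS)}
label2id = {label: i for i, label in enumerate(LABELS)}

# Field names without the B-/I- prefix, in first-seen order.
FIELDS = list(dict.fromkeys(label[2:] for label in LABELS if label != "O"))

if __name__ == "__main__":
    print(len(LABELS))  # 19
    print(FIELDS)
```

These mappings can be passed to `LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base", num_labels=len(LABELS), id2label=id2label, label2id=label2id)`, which stores them in the model config so predictions decode to label names.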

Training Status

This repository contains:

  • ✅ Base LayoutLMv3 architecture
  • ✅ Custom label configuration for receipts/invoices
  • ⏳ Not yet fine-tuned: the encoder uses pre-trained weights from microsoft/layoutlmv3-base, and the token-classification head is randomly initialized

Training the Model

To fine-tune this model on your custom data:

# On RunPod GPU pod or local machine with GPU
python main.py --mode train --push-to-hub --version v1.0

This will:

  1. Train on your labeled receipt/invoice data
  2. Update this repository with fine-tuned weights
  3. Tag the trained version (e.g., v1.0, v1.1, etc.)
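Training on word-level labels requires aligning them to sub-word tokens. How `main.py` does this is not documented here; the sketch below shows the common convention of labeling only the first sub-token of each word and setting everything else to `-100` (PyTorch's default `ignore_index`), so those positions are excluded from the loss. `align_labels` is a hypothetical helper; `word_ids` has the shape returned by a fast tokenizer's `BatchEncoding.word_ids()`.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids: List[Optional[int]], word_label_ids: List[int]) -> List[int]:
    """Expand one label id per word into one label id per token.

    word_ids: for each token, the index of its source word, or None for
    special and padding tokens.
    """
    token_labels: List[int] = []
    previous_word: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens (<s>, </s>, padding) do not contribute to the loss.
            token_labels.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # The first sub-token of a word carries that word's label.
            token_labels.append(word_label_ids[word_id])
        else:
            # Continuation sub-tokens are ignored.
            token_labels.append(IGNORE_INDEX)
        previous_word = word_id
    return token_labels


if __name__ == "__main__":
    # Hypothetical input: 3 words, the second split into two sub-tokens.
    word_ids = [None, 0, 1, 1, 2, None]
    word_labels = [0, 9, 10]  # O, B-Total, I-Total (ids assume the Supported Fields order)
    print(align_labels(word_ids, word_labels))  # [-100, 0, 9, -100, 10, -100]
```

In practice `LayoutLMv3Processor` applies this same first-sub-token rule when called with `word_labels=...`; the sketch only makes the rule explicit.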

Usage

Local Inference

from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor
from PIL import Image

# Load model and processor
model = LayoutLMv3ForTokenClassification.from_pretrained("mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt")
processor = LayoutLMv3Processor.from_pretrained("mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt", apply_ocr=False)

# Prepare inputs (you need OCR results: words and bounding boxes)
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Total:", "$10.99"]
boxes = [[10, 10, 100, 30], [110, 10, 200, 30], [10, 50, 80, 70], [90, 50, 150, 70]]

# Normalize boxes to 0-1000 range
width, height = image.size
normalized_boxes = [[int(1000*x0/width), int(1000*y0/height),
                      int(1000*x1/width), int(1000*y1/height)] for x0,y0,x1,y1 in boxes]

encoding = processor(image, words, boxes=normalized_boxes, return_tensors="pt")
outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)
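`predictions` holds one label id per token, including special tokens and sub-word pieces. A minimal way to get one label name per input word is to keep the prediction of each word's first sub-token. The helper name below is hypothetical, and the sketch assumes the processor uses its fast tokenizer (the default), so that `encoding.word_ids(0)` is available.

```python
from typing import Dict, List, Optional


def word_level_predictions(word_ids: List[Optional[int]],
                           token_pred_ids: List[int],
                           id2label: Dict[int, str]) -> List[str]:
    """Return one label name per word, taken from the word's first sub-token."""
    labels: List[str] = []
    seen = set()
    for word_id, pred in zip(word_ids, token_pred_ids):
        if word_id is None or word_id in seen:
            continue  # skip special tokens and continuation sub-tokens
        seen.add(word_id)
        labels.append(id2label[pred])
    return labels


if __name__ == "__main__":
    # With the real model this would be:
    #   word_level_predictions(encoding.word_ids(0), predictions[0].tolist(), model.config.id2label)
    demo_id2label = {0: "O", 1: "B-MerchantName", 2: "I-MerchantName", 9: "B-Total"}
    word_ids = [None, 0, 1, 1, 2, 3, None]
    token_preds = [0, 1, 2, 2, 0, 9, 0]
    print(word_level_predictions(word_ids, token_preds, demo_id2label))
    # ['B-MerchantName', 'I-MerchantName', 'O', 'B-Total']
```

Until the model is fine-tuned, these labels come from an untrained head and carry no meaning.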

RunPod Serverless Deployment

This model is designed for deployment on RunPod Serverless:

  1. Build and push Docker image:

    cd deployment/runpod/LayoutLMv3
    python deploy.py --action deploy
    
  2. Create RunPod endpoint:

    • Docker Image: registry.hf.space/your-username/layoutlmv3-inference:latest
    • Environment Variables:
      • HF_REPO_ID=mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt
      • HF_TOKEN=<your-token>
      • MODEL_VERSION=main (or a specific version tag after training)
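The handler code in deployment/runpod/LayoutLMv3 is not shown in this card. As a sketch of how the three environment variables could be consumed, the hypothetical `load_settings` below reads them, defaulting `MODEL_VERSION` to `main` and (as an assumption) treating `HF_TOKEN` as optional, since it is only needed for private repositories.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the endpoint settings listed above from environment variables."""
    repo_id = env.get("HF_REPO_ID")
    if not repo_id:
        raise ValueError("HF_REPO_ID must be set")
    return {
        "repo_id": repo_id,
        # A branch name ("main") or a version tag such as "v1.0".
        "revision": env.get("MODEL_VERSION", "main"),
        # Assumption: an empty or missing token means anonymous access.
        "token": env.get("HF_TOKEN") or None,
    }


if __name__ == "__main__":
    settings = load_settings({"HF_REPO_ID": "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"})
    print(settings)
    # With transformers installed, the settings map onto:
    #   LayoutLMv3ForTokenClassification.from_pretrained(
    #       settings["repo_id"], revision=settings["revision"], token=settings["token"])
```

Passing the tag through `revision` lets one Docker image serve any tagged version without a rebuild.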

Model Architecture

  • Base: microsoft/layoutlmv3-base
  • Task: Token Classification
  • Input: Image + Words + Bounding Boxes
  • Output: Field labels (IOB tagging scheme)
  • Number of Labels: 19
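The bounding-box input is expected on a 0-1000 grid, which is why the Local Inference example rescales pixel coordinates. OCR engines sometimes return boxes that extend slightly past the image edge, and out-of-range values (negative ones in particular) can break the 2D position embedding lookup. The sketch below adds clamping on top of the same scaling; `normalize_box` is a hypothetical helper, not part of the transformers API.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], width: int, height: int) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 grid, clamped to that range."""
    x0, y0, x1, y1 = box

    def scale(value: float, size: int) -> int:
        return max(0, min(1000, int(1000 * value / size)))

    return [scale(x0, width), scale(y0, height), scale(x1, width), scale(y1, height)]


if __name__ == "__main__":
    print(normalize_box([10, 10, 100, 30], 200, 100))  # [50, 100, 500, 300]
    print(normalize_box([-5, 0, 210, 100], 200, 100))  # [0, 0, 1000, 1000]
```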

Label Schema

The model uses IOB (Inside-Outside-Beginning) tagging:

  • O: Outside any field
  • B-FieldName: Beginning of a field
  • I-FieldName: Inside/continuation of a field

Example

Text:        ["Total:", "$", "10", ".", "99"]
Labels:      ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
Extracted:   Total: "$ 10 . 99"
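Turning a tag sequence into field values means grouping each B- word with the I- words that follow it. The sketch below does this with a hypothetical `extract_fields` helper. One design choice is an assumption: an I- tag with no open span of the same field starts a new span (lenient decoding), whereas a strict decoder might drop it.

```python
from typing import Dict, List, Optional


def extract_fields(words: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group IOB-tagged words into space-joined field values."""
    fields: Dict[str, List[str]] = {}
    current_field: Optional[str] = None
    current_words: List[str] = []

    def close_span() -> None:
        # Emit the open span, if any, under its field name.
        if current_field is not None:
            fields.setdefault(current_field, []).append(" ".join(current_words))

    for word, tag in zip(words, tags):
        prefix, _, field = tag.partition("-")
        if prefix == "I" and field == current_field:
            current_words.append(word)  # continue the open span
            continue
        close_span()
        if prefix in ("B", "I"):
            current_field, current_words = field, [word]  # start a new span
        else:  # "O": outside any field
            current_field, current_words = None, []
    close_span()
    return fields


if __name__ == "__main__":
    words = ["ACME", "Store", "Total:", "$10.99"]
    tags = ["B-MerchantName", "I-MerchantName", "O", "B-Total"]
    print(extract_fields(words, tags))
    # {'MerchantName': ['ACME Store'], 'Total': ['$10.99']}
```

Values are lists because a field such as LineItems can occur several times on one document.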

Version History

Version | Date       | Description                                 | Status
main    | 2025-11-13 | Initialized with base model + custom labels | Base (not trained)

After training, versions will be tagged (v1.0, v1.1, etc.).

Training Configuration

When training is performed, the following configuration will be used:

{
  "model_name": "microsoft/layoutlmv3-base",
  "learning_rate": 5e-05,
  "batch_size": 4,
  "num_epochs": 20,
  "warmup_steps": 500,
  "max_length": 512,
  "validation_split": 0.2,
  "random_seed": 42,
  "gradient_accumulation_steps": 2,
  "eval_steps": 100,
  "save_steps": 500,
  "logging_steps": 50
}
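The configuration implies an effective batch size of batch_size × gradient_accumulation_steps = 4 × 2 = 8. The sketch below estimates the number of optimizer steps for a given dataset size, assuming a single GPU and a simple rounded 80/20 split (how `main.py` actually splits the data is not stated). It shows that on small datasets `warmup_steps: 500` can exceed the whole run. Exact per-epoch rounding varies slightly between trainers.

```python
import math

# Values copied from the training configuration above.
CONFIG = {
    "batch_size": 4,
    "gradient_accumulation_steps": 2,
    "num_epochs": 20,
    "warmup_steps": 500,
    "validation_split": 0.2,
}


def schedule_summary(num_documents: int, cfg: dict = CONFIG) -> dict:
    """Approximate optimizer-step counts for num_documents labeled documents."""
    num_val = round(num_documents * cfg["validation_split"])
    num_train = num_documents - num_val
    if num_train <= 0:
        raise ValueError("no training documents left after the validation split")
    effective_batch = cfg["batch_size"] * cfg["gradient_accumulation_steps"]
    steps_per_epoch = math.ceil(num_train / effective_batch)
    total_steps = steps_per_epoch * cfg["num_epochs"]
    return {
        "num_train": num_train,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # 1.0 means the learning rate is still warming up when training ends.
        "warmup_fraction": min(1.0, cfg["warmup_steps"] / total_steps),
    }


if __name__ == "__main__":
    print(schedule_summary(200))
    # {'num_train': 160, 'effective_batch': 8, 'steps_per_epoch': 20, 'total_steps': 400, 'warmup_fraction': 1.0}
```

With 200 documents the run has 400 steps, so warmup never finishes and `save_steps: 500` never triggers mid-run; with 1,000 documents warmup covers a quarter of the 2,000 steps.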

Citation

@misc{layoutlmv3-receipt-invoice,
  author = {MK Digital GmbH},
  title = {LayoutLMv3 Receipt/Invoice Field Extraction},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt}}
}

@article{huang2022layoutlmv3,
  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
  author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
  journal={arXiv preprint arXiv:2204.08387},
  year={2022}
}

License

Apache 2.0

Contact

For questions or issues, please open an issue in the repository.

Model size: 0.1B params (Safetensors, tensor type F32)