layoutlmv3-receipt-invoice
LayoutLMv3 model initialized for receipt and invoice field extraction.
Model Status
⚠️ This is an initialized base model that has not yet been fine-tuned on custom data.
- Base Model: microsoft/layoutlmv3-base
- Status: Ready for deployment and fine-tuning (predictions are not meaningful until the model is fine-tuned)
- Custom Labels: Configured for receipt/invoice field extraction
Intended Use
This model is configured to extract the following fields from receipts and invoices:
Supported Fields
[ "O", "B-MerchantName", "I-MerchantName", "B-MerchantAddress", "I-MerchantAddress", "B-TransactionDate", "I-TransactionDate", "B-Currency", "I-Currency", "B-Total", "I-Total", "B-TotalTax", "I-TotalTax", "B-InvoiceNumber", "I-InvoiceNumber", "B-Subtotal", "I-Subtotal", "B-LineItems", "I-LineItems" ]
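When fine-tuning or decoding predictions, the label list above has to be turned into `id2label` / `label2id` maps. The sketch below assumes the ids follow the order of the list. The authoritative mapping is whatever is stored in this repository's config (`model.config.id2label`), so check it before relying on this order.

```python
# Labels copied from the "Supported Fields" list.
SUPPORTED_FIELDS = [
    "O",
    "B-MerchantName", "I-MerchantName",
    "B-MerchantAddress", "I-MerchantAddress",
    "B-TransactionDate", "I-TransactionDate",
    "B-Currency", "I-Currency",
    "B-Total", "I-Total",
    "B-TotalTax", "I-TotalTax",
    "B-InvoiceNumber", "I-InvoiceNumber",
    "B-Subtotal", "I-Subtotal",
    "B-LineItems", "I-LineItems",
]

# Assumption: label ids follow list order (verify against model.config.id2label).
id2label = {i: label for i, label in enumerate(SUPPORTED_FIELDS)}
label2id = {label: i for i, label in id2label.items()}

# Every field name should have both a B- and an I- tag.
field_names = sorted({label[2:] for label in SUPPORTED_FIELDS if label != "O"})
for name in field_names:
    assert f"B-{name}" in label2id and f"I-{name}" in label2id, name

print(len(id2label), len(field_names))
```

These maps can be passed as `id2label=id2label, label2id=label2id, num_labels=len(id2label)` to `LayoutLMv3ForTokenClassification.from_pretrained(...)`. This keeps the label names in the saved config, so predicted ids can later be translated back to field tags.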
Training Status
This repository contains:
- ✅ Base LayoutLMv3 architecture
- ✅ Custom label configuration for receipts/invoices
- ⏳ Not yet fine-tuned: the encoder uses the pre-trained weights from microsoft/layoutlmv3-base, and the token-classification head is randomly initialized
Training the Model
To fine-tune this model on your custom data:
# On a RunPod GPU pod or a local machine with a GPU
python main.py --mode train --push-to-hub --version v1.0
This will:
- Train on your labeled receipt/invoice data
- Update this repository with fine-tuned weights
- Tag the trained version (e.g., v1.0, v1.1, etc.)
Usage
Local Inference
import torch
from PIL import Image
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor
repo_id = "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"
# Load model and processor (apply_ocr=False: you supply your own OCR words and boxes)
model = LayoutLMv3ForTokenClassification.from_pretrained(repo_id)
model.eval()
processor = LayoutLMv3Processor.from_pretrained(repo_id, apply_ocr=False)
# Prepare inputs: OCR words and their pixel bounding boxes as [x0, y0, x1, y1]
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Total:", "$10.99"]
boxes = [[10, 10, 100, 30], [110, 10, 200, 30], [10, 50, 80, 70], [90, 50, 150, 70]]
# Normalize boxes to the 0-1000 range expected by LayoutLMv3, clamping out-of-range values
width, height = image.size
def normalize(value, size):
    return max(0, min(1000, int(1000 * value / size)))
normalized_boxes = [[normalize(x0, width), normalize(y0, height),
                     normalize(x1, width), normalize(y1, height)] for x0, y0, x1, y1 in boxes]
encoding = processor(image, words, boxes=normalized_boxes, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
# One prediction per sub-word token (special tokens included); map ids to label names
predictions = outputs.logits.argmax(-1)[0].tolist()
token_labels = [model.config.id2label[p] for p in predictions]
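The model predicts one label per sub-word token, including special tokens, while the OCR input has one entry per word. A common way to get one label per word is to keep the prediction of each word's first sub-token. The sketch below assumes the processor wraps a fast tokenizer, so `encoding.word_ids(0)` is available. `word_level_labels` is a hypothetical helper, not part of `transformers`.

```python
from typing import List, Optional


def word_level_labels(
    word_ids: List[Optional[int]],
    token_labels: List[str],
    num_words: int,
    default: str = "O",
) -> List[str]:
    """Collapse per-token labels into one label per OCR word.

    Keeps the label of the first sub-word token of each word. Special tokens
    (word id None) are skipped. Words dropped by truncation get `default`.
    """
    if len(word_ids) != len(token_labels):
        raise ValueError("word_ids and token_labels must have the same length")
    labels = [default] * num_words
    seen = set()
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None or word_id in seen:
            continue  # special token, or a later sub-token of an already labeled word
        seen.add(word_id)
        labels[word_id] = label
    return labels


# Toy example: special tokens at both ends, "STORE" and "$10.99" split into two sub-tokens.
example = word_level_labels(
    word_ids=[None, 0, 0, 1, 2, 3, 3, None],
    token_labels=["O", "B-MerchantName", "B-MerchantName", "I-MerchantName",
                  "O", "B-Total", "I-Total", "O"],
    num_words=4,
)
print(example)
```

With the inference code, the inputs would be `encoding.word_ids(0)`, the list of label names obtained from `model.config.id2label`, and `len(words)`.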
RunPod Serverless Deployment
This model is designed for deployment on RunPod Serverless:
Build and push the Docker image:
cd deployment/runpod/LayoutLMv3
python deploy.py --action deploy
Create a RunPod endpoint:
- Docker Image: registry.hf.space/your-username/layoutlmv3-inference:latest
- Environment Variables:
  - HF_REPO_ID=mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt
  - HF_TOKEN=<your-token>
  - MODEL_VERSION=main (or a specific version tag after training)
- Docker Image:
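The deployment handler in deployment/runpod/LayoutLMv3 is not shown here. The sketch below is a hypothetical illustration of how the endpoint's environment variables (HF_REPO_ID, HF_TOKEN, MODEL_VERSION) could be turned into arguments for `from_pretrained`. It assumes a recent `transformers` release that accepts the `token` keyword; older releases used `use_auth_token`.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple

DEFAULT_REPO_ID = "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: map environment variables to from_pretrained() arguments."""
    env = os.environ if env is None else env
    repo_id = env.get("HF_REPO_ID", DEFAULT_REPO_ID)
    # MODEL_VERSION selects a branch or tag (e.g. "main" or "v1.0") via `revision`.
    kwargs: Dict[str, Any] = {"revision": env.get("MODEL_VERSION") or "main"}
    token = env.get("HF_TOKEN")
    if token:
        kwargs["token"] = token  # only needed for private repositories
    return repo_id, kwargs


repo_id, load_kwargs = load_settings({"MODEL_VERSION": "v1.0"})
print(repo_id, load_kwargs)
```

The handler would then call `LayoutLMv3ForTokenClassification.from_pretrained(repo_id, **load_kwargs)` and load the processor the same way. Pinning `MODEL_VERSION` to a tag keeps an endpoint on a fixed set of weights even after `main` is updated.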
Model Architecture
- Base: microsoft/layoutlmv3-base
- Task: Token Classification
- Input: Image + Words + Bounding Boxes
- Output: Field labels (IOB tagging scheme)
- Number of Labels: 19
Label Schema
The model uses IOB (Inside-Outside-Beginning) tagging:
- O: Outside any field
- B-FieldName: Beginning of a field
- I-FieldName: Inside/continuation of a field
Example
Text: ["Total:", "$", "10", ".", "99"]
Labels: ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
Extracted: Total: "$ 10 . 99"
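Turning IOB tags into field values is a small decoding step. The sketch below is one common convention, not necessarily what this repository's code does: a `B-` tag opens a span, a matching `I-` tag extends it, and `O` closes it. A stray `I-` tag with no open span of the same field is treated as the start of a new span. Words are joined with spaces, which reproduces spacing such as "$ 10 . 99". Normalizing amounts or dates is left to post-processing.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def decode_iob(words: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group IOB-tagged words into {field name: [value, ...]}."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    fields: Dict[str, List[str]] = defaultdict(list)
    current_field: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Close the open span, if any, and store its text.
        nonlocal current_field, current_words
        if current_field is not None:
            fields[current_field].append(" ".join(current_words))
        current_field, current_words = None, []

    for word, label in zip(words, labels):
        if label == "O":
            flush()
            continue
        prefix, _, field = label.partition("-")
        if prefix == "I" and field == current_field:
            current_words.append(word)  # continuation of the open span
        else:
            flush()  # B- tag, or a stray I- tag: start a new span
            current_field, current_words = field, [word]
    flush()
    return dict(fields)


print(decode_iob(["STORE", "NAME", "Total:", "$10.99"],
                 ["B-MerchantName", "I-MerchantName", "O", "B-Total"]))
```

Fields that can occur more than once, such as LineItems, come back as lists, so repeated spans are not overwritten.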
Version History
| Version | Date | Description | Status |
|---|---|---|---|
| main | 2025-11-13 | Initialized with base model + custom labels | Base (not trained) |
After training, versions will be tagged (v1.0, v1.1, etc.).
Training Configuration
When training is performed, the following configuration will be used:
{
"model_name": "microsoft/layoutlmv3-base",
"learning_rate": 5e-05,
"batch_size": 4,
"num_epochs": 20,
"warmup_steps": 500,
"max_length": 512,
"validation_split": 0.2,
"random_seed": 42,
"gradient_accumulation_steps": 2,
"eval_steps": 100,
"save_steps": 500,
"logging_steps": 50
}
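To check how these settings interact with a particular dataset size, the rough estimate below computes optimizer steps on a single device. It assumes that `validation_split` holds out that fraction of the labeled documents and that one optimizer step consumes `batch_size × gradient_accumulation_steps` examples. The exact rounding used by `main.py` (or by a Hugging Face `Trainer`, if that is what it uses) may differ slightly.

```python
import math
from typing import Any, Dict

# Subset of the training configuration above.
TRAINING_CONFIG = {
    "batch_size": 4,
    "num_epochs": 20,
    "warmup_steps": 500,
    "validation_split": 0.2,
    "gradient_accumulation_steps": 2,
}


def estimate_schedule(config: Dict[str, Any], num_examples: int) -> Dict[str, Any]:
    """Approximate optimizer-step counts for a dataset of `num_examples` documents."""
    num_val = round(num_examples * config["validation_split"])
    num_train = num_examples - num_val
    if num_train <= 0:
        raise ValueError("no training examples left after the validation split")
    effective_batch = config["batch_size"] * config["gradient_accumulation_steps"]
    steps_per_epoch = math.ceil(num_train / effective_batch)
    total_steps = steps_per_epoch * config["num_epochs"]
    return {
        "num_train": num_train,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": config["warmup_steps"] / total_steps,
    }


schedule = estimate_schedule(TRAINING_CONFIG, num_examples=500)
print(schedule)
```

For 500 labeled documents this gives 400 training examples and an effective batch of 8. That is 50 steps per epoch and 1000 steps in total, so the 500 warmup steps cover half of training. For small datasets, lowering `warmup_steps` may be worth considering.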
Citation
@misc{layoutlmv3-receipt-invoice,
author = {MK Digital GmbH},
title = {LayoutLMv3 Receipt/Invoice Field Extraction},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt}}
}
@article{huang2022layoutlmv3,
title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
journal={arXiv preprint arXiv:2204.08387},
year={2022}
}
License
Apache 2.0
Contact
For questions or issues, please open an issue in the repository.