LayoutLMv3-LoRA for Invoice Number Extraction
Model Summary
| Field | Details |
|---|---|
| Base Model | microsoft/layoutlmv3-base |
| Model Name | layoutlmv3-lora-invoice-number |
| Fine-Tuning Method | LoRA (Low-Rank Adaptation) |
| Task | Token Classification β Invoice Number Extraction |
| Dataset | SROIE 2019 (invoice subset) |
| License | MIT (inherited from base model) |
| Developed by | Ryan Z. Nie |
Model Description
This model fine-tunes LayoutLMv3-base using LoRA for the task of invoice number extraction from scanned receipts and invoices. It leverages both visual (layout) and textual information from documents to identify and extract invoice numbers accurately.
The model is lightweight and memory-efficient, trained with low-rank adapters on attention and MLP layers to minimize computational and storage costs without sacrificing accuracy.
Intended Use
Primary Task
Token classification for invoice number extraction from document images.
Input
OCR-parsed document images containing:
- Text words
- Bounding boxes
- Layout information
Output
Invoice number tokens tagged using BIO labels.
Example Use Case
Extracting invoice or bill numbers from scanned receipts in accounting automation systems or document understanding pipelines.
Example Usage
from transformers import AutoProcessor, AutoModelForTokenClassification
from PIL import Image
import torch
# Load processor and model
processor = AutoProcessor.from_pretrained("ryanznie/layoutlmv3-lora-invoice-number")
model = AutoModelForTokenClassification.from_pretrained("ryanznie/layoutlmv3-lora-invoice-number")
# Example input
image = Image.open("invoice_sample.jpg")
words = ["Invoice", "No.", "PEGIV-1030765"]
boxes = [[100, 200, 200, 230], [210, 200, 250, 230], [260, 200, 400, 230]]
# Preprocess
encoding = processor(image, words, boxes=boxes, return_tensors="pt")
# Predict
outputs = model(**encoding)
predictions = torch.argmax(outputs.logits, dim=-1)
# Print results
print(predictions)
Training Details
Dataset
SROIE 2019 w/ invoices Dataset Dataset Documentations
Training Configuration
- Hardware: Apple MacBook M2 (8-core CPU, 16GB RAM)
- Acceleration: Apple Metal (MPS)
- Duration: ~1.5β2 hours
- Framework: Hugging Face Transformers
- Fine-tuning Method: LoRA (on attention and MLP layers)
- Optimization Objective: FocalLoss
- Training Mode: Mixed-precision training
Technical Specifications
- Base Architecture: LayoutLMv3
- Adapter Type: LoRA
- Target Modules: Attention and MLP layers
- Objective: Token classification for invoice number extraction
Framework Versions
| Component | Version |
|---|---|
| Python | 3.11.13 |
| PyTorch | 2.8.0 |
| Transformers | 4.57.0 |
| PEFT | 0.17.1 |
Performance
The model performs well on invoice number extraction tasks, correctly combining multi-token predictions into complete invoice numbers (e.g., PEGIV-1030765). After postprocessing, it achieves ~81% accuracy on the SROIE 2019 test set.
Evaluation Metrics
- F1-score for entity-level invoice number recognition
- Precision and recall measured on validation split of SROIE 2019
- Overall accuracy
Limitations
- The model is specialized for English-language invoices and SROIE-like layouts
- May mispredict when invoice number patterns differ significantly (e.g., multiple dashes or alphanumeric codes not seen during training)
- Performance may degrade on handwritten or low-quality scans
- Limited to document types similar to those in the training dataset
Ethical Considerations
- Ensure document data used respects privacy and does not contain sensitive or personal information
- The model should not be used to process private or confidential documents without explicit consent
- Consider data protection regulations (GDPR, CCPA, etc.) when processing invoices
- Verify accuracy before using in production systems that affect financial decisions
Environmental Impact
Carbon emissions estimated using the Machine Learning Impact Calculator presented in Lacoste et al. (2019).
- Hardware Type: Apple MacBook M2 (8-core)
- Hours Used: ~2 hours
- Cloud Provider: Local (no cloud compute)
- Compute Region: United States
- Carbon Emitted: Negligible (< 0.01 kg COβe)
Glossary
- LayoutLMv3 β A transformer-based model for document understanding that fuses text, layout, and image embeddings
- LoRA (Low-Rank Adaptation) β A lightweight fine-tuning method where small trainable matrices are added to specific layers (e.g., attention and MLP), enabling efficient adaptation without updating full model weights
- Token Classification β A form of sequence labeling where each token is assigned a class label, used here for identifying invoice numbers within document text
- BIO Labels β Begin, Inside, Outside tagging scheme for named entity recognition
Citation
If you use this model, please cite:
@misc{nie2025layoutlmv3lora,
author = {Ryan Z. Nie},
title = {LayoutLMv3-LoRA for Invoice Number Extraction},
year = {2025},
howpublished = {\url{https://huggingface.co/ryanznie/layoutlmv3-lora-invoice-number}},
note = {Fine-tuned LayoutLMv3 using LoRA on the SROIE 2019 dataset.}
}
Contact Information
For feedback, questions, or collaboration inquiries:
- Hugging Face: @ryanznie
- Email: ryanznie [at] gatech [dot] edu
- GitHub: @ryanznie
- LinkedIn: in/ryanznie
Model Card Author: Ryan Z. Nie
Date: October 2025
- Downloads last month
- 13
Model tree for ryanznie/layoutlmv3-lora-invoice-number
Base model
microsoft/layoutlmv3-base