ModernBERT-large-NER

This model is a fine-tuned version of answerdotai/ModernBERT-large for Named Entity Recognition (NER) on the CoNLL-2003 dataset.

Model description

ModernBERT-large-NER is a token classification model trained to identify and categorize named entities in text. Built on the ModernBERT-large architecture, this model leverages modern transformer optimizations for efficient and accurate entity extraction.
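
The label set the model predicts can be inspected directly from its configuration. The snippet below is a quick sketch, assuming the standard CoNLL-2003 BIO tagging scheme (PER, ORG, LOC, MISC):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("MatteoFasulo/ModernBERT-large-NER")
# Prints the id-to-label mapping, e.g. {0: "O", 1: "B-PER", 2: "I-PER", ...}
print(config.id2label)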

Intended Uses

Primary Use Cases:

  • Named Entity Recognition in text documents
  • Information extraction pipelines

Intended Users:

  • NLP researchers and practitioners
  • Data scientists working with text data
  • Developers building information extraction systems

Limitations

Known Limitations:

  • Performance may vary on domains significantly different from the training data
  • Entity boundaries might be imperfect for complex or nested entities
  • May require domain-specific fine-tuning for specialized applications (medical, legal, etc.)
  • Performance on low-resource languages or code-switched text not evaluated

Out-of-Scope Uses:

  • Real-time processing of sensitive personal information without proper privacy safeguards
  • High-stakes decision making without human oversight
  • Applications requiring 100% accuracy in entity detection

Training and evaluation data

The model was fine-tuned on the CoNLL-2003 English named entity recognition dataset, which contains roughly 14,000 training sentences annotated with four entity types: persons (PER), organizations (ORG), locations (LOC), and miscellaneous entities (MISC).
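
For reference, the dataset can be loaded from the Hugging Face Hub. This is a minimal sketch assuming the conll2003 dataset id; the ner_tags field holds integer BIO labels per word:

from datasets import load_dataset

# Load the CoNLL-2003 NER dataset (train/validation/test splits)
dataset = load_dataset("conll2003")

example = dataset["train"][0]
print(example["tokens"])    # list of words
print(example["ner_tags"])  # integer BIO tag per word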

Performance

The model achieves the following results on the evaluation set:

  • Loss: 0.0508
  • Precision: 0.9230
  • Recall: 0.9399
  • F1: 0.9314
  • Accuracy: 0.9861
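
Precision, recall, and F1 here are entity-level scores of the kind typically computed with seqeval. Below is a minimal sketch using the evaluate library; it illustrates the metric, not the exact evaluation script used for this card:

import evaluate

# seqeval computes entity-level precision/recall/F1 from BIO label sequences
seqeval = evaluate.load("seqeval")

predictions = [["B-ORG", "I-ORG", "O", "B-PER", "I-PER"]]
references = [["B-ORG", "I-ORG", "O", "B-PER", "O"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"], results["overall_f1"])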

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 5
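
For illustration, these values correspond roughly to the following TrainingArguments. This is a hypothetical reconstruction from the list above, not the exact configuration used:

from transformers import TrainingArguments

# Hypothetical reconstruction of the training setup from the values above
training_args = TrainingArguments(
    output_dir="ModernBERT-large-NER",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",          # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    num_train_epochs=5,
    eval_strategy="epoch",        # matches the per-epoch results table below
)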

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 439  | 0.0776          | 0.8749    | 0.9122 | 0.8931 | 0.9800   |
| 0.1518        | 2.0   | 878  | 0.0508          | 0.9230    | 0.9399 | 0.9314 | 0.9861   |
| 0.0334        | 3.0   | 1317 | 0.0509          | 0.9219    | 0.9493 | 0.9354 | 0.9880   |
| 0.0097        | 4.0   | 1756 | 0.0535          | 0.9267    | 0.9505 | 0.9384 | 0.9888   |
| 0.0029        | 5.0   | 2195 | 0.0555          | 0.9272    | 0.9519 | 0.9394 | 0.9889   |

Framework versions

  • Transformers 5.1.0
  • PyTorch 2.7.0a0+ecf3bae40a.nv25.02
  • Datasets 4.5.0
  • Tokenizers 0.22.2

How to Use

import torch
from transformers import pipeline

# Create NER pipeline
ner_pipeline = pipeline(
    "token-classification",
    model="MatteoFasulo/ModernBERT-large-NER",
    aggregation_strategy="simple",
    dtype=torch.bfloat16,
)

# Example usage
text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
entities = ner_pipeline(text)

for entity in entities:
    print(
        f"{entity['word']}: {entity['entity_group']} (confidence: {entity['score']:.4f})"
    )

# Apple Inc.: ORG (confidence: 0.9684)
# Steve Jobs: PER (confidence: 0.9950)
# Cupertino: LOC (confidence: 0.9876)
# California: LOC (confidence: 0.9939)
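
If you need raw logits rather than the aggregated pipeline output (for custom post-processing, for example), the model and tokenizer can also be loaded directly. This is a minimal sketch of the standard token-classification workflow:

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MatteoFasulo/ModernBERT-large-NER")
model = AutoModelForTokenClassification.from_pretrained("MatteoFasulo/ModernBERT-large-NER")

inputs = tokenizer("Barack Obama visited Paris.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map each sub-token to its most likely label
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred_id in zip(tokens, predicted_ids):
    print(token, model.config.id2label[pred_id.item()])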

Ethical Considerations

Privacy: This model may extract personal information (names, locations, organizations) from text. Users should:

  • Implement appropriate data protection measures
  • Comply with relevant privacy regulations (GDPR, CCPA, etc.)
  • Obtain necessary consent before processing personal data

Bias: The model's performance may reflect biases present in the training data, potentially affecting:

  • Recognition rates across different demographic groups
  • Entity detection in various cultural contexts
  • Performance on minority or underrepresented entities

Users should validate the model's performance on their specific use cases and implement bias mitigation strategies as needed.

Citation

If you use this model in your research, please cite the ModernBERT paper:

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}

License

This model is released under the Apache 2.0 License. See the LICENSE file for details.

Acknowledgments

This model was built using the ModernBERT-large architecture from Answer.AI and trained using the Hugging Face Transformers library.
