---
license: apache-2.0
tags:
- domain-generation-algorithm
- cybersecurity
- domain-classification
- security
- malware-detection
language:
- en
library_name: transformers
pipeline_tag: text-classification
base_model: answerdotai/ModernBERT-base
---

# ModernBERT DGA Detector

This model classifies domain names as either legitimate or generated by a Domain Generation Algorithm (DGA).

## Model Description

- **Model Type:** ModernBERT-based sequence classification
- **Task:** Binary classification (Legitimate vs DGA domains)
- **Base Model:** ModernBERT-base
- **Training Data:** Domain names dataset
- **Authors:** Reynier Leyva La O, Carlos A. Catania

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Reynier/modernbert-dga-detector")
model = AutoModelForSequenceClassification.from_pretrained("Reynier/modernbert-dga-detector")

# Example prediction
def predict_domain(domain):
    """Classify a single domain name and return the label with its probability."""
    inputs = tokenizer(domain, return_tensors="pt", max_length=64, truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Class order: index 0 = LEGITIMATE, index 1 = DGA
    predictions = torch.softmax(outputs.logits, dim=-1)
    legit_prob = predictions[0][0].item()
    dga_prob = predictions[0][1].item()
    return {
        "prediction": "DGA" if dga_prob > legit_prob else "LEGITIMATE",
        "confidence": max(legit_prob, dga_prob),
    }

# Test examples
domains = ["google.com", "xkvbzpqr.net", "facebook.com", "abcdef123456.com"]
for domain in domains:
    result = predict_domain(domain)
    print(f"{domain} -> {result['prediction']} (confidence: {result['confidence']:.3f})")
```
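
The example above labels a domain as DGA whenever its probability exceeds the legitimate one, which for two classes amounts to a 0.5 cutoff. Where false positives are costly, a stricter cutoff may be preferable. The following is a minimal sketch of that idea in plain Python. The names `decide` and `threshold` are illustrative and not part of the model's API, and the label order (index 0 = LEGITIMATE, index 1 = DGA) is taken from the mapping stated in this card.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, threshold=0.5):
    """Label one domain from its two logits.

    Assumes index 0 = LEGITIMATE and index 1 = DGA, as stated in this card.
    `threshold` is a hypothetical tuning knob: raise it to flag fewer domains.
    """
    dga_prob = softmax(logits)[1]
    label = "DGA" if dga_prob > threshold else "LEGITIMATE"
    return {"prediction": label, "dga_probability": dga_prob}

# Logits as they might come from outputs.logits[0].tolist()
print(decide([0.2, 1.5]))
print(decide([0.2, 1.5], threshold=0.9))
```

A suitable threshold would normally be chosen on a held-out validation set rather than fixed in advance.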

## Model Architecture

The model is based on ModernBERT and fine-tuned for domain classification:
- Input: Domain names (text)
- Output: Binary classification (0=LEGITIMATE, 1=DGA)
- Max sequence length: 64 tokens

## Training Details

This model was fine-tuned on a dataset of legitimate and DGA-generated domains using:
- Base model: answerdotai/ModernBERT-base
- Framework: Transformers/PyTorch
- Task: Binary sequence classification

## Performance

Reported evaluation metrics:
- Accuracy: 0.9658 ± 0.0153
- Precision: 0.9704 ± 0.0253
- Recall: 0.9582 ± 0.0147
- F1-Score: 0.9579 ± 0.0167
- FPR: 0.0267 ± 0.0233
- TPR: 0.9582 ± 0.0147
- Query time: 0.1226 ± 0.0253 on CPU (no GPU required)
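
For readers less familiar with these metrics, the sketch below shows how they are conventionally derived from confusion-matrix counts, with DGA as the positive class. It also shows why recall and TPR carry the same value. The function name `binary_metrics` and the counts are made up for illustration. They are not the authors' evaluation code or this model's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    DGA is treated as the positive class (label 1). Assumes every
    denominator is non-zero, i.e. both classes occur and some domain is flagged.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the true positive rate (TPR)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "tpr": recall,
    }

# Made-up counts, for illustration only
print(binary_metrics(tp=90, fp=5, tn=95, fn=10))
```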

## Use Cases

- **Cybersecurity**: Detect malicious domains generated by malware
- **Network Security**: Filter potentially harmful domains
- **Threat Intelligence**: Analyze domain patterns in security feeds
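
As a rough illustration of the filtering use case, the sketch below triages a feed of domains using any scoring function that returns a DGA probability. The names `triage` and `score_fn` are hypothetical, and the scores in the example are invented so the code runs without downloading the model. Lowercasing the input is an assumption here, since this card does not state what preprocessing was used during training.

```python
def triage(domains, score_fn, threshold=0.5):
    """Return (domain, score) pairs scoring above `threshold`, highest first.

    `score_fn` is a hypothetical callable mapping a domain string to a DGA
    probability in [0, 1], for example a wrapper around the model.
    """
    seen = set()
    flagged = []
    for raw in domains:
        domain = raw.strip().lower()  # assumption: case-insensitive matching
        if not domain or domain in seen:
            continue  # skip blanks and duplicates
        seen.add(domain)
        score = score_fn(domain)
        if score > threshold:
            flagged.append((domain, score))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Stand-in scorer with made-up scores, for illustration only
fake_scores = {"google.com": 0.02, "xkvbzpqr.net": 0.97, "abcdef123456.com": 0.81}
feed = ["google.com", "XKVBZPQR.net", "abcdef123456.com", "xkvbzpqr.net", ""]
print(triage(feed, lambda d: fake_scores.get(d, 0.0)))
```

In practice `score_fn` could wrap a call to the model and return the probability at index 1.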

## Limitations

- This model is trained specifically for domain classification
- Performance may vary on domains from different TLDs or languages
- Regular retraining may be needed as DGA techniques evolve
- Model performance depends on the quality and diversity of training data

## Citation

If you use this model in your research or applications, please cite it appropriately.

## Related Models

Check out the author's other security models:
- [Llama3_8B-DGA-Detector](https://huggingface.co/Reynier/Llama3_8B-DGA-Detector)