metadata
license: apache-2.0
tags:
  - domain-generation-algorithm
  - cybersecurity
  - domain-classification
  - security
  - malware-detection
language:
  - en
library_name: transformers
pipeline_tag: text-classification
base_model: answerdotai/ModernBERT-base

ModernBERT DGA Detector

This model is designed to classify domains as either legitimate or generated by Domain Generation Algorithms (DGA).

Model Description

  • Model Type: BERT-based sequence classification
  • Task: Binary classification (Legitimate vs DGA domains)
  • Base Model: ModernBERT-base
  • Training Data: Domain names dataset
  • Authors: Reynier Leyva La O, Carlos A. Catania

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Reynier/modernbert-dga-detector")
model = AutoModelForSequenceClassification.from_pretrained("Reynier/modernbert-dga-detector")

# Example prediction (label index 0 = LEGITIMATE, 1 = DGA)
def predict_domain(domain):
    inputs = tokenizer(domain, return_tensors="pt", max_length=64, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Convert the two logits of the single input into class probabilities
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    legit_prob = probs[0].item()
    dga_prob = probs[1].item()
    return {"prediction": "DGA" if dga_prob > legit_prob else "LEGITIMATE",
            "confidence": max(legit_prob, dga_prob)}

# Test examples
domains = ["google.com", "xkvbzpqr.net", "facebook.com", "abcdef123456.com"]
for domain in domains:
    result = predict_domain(domain)
    print(f"{domain} -> {result['prediction']} (confidence: {result['confidence']:.3f})")
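If you want to tune the operating point instead of taking the argmax, you can separate the decision step from the model. The sketch below takes raw logits, for example `outputs.logits[0].tolist()` from the code above, and applies a softmax. It then flags DGA only when P(DGA) exceeds a threshold. The names `decide`, `softmax` and `dga_threshold` are hypothetical helpers, and the label order follows the card (0 = LEGITIMATE, 1 = DGA). A threshold of 0.5 with a strict comparison reproduces the argmax rule used above.

```python
import math
from typing import Dict, List

# Label order assumed from the model card: index 0 = LEGITIMATE, index 1 = DGA.
LABELS = ["LEGITIMATE", "DGA"]

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits: List[float], dga_threshold: float = 0.5) -> Dict[str, object]:
    """Label a domain DGA only if P(DGA) is strictly greater than dga_threshold (hypothetical helper)."""
    probs = softmax(logits)
    dga_prob = probs[1]
    label = LABELS[1] if dga_prob > dga_threshold else LABELS[0]
    return {"prediction": label, "dga_prob": dga_prob}
```

Raising `dga_threshold` (e.g. to 0.9) trades recall for a lower false positive rate. This can matter when the detector is used to block traffic rather than only to raise alerts.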

Model Architecture

The model is based on ModernBERT and fine-tuned for domain classification:

  • Input: Domain names (text)
  • Output: Binary classification (0=LEGITIMATE, 1=DGA)
  • Max sequence length: 64 tokens
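The model takes bare domain names, but security feeds often contain full URLs, ports or trailing root dots. The card does not document any training-time preprocessing, so the normalization below is only an assumed, illustrative step. It reduces input to a lowercase hostname before tokenization, and `normalize_domain` is a hypothetical helper built on the standard library's `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit

def normalize_domain(raw: str) -> str:
    """Reduce a URL or hostname to a bare lowercase domain (illustrative; not the authors' preprocessing)."""
    text = raw.strip()
    # urlsplit only populates .hostname when a scheme or a leading '//' is present.
    if "://" not in text:
        text = "//" + text
    # .hostname is lowercased with any port removed; None if no host was found.
    host = urlsplit(text).hostname or ""
    # Drop a trailing root dot such as "example.com."
    return host.rstrip(".")
```

For example, `normalize_domain("https://WWW.Example.COM:8080/path")` yields `"www.example.com"`. Very long names are still truncated to 64 tokens by the tokenizer call in the usage example.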

Training Details

This model was fine-tuned on a dataset of legitimate and DGA-generated domains using:

  • Base model: answerdotai/ModernBERT-base
  • Framework: Transformers/PyTorch
  • Task: Binary sequence classification
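The card does not describe the training dataset or how it was split. The sketch below is therefore a hypothetical data-preparation step for a similar fine-tune. It keeps the card's label ids (0 = LEGITIMATE, 1 = DGA) in `id2label`/`label2id` mappings, which can be passed as config overrides to `AutoModelForSequenceClassification.from_pretrained`. It also performs a per-class (stratified) train/test split of `{"text", "label"}` records. The record layout, `stratified_split` and its defaults are assumptions.

```python
import random
from typing import Dict, List, Tuple

# Label ids follow the model card: 0 = LEGITIMATE, 1 = DGA.
ID2LABEL = {0: "LEGITIMATE", 1: "DGA"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

Record = Dict[str, object]  # assumed layout: {"text": "<domain>", "label": 0 or 1}

def stratified_split(
    records: List[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Split records so each class contributes the same fraction to the test set (hypothetical helper)."""
    rng = random.Random(seed)
    train: List[Record] = []
    test: List[Record] = []
    for label_id in sorted(ID2LABEL):
        group = [r for r in records if r["label"] == label_id]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Stratifying by class keeps the legitimate/DGA ratio identical in both splits. This makes metrics such as FPR comparable across repeated runs.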

Performance

Reported performance metrics:

  • Accuracy: 0.9658 ± 0.0153
  • Precision: 0.9704 ± 0.0253
  • Recall: 0.9582 ± 0.0147
  • F1-Score: 0.9579 ± 0.0167
  • FPR: 0.0267 ± 0.0233
  • TPR: 0.9582 ± 0.0147
  • Query time on CPU: 0.1226 ± 0.0253 (a GPU is not required)
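The metrics above are standard binary-classification quantities. The sketch below shows how each one can be computed from a confusion matrix with DGA as the positive class, and how a "mean ± standard deviation" summary can be formed over several evaluation runs. The card does not state how many runs or folds were used or how the spread was computed, so `binary_metrics`, `summarize` and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Card metrics from confusion-matrix counts, treating DGA as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall and TPR are the same quantity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "tpr": recall,
    }

def summarize(runs: List[Dict[str, float]]) -> Dict[str, str]:
    """Format each metric as 'mean ± sample standard deviation' across at least two runs."""
    summary = {}
    for key in runs[0]:
        values = [run[key] for run in runs]
        summary[key] = f"{mean(values):.4f} ± {stdev(values):.4f}"
    return summary
```

Averaging per-run F1 scores does not in general equal the F1 computed from the averaged precision and recall. Aggregated values should therefore be compared metric by metric.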

Use Cases

  • Cybersecurity: Detect malicious domains generated by malware
  • Network Security: Filter potentially harmful domains
  • Threat Intelligence: Analyze domain patterns in security feeds
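For the filtering and threat-intelligence use cases, the classifier is typically applied to a batch of domains from a feed. The sketch below assumes a hypothetical `dga_score` callable that returns P(DGA) for a domain, for example a wrapper around `dga_prob` from the usage code. It skips an optional allowlist and returns flagged domains sorted by score for triage. The threshold and allowlist are illustrative choices, not part of the model.

```python
from typing import AbstractSet, Callable, Iterable, List, Tuple

def flag_suspicious(
    domains: Iterable[str],
    dga_score: Callable[[str], float],
    threshold: float = 0.5,
    allowlist: AbstractSet[str] = frozenset(),
) -> List[Tuple[str, float]]:
    """Return (domain, score) pairs scoring above threshold, highest first (hypothetical helper)."""
    flagged = []
    for domain in domains:
        name = domain.strip().lower()
        # Skip blank lines and domains the operator has explicitly trusted.
        if not name or name in allowlist:
            continue
        score = dga_score(name)
        if score > threshold:
            flagged.append((name, score))
    # Highest-scoring domains first so analysts review the most suspicious ones first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

An allowlist of known-good domains is a cheap way to suppress recurring false positives without retraining.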

Limitations

  • This model is trained specifically for domain classification
  • Performance may vary on domains from different TLDs or languages
  • Regular retraining may be needed as DGA techniques evolve
  • Model performance depends on the quality and diversity of training data
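Because performance may differ across TLDs, it can help to break evaluation results down per TLD before deploying the model on new traffic. The sketch below groups hypothetical `(domain, true_label, predicted_label)` rows by their last label and reports accuracy and sample count for each group. It uses a naive last-label split, so multi-part public suffixes such as `co.uk` are grouped under `uk`. A public-suffix list would be needed to handle those correctly.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def accuracy_by_tld(results: Iterable[Tuple[str, int, int]]) -> Dict[str, Tuple[float, int]]:
    """Map each last-label TLD to (accuracy, number of samples) (hypothetical helper)."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for domain, y_true, y_pred in results:
        # Naive TLD: text after the final dot, ignoring a trailing root dot.
        tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
        total[tld] += 1
        correct[tld] += int(y_true == y_pred)
    return {tld: (correct[tld] / total[tld], total[tld]) for tld in sorted(total)}
```

Reporting the sample count alongside accuracy matters here: a TLD with only a handful of examples can show extreme accuracy values that say little about real performance.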

Citation

If you use this model in your research or applications, please cite it appropriately.

Related Models

Check out the author's other security models: