SetFit Multilingual E5 NSFW Classifier

This SetFit model performs binary classification of article metadata (domain, title, description) as adult or safe.

Model Details

Base Model: intfloat/multilingual-e5-base
Model Type: SetFit (Sentence Transformer Fine-tuning)
Task: Binary Text Classification (Adult Content Detection)
Labels: adult, safe

Training Details

Training Data

Dataset: NSFW Combined Dataset v1
Total Examples: 1262
Training Set: 883 (70.0%)
Validation Set: 126 (10.0%)
Test Set: 253 (20.0%)
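
As a quick arithmetic check, the reported split sizes can be recomputed from the counts above. This only verifies the published numbers; it does not describe how the split was produced, and the variable names are illustrative.

```python
# Split sizes as reported in this model card.
split_counts = {"train": 883, "validation": 126, "test": 253}

total = sum(split_counts.values())

# Share of each split in percent, rounded to one decimal place as in the card.
split_percent = {
    name: round(100 * count / total, 1) for name, count in split_counts.items()
}

print(total)
print(split_percent)
```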

Label Distribution

adult: 663
safe: 599
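
The class counts imply a nearly balanced dataset. The sketch below computes the class shares and the accuracy that a trivial classifier always predicting the majority class would reach on the full dataset, which can serve as a reference point for the reported accuracy. The variable names are illustrative and not part of the original training code.

```python
# Class counts as reported in this model card.
label_counts = {"adult": 663, "safe": 599}

total = sum(label_counts.values())

# Fraction of the dataset belonging to each class.
label_share = {label: count / total for label, count in label_counts.items()}

# Accuracy of always predicting the most frequent class.
majority_label = max(label_counts, key=label_counts.get)
majority_baseline = label_counts[majority_label] / total

print(f"{majority_label}: {majority_baseline:.3f}")
```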

Training Hyperparameters

Batch Size: 64
Number of Epochs: 3
Number of Iterations: 5
Random Seed: 42
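
A minimal sketch of how these hyperparameters might be passed to SetFit, assuming the `Trainer` and `TrainingArguments` API of setfit 1.0 or later and datasets with `text` and `label` columns. The function name `train_classifier` and its dataset arguments are hypothetical; the original training script is not part of this card and may differ.

```python
# Hyperparameters as listed in this model card.
HYPERPARAMETERS = {
    "batch_size": 64,
    "num_epochs": 3,
    "num_iterations": 5,
    "seed": 42,
}


def train_classifier(train_dataset, eval_dataset):
    """Fine-tune a SetFit model (hypothetical helper, not the original script).

    Both arguments are assumed to be `datasets.Dataset` objects with
    "text" and "label" columns.
    """
    # Imported lazily so the constants above can be inspected without setfit.
    from setfit import SetFitModel, Trainer, TrainingArguments

    # Start from the base model named in Model Details.
    model = SetFitModel.from_pretrained("intfloat/multilingual-e5-base")

    args = TrainingArguments(**HYPERPARAMETERS)
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        metric="accuracy",
    )
    trainer.train()
    return trainer
```

Calling `trainer.evaluate()` on the returned object would then report the metric on `eval_dataset`.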

Training Date

2025-11-19 17:14:54

Evaluation Results

Validation Accuracy: 1.0000
Test Accuracy: 1.0000
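
Accuracy here is the fraction of examples whose predicted label matches the reference label. The following is a minimal sketch of that computation in plain Python with made-up toy labels; the function name `accuracy` is illustrative, and this is not the evaluation script used for the figures above.

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy example (made-up labels, not taken from the dataset).
toy_references = ["adult", "safe", "safe", "adult"]
toy_predictions = ["adult", "safe", "adult", "adult"]
print(accuracy(toy_predictions, toy_references))
```

With 253 test examples, an accuracy of 1.0000 corresponds to all 253 being classified correctly.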

Usage

Using SetFit

from setfit import SetFitModel

# Load the model
model = SetFitModel.from_pretrained("setfit-multilingual-e5-nsfw-classifier")

# Run inference
texts = [
    "domain:example.com\ntitle:Weather forecast\ndescription:Sunny skies expected",
    "domain:news.com\ntitle:Breaking news\ndescription:Important developments",
]
predictions = model.predict(texts)
print(predictions)

Input Format

The model expects input text in the following format:

domain:<domain_name>
title:<article_title>
description:<article_description>
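
A small helper can assemble this format from separate fields. This is a sketch: `format_input` is a hypothetical name, and it assumes the three fields are joined with single newlines and no extra whitespace, as in the usage example above.

```python
def format_input(domain: str, title: str, description: str) -> str:
    """Build the newline-separated text the classifier expects."""
    return f"domain:{domain}\ntitle:{title}\ndescription:{description}"


text = format_input("example.com", "Weather forecast", "Sunny skies expected")
print(text)
```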

Output

The model outputs one of two labels:

adult: Content is flagged as adult/NSFW
safe: Content is safe for work
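
One way to act on the output is to group inputs by predicted label. The sketch below assumes `model.predict` returns the label strings listed above, one per input; `group_by_label` is a hypothetical helper, and the predictions in the example are hard-coded stand-ins rather than real model output.

```python
from collections import defaultdict


def group_by_label(texts, predictions):
    """Group input texts by their predicted label ("adult" or "safe")."""
    groups = defaultdict(list)
    for text, label in zip(texts, predictions):
        # str() keeps the keys plain strings even if labels arrive as other types.
        groups[str(label)].append(text)
    return dict(groups)


# Stand-in data for illustration only.
texts = ["first article", "second article", "third article"]
predictions = ["safe", "adult", "safe"]

groups = group_by_label(texts, predictions)
print(groups.get("adult", []))
```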

Limitations and Bias

This model was trained on a single dataset of 1,262 examples and may not generalize well to:

Languages or domains not represented in the training data
Content with ambiguous or context-dependent adult themes
Very short or very long text inputs

Citation

If you use this model, please cite SetFit:

@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Safetensors

Model Size: 0.3B params
Tensor Type: F32
