SetFit Multilingual E5 NSFW Classifier

This is a SetFit model trained for binary classification of adult/safe content in articles.

Model Details

  • Base Model: intfloat/multilingual-e5-base
  • Model Type: SetFit (Sentence Transformer Fine-tuning)
  • Task: Binary Text Classification (Adult Content Detection)
  • Labels: adult, safe

Training Details

Training Data

  • Dataset: NSFW Combined Dataset v1
  • Total Examples: 1262
  • Training Set: 883 (70.0%)
  • Validation Set: 126 (10.0%)
  • Test Set: 253 (20.0%)
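The card does not say how the split was produced. The sketch below is a minimal reconstruction under stated assumptions. It assumes the examples were shuffled with the card's seed (42). It also assumes a rounding rule of ceiling for the test share and floor for the validation share, with the remainder going to training. Under those assumptions it reproduces the reported 883 / 126 / 253 counts. `split_counts` and `split_dataset` are hypothetical helpers, not part of SetFit.

```python
import math
import random

def split_counts(n_total, val_frac=0.1, test_frac=0.2):
    """Hypothetical rounding rule: ceil for test, floor for validation, rest to train."""
    n_test = math.ceil(n_total * test_frac)
    n_val = math.floor(n_total * val_frac)
    n_train = n_total - n_val - n_test
    return n_train, n_val, n_test

def split_dataset(examples, seed=42, val_frac=0.1, test_frac=0.2):
    """Shuffle a copy of the examples with a fixed seed, then slice into train/val/test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_counts(len(shuffled), val_frac, test_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

print(split_counts(1262))  # (883, 126, 253)
```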

Label Distribution

  • adult: 663
  • safe: 599
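The two classes are close to balanced. This matters when reading the accuracy figures: on data with this distribution, a classifier that always answers "adult" would score about 52.5%. The snippet below checks the arithmetic using only the counts listed above. The test split's own label distribution is not reported and may differ.

```python
label_counts = {"adult": 663, "safe": 599}  # counts from the Label Distribution above

total = sum(label_counts.values())
for label, count in label_counts.items():
    print(f"{label}: {count} ({count / total:.1%})")

# Accuracy of always predicting the most frequent label (majority-class baseline)
majority_baseline = max(label_counts.values()) / total
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
```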

Training Hyperparameters

  • Batch Size: 64
  • Number of Epochs: 3
  • Number of Iterations: 5
  • Random Seed: 42
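SetFit trains in two stages. It first fine-tunes the sentence-transformer body contrastively on pairs of examples, then fits a classification head on the resulting embeddings. The sketch below illustrates the pair-generation idea behind "Number of Iterations". It assumes one same-label (positive) pair and one different-label (negative) pair per training example per iteration, and that the 3 epochs apply to this contrastive phase. `generate_pairs` and `estimate_steps` are hypothetical simplifications, not SetFit APIs. The exact sampling and step count in a given SetFit version may differ.

```python
import math
import random

def generate_pairs(texts, labels, num_iterations, seed=42):
    """Simplified SetFit-style contrastive pair sampling (hypothetical helper).

    For each iteration and each example, emit one positive pair (same label,
    target 1.0) and one negative pair (different label, target 0.0).
    Requires at least two distinct labels.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    pairs = []
    for _ in range(num_iterations):
        for text, label in zip(texts, labels):
            # In this simplified version the positive partner may be the anchor itself
            positive = rng.choice(by_label[label])
            other_label = rng.choice([other for other in by_label if other != label])
            negative = rng.choice(by_label[other_label])
            pairs.append((text, positive, 1.0))
            pairs.append((text, negative, 0.0))
    return pairs

def estimate_steps(n_train, num_iterations, batch_size, num_epochs):
    """Pair count and optimizer steps for the contrastive phase under the same assumption."""
    n_pairs = 2 * num_iterations * n_train
    steps = math.ceil(n_pairs / batch_size) * num_epochs
    return n_pairs, steps

print(estimate_steps(n_train=883, num_iterations=5, batch_size=64, num_epochs=3))  # (8830, 414)
```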

Training Date

2025-11-19 17:14:54

Evaluation Results

  • Validation Accuracy: 1.0000
  • Test Accuracy: 1.0000
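Accuracy is the fraction of examples whose predicted label matches the gold label. A score of 1.0000 on the 253-example test set therefore means every test article was classified correctly (252/253 would round to 0.9960). The sketch below shows the computation with a per-class breakdown. The helper name is hypothetical. In practice, `predictions` would come from `model.predict` on the held-out texts.

```python
from collections import Counter

def accuracy_report(predictions, references):
    """Return overall accuracy and per-label (correct, total) counts."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = Counter()
    total = Counter()
    for pred, gold in zip(predictions, references):
        total[gold] += 1
        if pred == gold:
            correct[gold] += 1
    overall = sum(correct.values()) / len(references)
    per_label = {label: (correct[label], total[label]) for label in total}
    return overall, per_label

# Toy example (made-up labels, not the card's test set)
preds = ["adult", "safe", "safe", "adult"]
golds = ["adult", "safe", "adult", "adult"]
print(accuracy_report(preds, golds))  # (0.75, {'adult': (2, 3), 'safe': (1, 1)})
```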

Usage

Using SetFit

from setfit import SetFitModel

# Load the model
model = SetFitModel.from_pretrained("setfit-multilingual-e5-nsfw-classifier")

# Run inference
texts = [
    "domain:example.com\ntitle:Weather forecast\ndescription:Sunny skies expected",
    "domain:news.com\ntitle:Breaking news\ndescription:Important developments",
]
predictions = model.predict(texts)
print(predictions)

Input Format

The model expects input text in the following format:

domain:<domain_name>
title:<article_title>
description:<article_description>
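The helper below assembles the three fields into this layout. `format_input` is a hypothetical name. Newline handling inside a field is also an assumption: here internal whitespace is collapsed so that each field stays on its own line, matching the examples in the usage snippet.

```python
def format_input(domain: str, title: str, description: str) -> str:
    """Build the domain/title/description text the classifier expects."""
    def clean(value: str) -> str:
        # Assumption: collapse internal whitespace/newlines so each field stays on one line
        return " ".join(value.split())
    return f"domain:{clean(domain)}\ntitle:{clean(title)}\ndescription:{clean(description)}"

text = format_input("example.com", "Weather forecast", "Sunny skies expected")
print(repr(text))  # 'domain:example.com\ntitle:Weather forecast\ndescription:Sunny skies expected'
```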

Output

The model outputs one of two labels:

  • adult: Content is flagged as adult/NSFW
  • safe: Content is safe for work
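One way to act on these labels is to partition articles by prediction, for example to keep only those predicted safe. The sketch below is minimal. It assumes `predictions` holds the label strings listed above, aligned in order with the inputs. `split_by_label` is a hypothetical helper. If your installed SetFit version returns something other than plain label strings, convert them first.

```python
def split_by_label(texts, predictions):
    """Partition inputs into (safe, adult) lists using aligned predicted labels."""
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must be aligned")
    buckets = {"safe": [], "adult": []}
    for text, label in zip(texts, predictions):
        if label not in buckets:
            raise ValueError(f"unexpected label: {label!r}")
        buckets[label].append(text)
    return buckets["safe"], buckets["adult"]

safe, adult = split_by_label(["a", "b", "c"], ["safe", "adult", "safe"])
print(safe, adult)  # ['a', 'c'] ['b']
```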

Limitations and Bias

This model was trained on a small dataset (1,262 labeled examples) and may not generalize well to:

  • Languages or domains not represented in the training data
  • Content with ambiguous or context-dependent adult themes
  • Very short or very long text inputs

Citation

If you use this model, please cite SetFit:

@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Model Files

  • Format: Safetensors
  • Model size: 0.3B params
  • Tensor type: F32