SetFit Multilingual E5 NSFW Classifier
This is a SetFit model that classifies articles as adult or safe, using each article's domain, title, and description as input.
Model Details
- Base Model: intfloat/multilingual-e5-base
- Model Type: SetFit (Sentence Transformer Fine-tuning)
- Task: Binary Text Classification (Adult Content Detection)
- Labels: adult, safe
Training Details
Training Data
- Dataset: NSFW Combined Dataset v1
- Total Examples: 1262
- Training Set: 883 (70.0%)
- Validation Set: 126 (10.0%)
- Test Set: 253 (20.0%)
Label Distribution
- adult: 663
- safe: 599
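As a quick consistency check, the split sizes and label counts reported above can be verified with a few lines of arithmetic. This is only a sketch over the numbers printed in this card; it says nothing about how the split was actually produced (for example, whether it was stratified).

```python
# Counts copied from the Training Data and Label Distribution sections.
total = 1262
splits = {"train": 883, "validation": 126, "test": 253}
labels = {"adult": 663, "safe": 599}

# Both breakdowns should account for every example exactly once.
splits_cover_total = sum(splits.values()) == total
labels_cover_total = sum(labels.values()) == total

# Share of each split and each label, rounded to one decimal place.
split_pct = {name: round(100 * n / total, 1) for name, n in splits.items()}
label_pct = {name: round(100 * n / total, 1) for name, n in labels.items()}

print(splits_cover_total, labels_cover_total)
print(split_pct)   # {'train': 70.0, 'validation': 10.0, 'test': 20.0}
print(label_pct)   # {'adult': 52.5, 'safe': 47.5}
```

The two classes are close to balanced, so plain accuracy is a reasonable headline metric for this dataset.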
Training Hyperparameters
- Batch Size: 64
- Number of Epochs: 3
- Number of Iterations: 5
- Random Seed: 42
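To give a feel for what these settings imply, the sketch below collects them under the keyword names used by SetFit's `TrainingArguments` and estimates the size of the contrastive fine-tuning step. The pair count is an assumption: it uses the pre-1.0 SetFit sampling scheme, where each iteration draws one positive and one negative pair per training example. The actual training script is not part of this card, so treat the numbers as a rough estimate.

```python
import math

# Hyperparameters from this card, keyed by the SetFit TrainingArguments names.
# Assumption: they were passed roughly as TrainingArguments(**hparams).
hparams = {"batch_size": 64, "num_epochs": 3, "num_iterations": 5, "seed": 42}

train_examples = 883  # training set size reported in this card

# Assumption: one positive and one negative pair per example per iteration.
pairs = 2 * hparams["num_iterations"] * train_examples

# Batches needed to see every pair once, counting a final partial batch.
batches_per_epoch = math.ceil(pairs / hparams["batch_size"])
total_batches = batches_per_epoch * hparams["num_epochs"]

print(pairs, batches_per_epoch, total_batches)
```

Under these assumptions the embedding body sees a few thousand sentence pairs, which is why SetFit can be trained quickly even on a dataset of this size.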
Training Date
2025-11-19 17:14:54
Evaluation Results
- Validation Accuracy: 1.0000
- Test Accuracy: 1.0000
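For reference, accuracy here means the fraction of examples whose predicted label matches the gold label. A minimal sketch of that computation follows; the `accuracy` helper and the toy labels are illustrative only and are not taken from the evaluation code or the dataset.

```python
def accuracy(predicted, expected):
    """Return the fraction of positions where predicted == expected."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Toy labels for illustration only.
expected = ["adult", "safe", "safe", "adult"]
predicted = ["adult", "safe", "adult", "adult"]

score = accuracy(predicted, expected)
print(f"{score:.4f}")  # 0.7500
```

A reported value of 1.0000 on the test set corresponds to all 253 test examples being classified correctly.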
Usage
Using SetFit
from setfit import SetFitModel
# Load the model
model = SetFitModel.from_pretrained("setfit-multilingual-e5-nsfw-classifier")
# Run inference
texts = [
"domain:example.com\ntitle:Weather forecast\ndescription:Sunny skies expected",
"domain:news.com\ntitle:Breaking news\ndescription:Important developments",
]
predictions = model.predict(texts)
print(predictions)
Input Format
The model expects input text in the following format:
domain:<domain_name>
title:<article_title>
description:<article_description>
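The three-line format can be assembled with a small helper like the one below. `build_input` is a hypothetical name, and collapsing internal whitespace is an assumption made so that each field stays on its own line; the original preprocessing may differ.

```python
def build_input(domain: str, title: str, description: str) -> str:
    """Format article fields as 'domain:...', 'title:...', 'description:...' lines."""
    fields = {"domain": domain, "title": title, "description": description}
    # Assumption: collapse runs of whitespace (including newlines) inside a
    # field so the newline separators between fields stay unambiguous.
    return "\n".join(
        f"{name}:{' '.join(value.split())}" for name, value in fields.items()
    )


text = build_input("example.com", "Weather forecast", "Sunny skies expected")
print(text)
```

The resulting string matches the examples in the Usage section and can be passed to `model.predict` in a list.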
Output
The model outputs one of two labels:
- adult: Content is flagged as adult/NSFW
- safe: Content is safe for work
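Downstream code typically needs to tally the labels or keep only the safe items. The sketch below assumes predictions arrive as the strings `adult` and `safe`; depending on how the model was saved, `model.predict` may return a different type, so the helper converts each prediction with `str` and rejects anything unexpected. `summarize` is a hypothetical helper name.

```python
from collections import Counter

LABELS = {"adult", "safe"}


def summarize(texts, predictions):
    """Return (label counts, texts predicted as safe)."""
    preds = [str(p) for p in predictions]
    if len(texts) != len(preds):
        raise ValueError("texts and predictions must have the same length")
    unknown = set(preds) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    safe_texts = [t for t, p in zip(texts, preds) if p == "safe"]
    return Counter(preds), safe_texts


# Illustrative inputs; the predictions are made up, not model output.
counts, safe_texts = summarize(["a", "b", "c"], ["safe", "adult", "safe"])
print(counts["safe"], counts["adult"], safe_texts)
```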
Limitations and Bias
This model was trained on a small dataset (883 training examples from NSFW Combined Dataset v1) and may not generalize well to:
- Different languages or domains not represented in training data
- Content with ambiguous or context-dependent adult themes
- Very short or very long text inputs
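One way to act on the input-length caveat is to flag unusually short or long inputs before classification. The sketch below uses a whitespace word count as a crude proxy. The thresholds are hypothetical and are not derived from the training data, and word counts are a poor proxy for languages written without spaces. For context, the base model's card states that long texts are truncated to at most 512 tokens.

```python
MIN_WORDS = 5    # hypothetical lower bound
MAX_WORDS = 300  # hypothetical upper bound, loosely below the 512-token limit


def length_flag(text, min_words=MIN_WORDS, max_words=MAX_WORDS):
    """Return 'too_short', 'too_long', or 'ok' based on whitespace word count."""
    n_words = len(text.split())
    if n_words < min_words:
        return "too_short"
    if n_words > max_words:
        return "too_long"
    return "ok"


sample = "domain:example.com\ntitle:Weather forecast\ndescription:Sunny skies expected"
print(length_flag(sample))
```

Flagged inputs could be routed to manual review instead of being trusted blindly.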
Citation
If you use this model, please cite SetFit:
@article{https://doi.org/10.48550/arxiv.2209.11055,
doi = {10.48550/ARXIV.2209.11055},
url = {https://arxiv.org/abs/2209.11055},
author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
title = {Efficient Few-Shot Learning Without Prompts},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}