| --- |
| license: apache-2.0 |
| datasets: |
| - gplsi/SocialTOX |
| language: |
| - es |
| metrics: |
| - accuracy |
| - f1 |
| - precision |
| - recall |
| base_model: |
| - BSC-LT/roberta-base-bne |
| pipeline_tag: text-classification |
| --- |
| |
| # 🧠 Toxicity_model_RoBERTa-base-bne: Binary Spanish Toxicity Classifier (Fine-tuned) |
|
|
| ## 📌 Model Description |
|
|
| This model is a **fine-tuned version** of `RoBERTa-base-bne`, trained to classify **Spanish-language user comments on news articles** as toxic or non-toxic. It distinguishes between two categories:
|
|
| - **Non-toxic** |
| - **Toxic** |
|
|
| --- |
|
|
| ## 📂 Training Data |
|
|
| The model was fine-tuned on the **[SocialTOX dataset](https://huggingface.co/datasets/gplsi/SocialTOX)**, a collection of Spanish-language comments annotated for varying levels of toxicity. The comments come from news platforms and reflect real-world online discourse. For this model, the *Slightly toxic* and *Toxic* classes were merged into a single *Toxic* category, which turns the task into binary classification.
|
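The label merging described above can be sketched as a simple lookup. This is a minimal illustration, not the authors' preprocessing code: the three original label strings, the `MERGE_MAP` and `LABEL2ID` names, and the integer ids are assumptions, and the dataset files may encode the classes differently.

```python
# Assumed names for the original three-level annotation; the exact strings
# stored in the dataset may differ.
MERGE_MAP = {
    "Non-toxic": "Non-toxic",
    "Slightly toxic": "Toxic",  # merged into the Toxic class
    "Toxic": "Toxic",
}

# Hypothetical integer ids for the two binary classes.
LABEL2ID = {"Non-toxic": 0, "Toxic": 1}


def to_binary_label(original_label: str) -> int:
    """Map an original toxicity label to its binary class id."""
    return LABEL2ID[MERGE_MAP[original_label]]


examples = ["Non-toxic", "Slightly toxic", "Toxic"]
binary_ids = [to_binary_label(label) for label in examples]
print(binary_ids)  # [0, 1, 1]
```

An unknown label raises `KeyError` here, which makes unexpected annotation values visible rather than silently assigning them to a class.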
|
| --- |
|
|
| ## Training hyperparameters |
| - epochs: 10 |
| - learning_rate: 2.45e-6 |
| - beta1: 0.9 |
| - beta2: 0.95 |
| - adam_epsilon: 1.00e-8
| - weight_decay: 0 |
| - batch_size: 16 |
| - max_seq_length: 512 |
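
For readers who want to reproduce a similar setup, the hyperparameters above could map onto Hugging Face `TrainingArguments` keyword names roughly as follows. This is a sketch under assumptions: the card does not say which training framework was used, whether the batch size is per device, or whether inputs were truncated, and `output_dir` in the usage comment is a hypothetical path.

```python
# Values taken from the hyperparameter list; the keys are the corresponding
# keyword arguments of transformers.TrainingArguments.
training_kwargs = {
    "num_train_epochs": 10,
    "learning_rate": 2.45e-6,
    "adam_beta1": 0.9,
    "adam_beta2": 0.95,
    "adam_epsilon": 1e-8,
    "weight_decay": 0.0,
    # Assumption: the listed batch_size is interpreted as per-device.
    "per_device_train_batch_size": 16,
}

# max_seq_length is applied at tokenization time rather than in the trainer.
# Assumption: longer comments are truncated to 512 tokens.
tokenizer_kwargs = {"truncation": True, "max_length": 512}

# Possible usage (requires the transformers library; output_dir is hypothetical):
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="toxicity-out", **training_kwargs)
print(training_kwargs)
print(tokenizer_kwargs)
```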