---
language: vi
tags:
- hate-speech-detection
- vietnamese
- transformer
license: apache-2.0
datasets:
- VN-HSD
metrics:
- accuracy
- f1
model-index:
- name: visobert-hsd
  results:
  - task:
      type: text-classification
      name: Hate Speech Detection
    dataset:
      name: VN-HSD
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value: <INSERT_ACCURACY>
    - name: F1 Score
      type: f1
      value: <INSERT_F1_SCORE>
base_model:
- uitnlp/visobert
pipeline_tag: text-classification
---

# ViSoBERT-HSD: Hate Speech Detection for Vietnamese Text

This model is fine-tuned from [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert) on **VN-HSD**, a unified Vietnamese hate-speech dataset that combines ViHSD, ViCTSD, and ViHOS.

## Model Details

* **Base Model**: [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert)
* **Dataset**: VN-HSD (ViSoLex-HSD unified hate speech corpus)
* **Fine-tuning**: HuggingFace Transformers

### Hyperparameters

* Batch size: `32`
* Learning rate: `3e-5`
* Epochs: `100`
* Max sequence length: `256`
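
As a rough illustration, the hyperparameters above could be mapped onto the keyword names used by `transformers.TrainingArguments`. The training script itself is not part of this card, so this is only a sketch: `output_dir` and the helper `steps_per_epoch` are hypothetical, and the maximum sequence length is applied at tokenization time rather than through the trainer.

```python
import math

# Hyperparameters as listed in this model card.
BATCH_SIZE = 32
LEARNING_RATE = 3e-5
NUM_EPOCHS = 100
MAX_SEQ_LENGTH = 256

# Keys follow `transformers.TrainingArguments` parameter names.
# `output_dir` is a hypothetical path chosen for illustration.
training_kwargs = {
    "output_dir": "./visobert-hsd-finetuned",
    "per_device_train_batch_size": BATCH_SIZE,
    "learning_rate": LEARNING_RATE,
    "num_train_epochs": NUM_EPOCHS,
}

# Tokenizer call arguments; the length limit is enforced here, not by the trainer.
tokenizer_kwargs = {"truncation": True, "max_length": MAX_SEQ_LENGTH}


def steps_per_epoch(num_examples: int, batch_size: int = BATCH_SIZE) -> int:
    """Optimizer steps per epoch on one device, without gradient accumulation."""
    return math.ceil(num_examples / batch_size)
```

With `transformers` installed, the dictionary can be passed as `TrainingArguments(**training_kwargs)`, and `tokenizer_kwargs` can be unpacked into the tokenizer call.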

## Results

* **Accuracy**: `<INSERT_ACCURACY>`
* **F1 Score**: `<INSERT_F1_SCORE>`
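
For reference, the two metrics can be computed from gold and predicted label indices as sketched below. The card does not say which F1 averaging was used, so macro averaging over the three classes is an assumption here, and the function names and toy labels are purely illustrative.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int = 3) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted examples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy labels for illustration only; these are not results on VN-HSD.
gold = [0, 0, 1, 2, 2, 1]
pred = [0, 1, 1, 2, 0, 1]
print(f"accuracy={accuracy(gold, pred):.3f}, macro-F1={macro_f1(gold, pred):.3f}")
```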

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("visolex/visobert-hsd")
model = AutoModelForSequenceClassification.from_pretrained("visolex/visobert-hsd")

text = "Hắn ta thật kinh tởm!"  # "He is truly disgusting!"
# Truncate to the 256-token limit used during fine-tuning.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():  # gradients are not needed for inference
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()

# Map the predicted class index to its label name.
label_map = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}
print(f"Predicted label: {label_map[pred]}")
```
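
To report a confidence score alongside the label, the logits can be turned into probabilities with a softmax. The sketch below works on a plain list of floats (for example `logits[0].tolist()` from the snippet above) so that it runs without a model download. The helpers `softmax` and `classify` are hypothetical names, and the example logits are made up.

```python
import math
from typing import List, Tuple

# Same index-to-label mapping as in the usage example.
LABEL_MAP = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float]) -> Tuple[str, float]:
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[idx], probs[idx]


# Made-up logits for illustration; real values come from `model(**inputs).logits`.
label, confidence = classify([0.2, 1.5, -0.7])
print(f"{label} ({confidence:.2%})")
```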