---
language: vi
tags:
- hate-speech-detection
- vietnamese
- transformer
license: apache-2.0
datasets:
- VN-HSD
metrics:
- accuracy
- f1
model-index:
- name: visobert-hsd
  results:
  - task:
      type: text-classification
      name: Hate Speech Detection
    dataset:
      name: VN-HSD
      type: custom
    metrics:
    - name: Accuracy
      type: accuracy
      value:
    - name: F1 Score
      type: f1
      value:
base_model:
- uitnlp/visobert # replace with actual ViSoBERT Hub name
pipeline_tag: text-classification
---

# ViSoBERT-HSD: Hate Speech Detection for Vietnamese Text

Fine-tuned from [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert) on **VN-HSD**, a unified Vietnamese hate speech dataset that combines ViHSD, ViCTSD, and ViHOS.

## Model Details

* **Base Model**: [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert)
* **Dataset**: VN-HSD (ViSoLex-HSD unified hate speech corpus)
* **Fine-tuning**: Hugging Face Transformers

### Hyperparameters

* Batch size: `32`
* Learning rate: `3e-5`
* Epochs: `100`
* Max sequence length: `256`

## Results

* **Accuracy**: ``
* **F1 Score**: ``

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained("visolex/visobert-hsd")
model = AutoModelForSequenceClassification.from_pretrained("visolex/visobert-hsd")

text = "Hắn ta thật kinh tởm!"  # "He is truly disgusting!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Inference only, so skip gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()

# Map the predicted class index to its label name
label_map = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}
print(f"Predicted label: {label_map[pred]}")
```
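The usage example above scores a single sentence and reports only the top label. When classifying many comments, it is often useful to batch the inputs and keep a confidence score alongside each label. The following is a minimal sketch of that idea, not part of the released model code: the helper names `softmax`, `decode_logits`, and `classify_batch` are hypothetical, and the label order is assumed to match the `label_map` in the usage example (checking it against `model.config.id2label` is advisable).

```python
import math

# Label order copied from the usage example; this is an assumption and
# should be verified against model.config.id2label.
LABEL_MAP = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}


def softmax(row):
    """Convert one row of raw logits into probabilities that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logit_rows, label_map=LABEL_MAP):
    """Map each row of logits to a (label, confidence) pair."""
    results = []
    for row in logit_rows:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((label_map[best], probs[best]))
    return results


def classify_batch(texts, model_name="visolex/visobert-hsd", max_length=256):
    """Hypothetical helper: run the model on a list of texts and decode the output."""
    # Imported lazily so the decoding helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # Pad to the longest text in the batch; truncate to the training length.
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=max_length)
    with torch.no_grad():
        logits = model(**inputs).logits
    return decode_logits(logits.tolist())
```

Calling `classify_batch(["Hắn ta thật kinh tởm!"])` would return a list with one `(label, confidence)` tuple per input text. Keeping `decode_logits` separate from model loading makes the decoding step easy to check on its own and leaves room for a confidence threshold if borderline predictions need manual review.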