	---
	language: vi
	tags:
	- hate-speech-detection
	- vietnamese
	- transformer
	license: apache-2.0
	datasets:
	- VN-HSD
	metrics:
	- accuracy
	- f1
	model-index:
	- name: visobert-hsd
	  results:
	  - task:
	      type: text-classification
	      name: Hate Speech Detection
	    dataset:
	      name: VN-HSD
	      type: custom
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: <INSERT_ACCURACY>
	    - name: F1 Score
	      type: f1
	      value: <INSERT_F1_SCORE>
	base_model:
	- uitnlp/visobert # replace with actual ViSoBERT Hub name
	pipeline_tag: text-classification
	---
	# ViSoBERT‑HSD: Hate Speech Detection for Vietnamese Text

	ViSoBERT‑HSD is fine‑tuned from [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert) on VN‑HSD, a unified Vietnamese hate-speech dataset that combines ViHSD, ViCTSD, and ViHOS.

	## Model Details

	* Base Model: [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert)
	* Dataset: VN‑HSD (ViSoLex‑HSD unified hate speech corpus)
	* Fine‑tuning: Hugging Face Transformers

	### Hyperparameters

	* Batch size: `32`
	* Learning rate: `3e-5`
	* Epochs: `100`
	* Max sequence length: `256`
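As a rough illustration, the values listed above could be passed to the Transformers `Trainer` along the following lines. The card does not describe the actual training script, so this is only a sketch: treating the batch size as per-device, the `output_dir` name, and the helper names `build_training_args` and `total_optimizer_steps` are all assumptions. The maximum sequence length is not a `TrainingArguments` field; it would be applied at tokenization time.

```python
import math

# Hyperparameters as listed in this card.
# Assumption: "Batch size" is mapped to the per-device train batch size.
HPARAMS = {
    "per_device_train_batch_size": 32,
    "learning_rate": 3e-5,
    "num_train_epochs": 100,
}
MAX_SEQ_LENGTH = 256  # applied in the tokenizer call, not in TrainingArguments


def build_training_args(output_dir="visobert-hsd-out"):
    """Build TrainingArguments from HPARAMS (output_dir is a hypothetical name)."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)


def total_optimizer_steps(num_examples):
    """Optimizer steps on a single device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / HPARAMS["per_device_train_batch_size"])
    return steps_per_epoch * HPARAMS["num_train_epochs"]


# Hypothetical dataset size, used only to show the arithmetic.
print(total_optimizer_steps(1000))
```

With 100 epochs the step count grows quickly, so in practice a run like this would probably be paired with early stopping or checkpoint selection on a validation set; the card does not say whether that was done.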

	## Results

	* Accuracy: `<INSERT_ACCURACY>`
	* F1 Score: `<INSERT_F1_SCORE>`
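The card does not say how the F1 score is averaged across the three classes. The sketch below shows one common choice, macro-averaged F1, alongside accuracy, in plain Python. Macro averaging is an assumption, and the toy labels are made up purely to demonstrate the computation; they are not results from this model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that never appears in either list contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Hypothetical labels, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(f"accuracy={accuracy(y_true, y_pred):.4f}  macro_f1={macro_f1(y_true, y_pred):.4f}")
```

On imbalanced hate-speech data, macro F1 weights the rarer classes equally with the majority class, which is why it is often reported next to accuracy.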

	## Usage

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	tokenizer = AutoTokenizer.from_pretrained("visolex/visobert-hsd")
	model = AutoModelForSequenceClassification.from_pretrained("visolex/visobert-hsd")

	text = "Hắn ta thật kinh tởm!"  # "He is truly disgusting!"
	# Truncate to the same 256-token limit used during fine-tuning
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
	with torch.no_grad():  # gradients are not needed for inference
	    logits = model(**inputs).logits
	pred = logits.argmax(dim=-1).item()  # index of the highest-scoring class
	label_map = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}
	print(f"Predicted label: {label_map[pred]}")
	```
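The usage example reports only the top class. If a confidence score is also wanted, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on hypothetical logits so it runs without downloading the model; the helper names `softmax` and `describe` are illustrative, and the label mapping is the one used in the usage example.

```python
import math

# Label mapping taken from the usage example in this card.
LABEL_MAP = {0: "CLEAN", 1: "OFFENSIVE", 2: "HATE"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return the predicted label and the per-class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs


# Hypothetical logits; with the real model, logits[0].tolist() yields such a list.
example_logits = [0.2, 1.5, -0.3]
label, probs = describe(example_logits)
print(label, [round(p, 3) for p in probs])
```

Softmax probabilities from a fine-tuned classifier are not necessarily well calibrated, so any threshold applied to them is best chosen on held-out data.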