---
license: apache-2.0
base_model: sphobert
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: sphobert-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
    - type: f1
      value: 0.5800
    - type: precision
      value: 0.6934
    - type: recall
      value: 0.5990
    - type: exact_match
      value: 0.0995
---

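# sphobert-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of [sphobert](https://huggingface.co/sphobert) for Vietnamese hate speech span detection: it tags the parts of a sentence that contain hate speech rather than classifying the sentence as a whole. Results are reported on the `visolex/ViHOS` dataset.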

## Model Details

- Base Model: `sphobert`
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)

### Hyperparameters

- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Epochs: `100`
- Early stopping patience: `5`
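
The epoch count and the patience interact: training runs for at most 100 epochs but stops earlier once the validation score has not improved for 5 evaluations in a row. A minimal sketch of that logic, assuming one evaluation per epoch and a higher-is-better score; the function name and the `eval_scores` list are hypothetical stand-ins, not the training code used for this model.

```python
def run_with_early_stopping(eval_scores, max_epochs=100, patience=5):
    """Simulate early stopping over a list of per-epoch validation scores.

    eval_scores is a hypothetical stand-in for "evaluate the model after
    each epoch"; higher scores are assumed to be better.
    Returns (epochs_run, best_epoch, best_score).
    """
    best_score = float("-inf")
    best_epoch = 0
    stale = 0          # evaluations since the last improvement
    epochs_run = 0
    for epoch, score in enumerate(eval_scores[:max_epochs], start=1):
        epochs_run = epoch
        if score > best_score:
            best_score, best_epoch = score, epoch
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` evaluations
    return epochs_run, best_epoch, best_score


# Illustrative scores only, not measurements from this model.
scores = [0.40, 0.50, 0.55, 0.54, 0.53, 0.55, 0.52, 0.51, 0.60]
print(run_with_early_stopping(scores))
```

In this toy run the best score is reached at epoch 3 and training stops at epoch 8, so the later, higher score is never seen. That is the trade-off a small patience value makes.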

## Results

- F1: `0.5800`
- Precision: `0.6934`
- Recall: `0.5990`
- Exact Match: `0.0995`
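
The card does not say how these metrics were computed or averaged. For orientation only, here is one common convention for span detection: score each example by character overlap between predicted and gold spans, and count an exact match only when the two span sets are identical. The helper names and the half-open `(start, end)` span format are assumptions made for this sketch, and the reported numbers may follow a different protocol.

```python
def span_chars(spans):
    """Expand half-open (start, end) character spans into a set of indices."""
    return {i for start, end in spans for i in range(start, end)}


def score_example(pred_spans, gold_spans):
    """Character-overlap precision/recall/F1 plus exact match for one example."""
    pred, gold = span_chars(pred_spans), span_chars(gold_spans)
    if not pred and not gold:
        # Assumption: an example with no spans on either side counts as perfect.
        return {"precision": 1.0, "recall": 1.0, "f1": 1.0, "exact_match": 1.0}
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    exact = 1.0 if sorted(pred_spans) == sorted(gold_spans) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "exact_match": exact}


# Prediction covers characters 0-3, gold covers only 0-1.
result = score_example([(0, 4)], [(0, 2)])
print(result)
```

Exact match is a much stricter criterion than overlap-based F1: a prediction that is off by a single character still earns partial credit on F1 but scores zero on exact match.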

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "visolex/sphobert-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Sample Vietnamese input ("An example Vietnamese sentence with hateful content ...")
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# Note: the model was fine-tuned with a max sequence length of 64 (see Hyperparameters).
enc = tok(text, return_tensors="pt", truncation=True, max_length=256, is_split_into_words=False)
with torch.no_grad():
    logits = model(**enc).logits  # shape: (batch, seq_len, num_labels)
pred_ids = logits.argmax(-1)[0].tolist()  # one predicted label id per token
# TODO: convert pred_ids to spans according to your label scheme (BIO/BILOU/char-offset)
```
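
The final conversion step depends on the label scheme, which this card does not document. As a hedged sketch, assume a BIO scheme (tags such as `B-T`, `I-T`, `O`, which are made-up names here) and that you already have one `(start, end)` character offset per token. The predicted ids can be mapped to tag strings with `model.config.id2label`, and fast tokenizers can return offsets via `return_offsets_mapping=True`; whether this model's tokenizer supports that is not stated. The function below is hypothetical and only merges tags into character spans.

```python
def labels_to_char_spans(labels, offsets):
    """Merge per-token BIO tags into half-open (start, end) character spans.

    labels:  one tag string per token (assumed BIO scheme, e.g. "B-T", "I-T", "O").
    offsets: one (start, end) character offset per token, aligned with labels.
    """
    spans = []
    current = None  # [start, end] of the span currently being built
    for label, (start, end) in zip(labels, offsets):
        if label.startswith("B"):
            if current:
                spans.append(tuple(current))
            current = [start, end]
        elif label.startswith("I"):
            if current:
                current[1] = end          # extend the open span
            else:
                current = [start, end]    # tolerate an I tag without a preceding B
        else:
            if current:
                spans.append(tuple(current))
                current = None
    if current:
        spans.append(tuple(current))
    return spans


# Toy input: four tokens of the string "abc def ghi jkl".
toy_offsets = [(0, 3), (4, 7), (8, 11), (12, 15)]
toy_labels = ["O", "B-T", "I-T", "O"]
toy_spans = labels_to_char_spans(toy_labels, toy_offsets)
print(toy_spans)
```

Check the checkpoint's `id2label` mapping before relying on this: if the model uses a different scheme (for example plain binary tags or BILOU), the merging rules need to change accordingly.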

## License

Apache-2.0

## Acknowledgments

- Base model: [sphobert](https://huggingface.co/sphobert)