Update README.md
README.md CHANGED
@@ -49,6 +49,19 @@ tokenizer = RobertaTokenizer.from_pretrained('dsfsi/PuoBERTa')
 
 ## Downstream Performance
 
+### Daily News Dikgang
+
+Learn more about the dataset in the [Dataset Folder](daily-news-dikgang)
+
+| **Model** | **5-fold Cross Validation F1** | **Test F1** |
+|-----------------------------|--------------------------------------|-------------------|
+| Logistic Regression + TFIDF | 60.1 | 56.2 |
+| NCHLT TSN RoBERTa | 64.7 | 60.3 |
+| PuoBERTa | **63.8** | **62.9** |
+| PuoBERTaJW300 | 66.2 | 65.4 |
+
+Downstream News Categorisation model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-News](https://huggingface.co/dsfsi/PuoBERTa-News)
+
 ### MasakhaPOS
 
 Performance of models on the MasakhaPOS downstream task.
@@ -65,6 +78,8 @@ Performance of models on the MasakhaPOS downstream task.
 | PuoBERTa | **83.4** |
 | PuoBERTa+JW300 | 84.1 |
 
+Downstream POS model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-POS](https://huggingface.co/dsfsi/PuoBERTa-POS)
+
 ### MasakhaNER
 
 Performance of models on the MasakhaNER downstream task.
@@ -80,13 +95,17 @@ Performance of models on the MasakhaNER downstream task.
 | PuoBERTa | **78.2** |
 | PuoBERTa+JW300 | 80.2 |
 
-
+Downstream NER model 🤗 [https://huggingface.co/dsfsi/PuoBERTa-NER](https://huggingface.co/dsfsi/PuoBERTa-NER)
+
+## Pre-Training Dataset
+
+We used the PuoData dataset, a rich source of Setswana text, ensuring that our model is well-trained and culturally attuned.
 
-
+[Github](https://github.com/dsfsi/PuoData), 🤗 [https://huggingface.co/datasets/dsfsi/PuoData](https://huggingface.co/datasets/dsfsi/PuoData)
 
 ## Citation Information
 
-Bibtex
+Bibtex Reference
 
 ```
 @inproceedings{marivate2023puoberta,