Commit fbecdcb
Parent(s): ded0380

update model

- README.md +21 -13
- config.json +1 -1
- pytorch_model.bin +2 -2
- vocab.json +1 -1
README.md CHANGED

```diff
@@ -2,6 +2,7 @@
 language: ar
 datasets:
 - common_voice
+- arabic_speech_corpus
 metrics:
 - wer
 - cer
@@ -24,15 +25,15 @@ model-index:
 metrics:
 - name: Test WER
 type: wer
-value:
+value: 39.59
 - name: Test CER
 type: cer
-value: 18.
+value: 18.18
 ---
 
 # Wav2Vec2-Large-XLSR-53-Arabic
 
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice).
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Arabic using the [Common Voice](https://huggingface.co/datasets/common_voice) and [Arabic Speech Corpus](https://huggingface.co/datasets/arabic_speech_corpus).
 When using this model, make sure that your speech input is sampled at 16kHz.
 
 The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint
@@ -49,7 +50,7 @@ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
 LANG_ID = "ar"
 MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
-SAMPLES =
+SAMPLES = 10
 
 test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
 
@@ -81,11 +82,16 @@ for i, predicted_sentence in enumerate(predicted_sentences):
 
 | Reference | Prediction |
 | ------------- | ------------- |
-| ألديك قلم ؟ |
-| ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست
-| إنك تكبر المشكلة. | إنك تكبر المشكلة
-| يرغب أن يلتقي بك. | يرغب أن يلتقي بك
+| ألديك قلم ؟ | ألديك قلم |
+| ليست هناك مسافة على هذه الأرض أبعد من يوم أمس. | ليست نالك مسافة على هذه الأرض أبعد من يوم الأمس م |
+| إنك تكبر المشكلة. | إنك تكبر المشكلة |
+| يرغب أن يلتقي بك. | يرغب أن يلتقي بك |
 | إنهم لا يعرفون لماذا حتى. | إنهم لا يعرفون لماذا حتى |
+| سيسعدني مساعدتك أي وقت تحب. | سيسئدنيمساعدتك أي وقد تحب |
+| أَحَبُّ نظريّة علمية إليّ هي أن حلقات زحل مكونة بالكامل من الأمتعة المفقودة. | أحب نظرية علمية إلي هي أن حل قتزح المكوينا بالكامل من الأمت عن المفقودة |
+| سأشتري له قلماً. | سأشتري له قلما |
+| أين المشكلة ؟ | أين المشكل |
+| وَلِلَّهِ يَسْجُدُ مَا فِي السَّمَاوَاتِ وَمَا فِي الْأَرْضِ مِنْ دَابَّةٍ وَالْمَلَائِكَةُ وَهُمْ لَا يَسْتَكْبِرُونَ | ولله يسجد ما في السماوات وما في الأرض من دابة والملائكة وهم لا يستكبرون |
 
 ## Evaluation
 
@@ -102,9 +108,11 @@ LANG_ID = "ar"
 MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
 DEVICE = "cuda"
 
-CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
-
-
+CHARS_TO_IGNORE = [",", "?", "¿", ".", "!", "¡", ";", ";", ":", '""', "%", '"', "�", "ʿ", "·", "჻", "~", "՞",
+"؟", "،", "।", "॥", "«", "»", "„", "“", "”", "「", "」", "‘", "’", "《", "》", "(", ")", "[", "]",
+"{", "}", "=", "`", "_", "+", "<", ">", "…", "–", "°", "´", "ʾ", "‹", "›", "©", "®", "—", "→", "。",
+"、", "﹂", "﹁", "‧", "~", "﹏", ",", "{", "}", "(", ")", "[", "]", "【", "】", "‥", "〽",
+"『", "』", "〝", "〟", "⟨", "⟩", "〜", ":", "!", "?", "♪", "؛", "/", "\\", "º", "−", "^", "'", "ʻ", "ˆ"]
 
 test_dataset = load_dataset("common_voice", LANG_ID, split="test")
 
@@ -152,11 +160,11 @@ print(f"CER: {cer.compute(predictions=predictions, references=references, chunk_
 
 **Test Result**:
 
-In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-
+In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-14). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used.
 
 | Model | WER | CER |
 | ------------- | ------------- | ------------- |
-| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **
+| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | **39.59%** | **18.18%** |
 | bakrianoo/sinai-voice-ar-stt | 45.30% | 21.84% |
 | othrif/wav2vec2-large-xlsr-arabic | 45.93% | 20.51% |
 | kmfoda/wav2vec2-large-xlsr-arabic | 54.14% | 26.07% |
```
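The evaluation section defines `CHARS_TO_IGNORE`, a list of punctuation and symbols that is presumably stripped from the transcripts before WER and CER are scored. The following stdlib-only sketch shows that idea. It makes two assumptions: it uses a small hypothetical subset of the list (`CHARS_SUBSET`), and it scores with a plain Levenshtein distance. It is not the `wer.compute`/`cer.compute` implementation called by the README script, whose normalization details may differ.

```python
import re

# Hypothetical subset of CHARS_TO_IGNORE (assumption: the real list is longer
# and the evaluation script may apply it differently).
CHARS_SUBSET = [",", "?", ".", "!", ";", ":", "؟", "،", "؛"]
IGNORE_RE = re.compile("[" + re.escape("".join(CHARS_SUBSET)) + "]")


def normalize(text: str) -> str:
    """Drop ignored characters and collapse runs of whitespace."""
    return " ".join(IGNORE_RE.sub("", text).split())


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance where substitution, insertion and deletion cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, prediction: str) -> float:
    """Word error rate of one sentence pair after normalization."""
    ref = normalize(reference).split()
    return edit_distance(ref, normalize(prediction).split()) / len(ref)


def cer(reference: str, prediction: str) -> float:
    """Character error rate of one sentence pair after normalization."""
    ref = normalize(reference)
    return edit_distance(list(ref), list(normalize(prediction))) / len(ref)
```

Under this sketch, the table pair `أين المشكلة ؟` / `أين المشكل` scores a WER of 0.5 (one of two words wrong) and a CER of 1/11 (one missing character out of eleven). Corpus-level figures like those in the results table typically sum edit counts over the whole test set instead of averaging per-sentence rates.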
config.json CHANGED

```diff
@@ -72,5 +72,5 @@
 "num_hidden_layers": 24,
 "pad_token_id": 0,
 "transformers_version": "4.5.0.dev0",
-"vocab_size":
+"vocab_size": 51
 }
```
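`config.json` now sets `vocab_size` to 51. This matches the 51 entries (ids 0 to 50) in the updated `vocab.json`. The two must agree because the CTC output layer of `Wav2Vec2ForCTC` produces one logit per vocabulary entry. The following minimal consistency check assumes both files are plain JSON, as in this commit. The function names `check_vocab` and `check_files` are hypothetical.

```python
import json


def check_vocab(config: dict, vocab: dict) -> None:
    """Raise ValueError if config.json and vocab.json disagree."""
    ids = sorted(vocab.values())
    # Token ids should form the contiguous range 0 .. len(vocab) - 1.
    if ids != list(range(len(ids))):
        raise ValueError("token ids are not contiguous from 0")
    if config["vocab_size"] != len(vocab):
        raise ValueError(
            f"vocab_size={config['vocab_size']} but vocab.json has {len(vocab)} tokens"
        )
    # pad_token_id in config.json should point at the <pad> entry.
    if vocab.get("<pad>") != config.get("pad_token_id"):
        raise ValueError("pad_token_id does not match the <pad> entry")


def check_files(config_path: str, vocab_path: str) -> None:
    """Load both JSON files from disk and run check_vocab on them."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(vocab_path, encoding="utf-8") as f:
        vocab = json.load(f)
    check_vocab(config, vocab)
```

With a local checkout of this commit, `check_files("config.json", "vocab.json")` should pass silently. A mismatch raises `ValueError`, so a future vocabulary change that is not reflected in `vocab_size` would be caught.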
pytorch_model.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:a0b26f6d9d3edfde1784aef863c192a8cc1e438a23b45910ab648531ebe1857b
+size 1262142936
```
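`pytorch_model.bin` is tracked with Git LFS, so the diff touches only its three-line pointer. The pointer records the spec version, the SHA-256 object id, and the size in bytes (1262142936 bytes, about 1.26 GB). The following minimal sketch parses such a pointer and checks a downloaded copy of the weights against it. It assumes the `key value` line layout shown in the diff, and the function names are hypothetical.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")  # e.g. "sha256:<hex>"
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file at `path` and compare its size and SHA-256 with the pointer."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["oid"]
```

The file is hashed in 1 MiB chunks, so checking the roughly 1.26 GB weights file does not load it into memory at once.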
vocab.json CHANGED

```diff
@@ -1 +1 @@
-{"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "
+{"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3, "|": 4, "-": 5, "ء": 6, "آ": 7, "أ": 8, "ؤ": 9, "إ": 10, "ئ": 11, "ا": 12, "ب": 13, "ة": 14, "ت": 15, "ث": 16, "ج": 17, "ح": 18, "خ": 19, "د": 20, "ذ": 21, "ر": 22, "ز": 23, "س": 24, "ش": 25, "ص": 26, "ض": 27, "ط": 28, "ظ": 29, "ع": 30, "غ": 31, "ـ": 32, "ف": 33, "ق": 34, "ك": 35, "ل": 36, "م": 37, "ن": 38, "ه": 39, "و": 40, "ى": 41, "ي": 42, "ً": 43, "ٌ": 44, "ٍ": 45, "َ": 46, "ُ": 47, "ِ": 48, "ّ": 49, "ْ": 50}
```
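The new `vocab.json` maps 51 tokens to ids. In Wav2Vec2 CTC models, `<pad>` (id 0, matching `pad_token_id` in `config.json`) acts as the CTC blank, and `|` marks word boundaries. The following simplified sketch shows greedy CTC decoding against a subset copied from that vocabulary. It assumes the per-frame ids have already been obtained as the argmax over the 51 output logits. It illustrates the idea only and is not the code of `Wav2Vec2CTCTokenizer`, which also handles other special tokens.

```python
from itertools import groupby

# Subset copied from vocab.json above (token -> id); the full file has 51 entries.
VOCAB = {"<pad>": 0, "|": 4, "ق": 34, "ل": 36, "م": 37, "ه": 39}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}
PAD_ID = VOCAB["<pad>"]   # CTC blank (pad_token_id = 0 in config.json)
WORD_DELIMITER = "|"      # stands for the space between words


def ctc_greedy_decode(frame_ids: list) -> str:
    """Collapse repeated ids, drop blanks, and map the remaining tokens to text."""
    collapsed = [i for i, _ in groupby(frame_ids)]
    tokens = [ID_TO_TOKEN[i] for i in collapsed if i != PAD_ID]
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()
```

Repeats are collapsed before blanks are removed. A doubled letter therefore survives only when a blank separates the two frames, so `[36, 0, 36]` decodes to `لل` while `[36, 36]` decodes to `ل`.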