|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- ar |
|
|
tags: |
|
|
- audio |
|
|
- automatic-speech-recognition |
|
|
--- |
|
|
<style> |
|
|
img { |
|
|
display: inline; |
|
|
} |
|
|
</style> |
|
|
 |
|
|
| |
|
|
|[](https://github.com/linagora-labs/ASR_train_kaldi_tunisian?tab=readme-ov-file#acoustic-model-am) |
|
|
|[](https://github.com/linagora-labs/ASR_train_kaldi_tunisian) |
|
|
|
|
|
|
|
|
# LinTO ASR Arabic Tunisia v0.1 |
|
|
|
|
|
**LinTO ASR Arabic Tunisia v0.1** is an Automatic Speech Recognition (ASR) model for the Tunisian dialect, |
|
|
with some capabilities of code-switching when some French or English words are used. |
|
|
|
|
|
This repository includes two versions of the model and a Language model with ARPA format: |
|
|
- `vosk-model`: The original, comprehensive model. |
|
|
- `android-model`: A lighter version with a simplified graph, optimized for deployment on Android devices or Raspberry Pi applications. |
|
|
- `lm_TN_CS.arpa.gz`: A language model trained using SRILM on a dataset containing 4.5 million lines of text collected from various sources. |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
- **Model type**: Kaldi TDNN |
|
|
- **Language(s)**: Tunisian Dialect |
|
|
- **Use cases**: Automatic Speech Recognition (ASR) |
|
|
|
|
|
### Model Performance |
|
|
|
|
|
The following table summarizes the performance of the **LinTO ASR Arabic Tunisia v0.1** model on various considered **test sets**: |
|
|
|
|
|
| Dataset | CER | WER | |
|
|
| :------- | :------- | :------- | |
|
|
| [Youtube_TNScrapped_V1](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `25.39%` | `37.51%` | |
|
|
| [TunSwitchCS](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `17.72%` | `20.51%` | |
|
|
| [TunSwitchTO](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `11.13%` | `22.54%` | |
|
|
| [ApprendreLeTunisien](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `11.81%` | `23.27%` | |
|
|
| [TARIC](https://github.com/elyadata/TARIC-SLU) | `10.60%` | `16.06%` | |
|
|
| [OneStory](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table)| `1.53%` | `4.47%` | |
|
|
|
|
|
### Training code |
|
|
|
|
|
The model was trained using the following GitHub repository: [ASR_train_kaldi_tunisian](https://github.com/linagora-labs/ASR_train_kaldi_tunisian) |
|
|
|
|
|
### Training datasets |
|
|
|
|
|
The model was trained using the following datasets: |
|
|
|
|
|
- **[LinTO DataSet Audio for Arabic Tunisian](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn):** This dataset comprises a collection of Tunisian dialect audio recordings and their annotations for Speech-to-Text (STT) tasks. The data was collected from various sources, including Hugging Face, YouTube, and websites. |
|
|
- **[LinTO DataSet Audio for Arabic Tunisian Augmented](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn-augmented):** This dataset is an augmented version of the LinTO DataSet Audio for Arabic Tunisian v0.1. The augmentation includes noise reduction and voice conversion. |
|
|
- **[TARIC](https://github.com/elyadata/TARIC-SLU):** This dataset consists of Tunisian Arabic speech recordings collected from train stations in Tunisia. |
|
|
|
|
|
## How to use |
|
|
|
|
|
### 1. Download the model |
|
|
|
|
|
You can download the model and its components directly from this repository using one of the following methods: |
|
|
|
|
|
**Method 1: Direct Download via Browser** |
|
|
|
|
|
1. **Visit the Repository**: Navigate to the [Hugging Face model page](https://huggingface.co/linagora/linto-asr-ar-tn-0.1). |
|
|
2. **Download as Zip**: Click on the "Download" button or the "Code" button (often appearing as a dropdown). Select "Download ZIP" to get the entire repository as a zip file. |
|
|
|
|
|
**Method 2: Using `curl` command** |
|
|
|
|
|
You can follow the command below: |
|
|
|
|
|
```bash |
|
|
sudo apt-get install curl |
|
|
|
|
|
curl -L https://huggingface.co/linagora/linto-asr-ar-tn-0.1/resolve/main/vosk-model.zip --output vosk-model.zip |
|
|
|
|
|
``` |
|
|
(or same with `android-model.zip` instead of `vosk-model.zip`) |
|
|
|
|
|
**Method 3: Cloning the Repository** |
|
|
|
|
|
You can clone the repository and create a zip file of the contents if needed: |
|
|
|
|
|
```bash |
|
|
sudo apt-get install git-lfs |
|
|
git lfs install |
|
|
|
|
|
git clone https://huggingface.co/linagora/linto-asr-ar-tn-0.1.git |
|
|
|
|
|
cd linto-asr-ar-tn-0.1 |
|
|
``` |
|
|
|
|
|
### 2. Unzip the model |
|
|
|
|
|
This can be done in bash: |
|
|
```bash |
|
|
mkdir dir_for_zip_extract |
|
|
|
|
|
unzip /path/to/model-name.zip -d dir_for_zip_extract |
|
|
``` |
|
|
|
|
|
### 3. Python code |
|
|
|
|
|
First, make sure to install the required dependencies: |
|
|
|
|
|
```bash |
|
|
pip install vosk |
|
|
``` |
|
|
|
|
|
Then you can launch the inference script from this repository: |
|
|
```bash |
|
|
python inference.py <path/to/your/model> <path/to/your/audio/file.wav> |
|
|
``` |
|
|
|
|
|
or use such a python code: |
|
|
```python |
|
|
from vosk import Model, KaldiRecognizer |
|
|
import wave |
|
|
import json |
|
|
|
|
|
model_dir = "path/to/your/model" |
|
|
audio_file = "path/to/your/audio/file.wav" |
|
|
|
|
|
model = Model(model_dir) |
|
|
|
|
|
with wave.open(audio_file, "rb") as wf: |
|
|
if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE": |
|
|
raise ValueError("Audio file must be WAV format mono PCM.") |
|
|
|
|
|
rec = KaldiRecognizer(model, wf.getframerate()) |
|
|
rec.AcceptWaveform(wf.readframes(wf.getnframes())) |
|
|
res = rec.FinalResult() |
|
|
transcript = json.loads(res)["text"] |
|
|
print(f"Transcript: {transcript}") |
|
|
``` |
|
|
|
|
|
## Example |
|
|
|
|
|
Here is an example of the transcription capabilities of the model: |
|
|
|
|
|
<audio controls> |
|
|
<source src="https://huggingface.co/linagora/linto-asr-ar-tn-0.1/resolve/main/sample.wav" type="audio/wav"> |
|
|
</audio> |
|
|
|
|
|
### Result: |
|
|
<p dir="rtl"> |
|
|
بالدعم هاذايا لي بثتهولو ال berd يعني أحنا حتى ال projet متاعو تقلب حتى sur le plan حتى فال management يا سيد نحنا في تسيير الشريكة يعني تبدل مية و ثمانين درجة ماللي يعني قبل ما تجيه ال berd و بعد ما جاتو ال berd برنامج نخصص لل les startup إسمو |
|
|
</p> |
|
|
|
|
|
## WebRTC Demonstartion |
|
|
|
|
|
Install required dependencies: |
|
|
```bash |
|
|
pip install vosk |
|
|
pip install websockets |
|
|
``` |
|
|
|
|
|
If not done, close the repostorory: |
|
|
```bash |
|
|
git clone https://huggingface.co/linagora/linto-asr-ar-tn-0.1.git |
|
|
``` |
|
|
|
|
|
Then call the `app.py` script: |
|
|
```bash |
|
|
cd linto-asr-ar-tn-0.1/Demo-WebRTC |
|
|
|
|
|
python3 app.py <model-path> |
|
|
``` |
|
|
Access the web interface at: `localhost:8010` Just start and speak. |
|
|
|
|
|
Preview of the web app interface: |
|
|
 |
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
When using the LinTO ASR for Arabic Tunisian model, please cite the following paper (arxiv:2504.02604). |
|
|
```bibtex |
|
|
@misc{linagora2024Linto-tn, |
|
|
title = {LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect}, |
|
|
author = {Hedi Naouara and Jérôme Louradour and Jean-Pierre Lorré}, |
|
|
year = {2025}, |
|
|
month = {March}, |
|
|
eprint={2504.02604}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CL}, |
|
|
note={Good Data Workshop, AAAI 2025}, |
|
|
url={arxiv.org/abs/2504.02604}, |
|
|
} |
|
|
``` |