The point of this model was to see whether it is possible to train a multilingual TTS model with very little data. Output quality varies by language: Spanish is fluent, Finnish and Swedish are accented but intelligible, Turkish is poor, and French is largely unusable. Estonian and Hungarian have not been evaluated.
Training was conducted using a subset of 3000 samples from each language in the Common Voice dataset.
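The per-language subsets could be drawn along these lines; a minimal sketch assuming the Hugging Face `datasets` library, where the exact Common Voice release (`common_voice_17_0` here) and the sampling seed are assumptions rather than details from this card:

```python
from datasets import load_dataset

# Common Voice language configs on the Hub; "sv-SE" is the Swedish config name.
LANGS = ["fi", "sv-SE", "es", "fr", "tr", "hu", "et"]

subsets = {}
for lang in LANGS:
    ds = load_dataset("mozilla-foundation/common_voice_17_0", lang, split="train")
    # Shuffle with a fixed seed, then keep at most 3000 clips per language.
    subsets[lang] = ds.shuffle(seed=42).select(range(min(3000, len(ds))))
```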
Training Configuration
- `--learning_rate 0.0001`
- `--batch_size_per_gpu 2000`
- `--batch_size_type frame`
- `--max_samples 96`
- `--grad_accumulation_steps 16`
- `--max_grad_norm 0.3`
- `--epochs 200`
- `--num_warmup_updates 5000`
- `--save_per_updates 10000`
- `--keep_last_n_checkpoints -1`
- `--last_per_updates 5000`
- `--tokenizer custom`
- `--bnb_optimizer`
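The flags above match the argument names of the upstream F5-TTS fine-tuning CLI, so the full launch likely looked roughly like the sketch below; the script path and the `--dataset_name` value are assumptions, while the remaining flag/value pairs are taken from the list above:

```python
import subprocess

# Hedged reconstruction of the training launch; only the flag/value pairs come
# from the model card, the entry point and dataset name are assumptions.
cmd = [
    "accelerate", "launch", "src/f5_tts/train/finetune_cli.py",
    "--dataset_name", "multilingual_cv",   # hypothetical name of the prepared dataset
    "--learning_rate", "0.0001",
    "--batch_size_per_gpu", "2000",
    "--batch_size_type", "frame",          # batch size counted in mel frames, not clips
    "--max_samples", "96",
    "--grad_accumulation_steps", "16",
    "--max_grad_norm", "0.3",
    "--epochs", "200",
    "--num_warmup_updates", "5000",
    "--save_per_updates", "10000",
    "--keep_last_n_checkpoints", "-1",     # -1 keeps every saved checkpoint
    "--last_per_updates", "5000",
    "--tokenizer", "custom",
    "--bnb_optimizer",                     # boolean flag: 8-bit optimizer via bitsandbytes
]
subprocess.run(cmd, check=True)
```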
Inference Parameters
{dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4}
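These are the architecture hyperparameters of the DiT backbone, passed as the model config when the checkpoint is loaded. A minimal loading-and-synthesis sketch, assuming the upstream F5-TTS inference utilities; the checkpoint, vocab, and reference-audio filenames are hypothetical:

```python
import soundfile as sf
from f5_tts.infer.utils_infer import (
    infer_process,
    load_model,
    load_vocoder,
    preprocess_ref_audio_text,
)
from f5_tts.model import DiT

# Architecture parameters from this card (base-sized DiT backbone).
model_cfg = dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)

vocoder = load_vocoder()  # defaults to the Vocos vocoder
model = load_model(
    DiT,
    model_cfg,
    "model.pt",              # hypothetical checkpoint filename from this repo
    vocab_file="vocab.txt",  # hypothetical vocab file for the custom tokenizer
)

# Clip/normalize the voice prompt, then synthesize.
ref_audio, ref_text = preprocess_ref_audio_text("ref.wav", "Transcript of the reference clip.")
wav, sr, _ = infer_process(ref_audio, ref_text, "Hola, ¿cómo estás?", model, vocoder)
sf.write("out.wav", wav, sr)
```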
Thanks
Thanks to Amos Wallgren, Calvin Guillot and Begüm Çelik for quality assurance.
Base model
SWivid/F5-TTS