# Fasih-2B (فصيح)
A 2.09 billion parameter Arabic language model trained from scratch using the nanochat framework.
## Model Details
| Property | Value |
|---|---|
| Parameters | 2,088,768,048 (2.09B) |
| Architecture | NanoChatGPT (Transformer decoder) |
| Layers | 24 |
| Hidden size | 1536 |
| Attention heads | 12 |
| Vocabulary | 65,536 (Arabic BPE) |
| Context length | 2,048 tokens |
| Precision | bfloat16 |
| Language | Arabic |
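The table above can be captured as a small configuration object. A minimal sketch in Python — the class and field names here are illustrative, not nanochat's actual config API:

```python
from dataclasses import dataclass

@dataclass
class FasihConfig:
    # Architecture hyperparameters from the model card (field names illustrative).
    n_params: int = 2_088_768_048  # total parameters (2.09B)
    n_layer: int = 24              # transformer decoder layers
    n_embd: int = 1536             # hidden size
    n_head: int = 12               # attention heads
    vocab_size: int = 65_536       # Arabic BPE vocabulary
    block_size: int = 2_048        # context length in tokens
    dtype: str = "bfloat16"

cfg = FasihConfig()
assert cfg.n_embd % cfg.n_head == 0  # per-head dimension of 128
```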
## Training

### Pretraining

- Dataset: AraMix (8.19B tokens, Chinchilla-optimal)
- Steps: 7,812
- Hardware: 8x NVIDIA H20 GPUs
- Batch size: 1,048,576 tokens
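The pretraining token budget checks out from the step count and batch size:

```python
steps = 7_812
tokens_per_step = 1_048_576  # batch size in tokens

total_tokens = steps * tokens_per_step
print(total_tokens)  # 8,191,475,712 tokens, i.e. ~8.19B, matching the AraMix dataset size
```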
### SFT (Supervised Fine-Tuning)

- Datasets: alpaca-gpt4-arabic (50K) + CIDAR (10K) + ArabicMMLU (x3) + ArabicGSM8K (x4) + Fasih identity conversations
- Total mixture: 90,621 rows
- Steps: 16 (1 epoch)
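With 90,621 rows consumed in 16 steps over a single epoch, the implied batch is roughly 5.7K rows per optimizer step. This is a back-of-the-envelope check, not a reported hyperparameter:

```python
rows = 90_621   # total SFT mixture
steps = 16      # one epoch

rows_per_step = rows / steps
print(round(rows_per_step))  # about 5664 rows per optimizer step
```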
## Evaluation Results
| Benchmark | Accuracy | Random Baseline |
|---|---|---|
| ArabicMMLU | 29.91% | 25% |
| ACVA | 49.29% | 50% |
| ArabicGSM8K | 4.47% | 0% |
| ChatCORE | 0.0175 | 0.0 |
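A quick way to read the accuracy rows is as lift over the random baseline; a small sketch:

```python
# (accuracy %, random baseline %) per benchmark, from the table above
results = {
    "ArabicMMLU": (29.91, 25.0),
    "ACVA": (49.29, 50.0),
    "ArabicGSM8K": (4.47, 0.0),
}

for name, (acc, baseline) in results.items():
    # Positive lift means above chance; ACVA comes out slightly below chance.
    print(f"{name}: {acc - baseline:+.2f} points vs. random")
```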
## Usage

This model uses the nanochat framework. To use it:

```bash
git clone https://github.com/karpathy/nanochat
cd nanochat
# Copy model files to $NANOCHAT_BASE_DIR/chatsft_checkpoints/d24/
python -m scripts.chat_web -i sft
```
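To fetch the checkpoint files from the Hub into the directory named above, something like the following should work. This is a sketch using `huggingface_hub`; the default base directory and the helper names are assumptions, so adjust the path to your `NANOCHAT_BASE_DIR`:

```python
import os


def checkpoint_dir(base_dir: str) -> str:
    """Path where nanochat expects this model's SFT checkpoint (per the step above)."""
    return os.path.join(base_dir, "chatsft_checkpoints", "d24")


def fetch_checkpoint(repo_id: str = "HeshamHaroon/Fasih-2B") -> str:
    """Download all files of the model repo into place and return the destination."""
    # Third-party dependency; install with: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    # Default base dir is an assumption; set NANOCHAT_BASE_DIR to override.
    base = os.environ.get("NANOCHAT_BASE_DIR", os.path.expanduser("~/.cache/nanochat"))
    dest = checkpoint_dir(base)
    snapshot_download(repo_id=repo_id, local_dir=dest)
    return dest
```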
## Chat Format

The model uses nanochat's special token format. Example prompt (Arabic for "Hello, who are you?"):

```
<|bos|><|user_start|>مرحبا، من أنت؟<|user_end|><|assistant_start|>
```
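Assembling a prompt with these special tokens is simple string concatenation. A minimal sketch — the helper name is illustrative, not part of nanochat:

```python
def build_prompt(user_message: str) -> str:
    # nanochat special tokens delimit a single user turn; the model generates
    # the assistant reply after <|assistant_start|>.
    return (
        "<|bos|>"
        "<|user_start|>" + user_message + "<|user_end|>"
        "<|assistant_start|>"
    )


prompt = build_prompt("مرحبا، من أنت؟")  # "Hello, who are you?"
```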
## Limitations

- Small model (2B parameters): limited knowledge and reasoning compared to larger models
- Trained primarily on Arabic text: limited multilingual capability
- Short context window (2,048 tokens)
- This is a research/educational model trained from scratch in ~12 hours
## License

Apache 2.0