Fasih-2B (ูุตูŠุญ)

A 2.09 billion parameter Arabic language model trained from scratch using the nanochat framework.

Model Details

| Property | Value |
|---|---|
| Parameters | 2,088,768,048 (2.09B) |
| Architecture | NanoChatGPT (Transformer decoder) |
| Layers | 24 |
| Hidden size | 1536 |
| Attention heads | 12 |
| Vocabulary | 65,536 (Arabic BPE) |
| Context length | 2,048 tokens |
| Precision | bfloat16 |
| Language | Arabic |

Training

Pretraining

  • Dataset: AraMix (8.19B tokens, Chinchilla-optimal)
  • Steps: 7,812
  • Hardware: 8x NVIDIA H20 GPUs
  • Batch size: 1,048,576 tokens
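The pretraining numbers above are internally consistent, which a quick arithmetic check confirms: steps times tokens per batch reproduces the stated dataset size.

```python
# Sanity-check the pretraining figures from the card:
# total tokens = optimizer steps x tokens per batch.
steps = 7_812
batch_tokens = 1_048_576  # 2**20 tokens per step

total_tokens = steps * batch_tokens
print(f"{total_tokens:,} tokens (~{total_tokens / 1e9:.2f}B)")
# → 8,191,475,712 tokens (~8.19B), matching the 8.19B-token AraMix figure
```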

SFT (Supervised Fine-Tuning)

  • Datasets: alpaca-gpt4-arabic (50K) + CIDAR (10K) + ArabicMMLU (x3) + ArabicGSM8K (x4) + Fasih identity conversations
  • Total mixture: 90,621 rows
  • Steps: 16 (1 epoch)

Evaluation Results

| Benchmark | Accuracy | Random Baseline |
|---|---|---|
| ArabicMMLU | 29.91% | 25% |
| ACVA | 49.29% | 50% |
| ArabicGSM8K | 4.47% | 0% |
| ChatCORE | 0.0175 | 0.0 |
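Because the benchmarks have different random baselines, raw accuracies are not directly comparable. One common way to contextualize them (not a metric reported on this card) is to normalize each score against its random baseline, so 0 means chance-level and 1 means perfect:

```python
# Illustrative normalization: (accuracy - baseline) / (1 - baseline).
# Values and baselines are taken from the evaluation table above.
results = {
    "ArabicMMLU":  (0.2991, 0.25),
    "ACVA":        (0.4929, 0.50),
    "ArabicGSM8K": (0.0447, 0.00),
}
for name, (acc, base) in results.items():
    norm = (acc - base) / (1 - base)
    print(f"{name}: {norm:+.4f}")
```

Under this view, ArabicMMLU and ArabicGSM8K sit modestly above chance, while ACVA comes out slightly below its 50% baseline, consistent with what the raw numbers show.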

Usage

This model uses the nanochat framework. To use it:

```bash
git clone https://github.com/karpathy/nanochat
cd nanochat
# Copy model files to $NANOCHAT_BASE_DIR/chatsft_checkpoints/d24/
python -m scripts.chat_web -i sft
```

Chat Format

The model uses nanochat's special token format:

```
<|bos|><|user_start|>مرحبا، من أنت؟<|user_end|><|assistant_start|>
```

(The example user message reads "Hello, who are you?")
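Assembling a prompt in this format can be sketched as plain string concatenation; a minimal example, assuming the special tokens are written as literal strings (nanochat's tokenizer maps them to dedicated token ids):

```python
# Build a single-turn prompt in nanochat's chat format.
# The special tokens below are literal marker strings; the actual
# tokenizer converts each one into a dedicated special token id.
def build_prompt(user_message: str) -> str:
    return (
        "<|bos|>"
        "<|user_start|>" + user_message + "<|user_end|>"
        "<|assistant_start|>"  # the model generates from this point
    )

prompt = build_prompt("مرحبا، من أنت؟")  # "Hello, who are you?"
print(prompt)
```

Generation should stop when the model emits the corresponding assistant end token, so the reply stays within one turn.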

Limitations

  • Small model (2B params) โ€” limited knowledge and reasoning compared to larger models
  • Trained primarily on Arabic text โ€” limited multilingual capability
  • Short context (2048 tokens)
  • This is a research/educational model trained from scratch in ~12 hours

License

Apache 2.0
