---
tags:
- text-generation
- reasoning
- coding
- mathematics
- quantization
- 4-bit model
- state-of-the-art
license: apache-2.0
datasets:
- synthetic
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
language:
- en
- hi
library_name: transformers
pipeline_tag: text-generation
---
# Alpie Core: 4-bit Quantized Reasoning Model
📄 **[Technical Report: Alpie Core.pdf](./Alpie_Core.pdf)**
<p align="center">
<a href="https://169pi.ai/"><img src="https://img.shields.io/badge/🌐%20Website-169Pi%20AI-blue" alt="Website"></a>
<a href="https://huggingface.co/169Pi"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-169Pi%20AI-yellow" alt="Hugging Face"></a>
<a href="https://www.linkedin.com/company/169pi/"><img src="https://img.shields.io/badge/LinkedIn-169Pi%20AI-blue" alt="LinkedIn"></a>
<a href="https://x.com/169Pi_ai"><img src="https://img.shields.io/badge/X-169Pi%20AI-black" alt="X"></a>
</p>
## 1. Introduction
**Alpie Core is one of the first fine-tuned 4-bit reasoning models from India, and among the first worldwide.** Trained on just 8 Hopper GPUs using LoRA for parameter-efficient fine-tuning, QLoRA 4-bit quantization, and synthetic STEM-rich dataset distillation, it demonstrates that an aggressively quantized model can not only match but surpass full-precision baselines.
With a dramatically reduced memory footprint, Alpie Core delivers competitive, frontier-level reasoning performance, even beating some top proprietary models. It achieves **81.28% on MMLU, 92.75% on GSM8K, and 57.8% on SWE-Bench Verified**, ranking at the top of competitive leaderboards and demonstrating that efficient models can rival frontier systems while remaining practical for real-world deployment at scale.

## 2. Model Summary
- **Base Architecture**: DeepSeek-R1-Distill-Qwen-32B
- **Parameters**: 32 billion (quantized to 4-bit)
- **Training Method**: Supervised Fine-Tuning (SFT) using LoRA/QLoRA techniques
- **Quantization**: 4-bit NF4 with double quantization (see the sketch after this list)
- **Context Length**: 65k tokens
- **Max Output Length**: 16,384 tokens
- **Training Data Sources**: Synthetic (STEM, reasoning, coding) + domain-rich curated data (law, Indian context, exams, multilingual)
- **License**: Apache 2.0
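The quantization settings above map onto a standard `BitsAndBytesConfig` in Transformers. Below is a minimal sketch of what such a setup typically looks like; it illustrates the recipe rather than the exact training code, and assumes `bitsandbytes` is installed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 with double quantization and FP16 compute, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute for matrix multiplications
)

# Illustrative load of the base model under this config
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    quantization_config=bnb_config,
    device_map="auto",
)
```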
## 3. Approach
**Alpie Core** has undergone extensive **supervised fine-tuning (SFT)** to strengthen reasoning, robustness, and safety. The training leveraged a diverse mixture of curated open-source datasets and proprietary synthetic data, optimised with high-quality LLM-generated responses. The fine-tuning process emphasised adherence to rigorous safety and usability standards, including:
1. **User Understanding and Clarity** – ensuring outputs are direct, interpretable, and pedagogically sound.
2. **Security and Ethical Guidelines** – filtering unsafe or harmful generations during and after training.
3. **Limitations, Disclaimers, and Knowledge Boundaries** – transparently communicating uncertainty and scope.
4. **Handling Complex and Sensitive Topics** – balancing informativeness with responsible guardrails.
5. **Safety and Respectful Engagement** – maintaining politeness, inclusivity, and cultural sensitivity.
6. **Confidentiality and Responsible Use** – preventing leakage of private training data, proprietary prompts, or internal reasoning traces.
This SFT approach enables Alpie Core to deliver reliable, aligned, and context-aware responses across a broad range of use cases, generalising across global and Indian contexts while remaining within safe and responsible use guidelines.
## 4. Model Features
1. **Supports Streaming** – Real-time token-level responses
2. **OpenAI-Compatible API** – Seamless integration with OpenAI client libraries (streaming sketch after this list)
3. **65K Context Length** – Handles very large inputs and conversations
4. **16,384 Max Output Length** – Enables extremely long generations
5. **4-Bit Quantization** – Memory-efficient and optimised for deployment
6. **High Throughput Inference** – Powered by vLLM for efficient large-scale serving
7. **Low Latency Inference** – Fast response times optimized for production
8. **Customizable Safety & Moderation Filters** – Built-in guardrails for safer outputs
9. **Supports Function Calling / Tool Use** – Enables structured outputs and external API integration
10. **Instruction Following** – Optimised for reasoning and stepwise chain-of-thought answers
11. **Education & Research Ready** – Tailored for competitive exams, STEM reasoning, and knowledge-intensive tasks
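Because the model can sit behind an OpenAI-compatible server (features 1–2 above), the official `openai` client works unchanged. A minimal streaming sketch follows; the base URL, API key, and model name are placeholders for whatever your deployment exposes:

```python
from openai import OpenAI

# Placeholder endpoint: point this at your own OpenAI-compatible server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Request a streamed chat completion and print tokens as they arrive
stream = client.chat.completions.create(
    model="alpie-core",  # placeholder: whatever name the server registers
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    stream=True,
    temperature=0.7,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```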
## 5. Key Highlights
1. **First 4-bit Reasoning Model from India**: Competitive globally with frontier models
2. **Benchmark Competitiveness**: Outperforms or matches 70B+ models across reasoning, math, and coding
3. **STEM & Coding Strength**: Excellent on GSM8K, MATH-500, HumanEval, SWE-Bench Verified
4. **Efficiency & Deployment**: 16 GB VRAM footprint, runs on commodity GPUs with vLLM
5. **Extended Context Length**: 65K tokens for research papers, conversations, multi-document reasoning
6. **Environmental Benefits**: ~298–835 kg CO₂e, 2–3× more efficient than FP16 training
7. **Open-Source Commitment**: Released under Apache 2.0 for global use
## 6. Benchmark Results


| Benchmark | Alpie Core (32B-4bit) | DeepSeek-V2 (236B) | Qwen2.5 72B | Llama 3.1 405B | Llama 3.1 70B | Gemma-3 27B-PT | Mistral-Small-24B-Base-2501 |
|-----------|----------------------|-------------------|-------------|---------------|---------------|----------------|----------------------------|
| MMLU (5-shot) | **81.28%** | 78.4% | 85.0% | 84.4% | 79.3% | 78.6% | 80.73% |
| GSM8K (8-shot) | **92.75%** | 81.6% | 88.3% | 83.5% | - | 82.2% | 80.73% |
| BBH (3-shot) | **85.12%** | 78.8% | 79.8% | 82.9% | 81.6% | 77.7% | - |
| MMLU-Pro (5-shot) | **64.78%** | 51.4% | 58.3% | 52.8% | 53.8% | 52.2% | 54.37% |
| MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
| HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | - |
These results demonstrate Alpie Core’s ability to rival or surpass leading proprietary and open-source models, despite being 4-bit quantized.
### SWE-Bench Verified Performance
| Rank | Model | Accuracy (%) | Performance vs Alpie |
|------|-------|-------------|---------------------|
| **1** | **Alpie Core** | **57.8** | **Alpie** |
| 2 | Qwen3-Coder-30B-A3B-Instruct | 51.6 | Below Alpie |
| 3 | o3-mini (high) | 49.3 | Below Alpie |
| 4 | DeepSeek R1 | 49.2 | Below Alpie |
| 5 | Claude 3.5 Sonnet | 49.0 | Below Alpie |
| 6 | o1 | 48.9 | Below Alpie |
| 7 | Devstral | 46.8 | Below Alpie |

### Humanity's Last Exam Leaderboard Performance
| Rank | Model | Accuracy (%) | Performance vs Alpie |
|------|-------|-------------|---------------------|
| 1 | GPT 4.5 Preview | 5.8 | Above Alpie |
| 2 | Claude Sonnet 4 | 5.42 | Above Alpie |
| **3** | **Alpie Core 32B (4-bit)** | **5.41** | **Alpie** |
| 4 | Llama 4 Maverick | 5.34 | Below Alpie |
| 5 | GPT 4.1 | 4.97 | Below Alpie |
| 6 | Kimi K2 Instruct | 4.68 | Below Alpie |
| 7 | DeepSeek V3 | 4.55 | Below Alpie |
| 8 | Gemini 1.5 Pro 002 | 4.55 | Below Alpie |

### Additional Benchmarks
| Benchmark | Alpie Core (32B-4bit) | Category |
|-----------|----------------------|----------|
| AIME | **47.34%** | Advanced Mathematics |
| GPQA (Diamond) | **40.91%** | Graduate-level QA |
| TruthfulQA (MC2) | **60.05%** | Truthfulness |
| HellaSwag | **84.66%** | Commonsense |
| PIQA | **83.24%** | Physical Reasoning |
| ARC Challenge | **67.58%** | Science QA |
| CommonSenseQA | **87.06%** | Commonsense |
| AGIEval | **64.98%** | General Intelligence |
| Winogrande | **79.53%** | Commonsense Reasoning |
| MATH-500 | **70.00%** | Advanced Mathematics |

## 7. Training Details
- **Hardware**: 8× NVIDIA H100-80GB (Hopper) GPUs
- **Fine-tuning Method**: LoRA/QLoRA with the following configuration (sketched in code after this list):
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- LoRA Rank: 16
- **Quantization**: 4-bit NF4 + Double Quantization + FP16 compute
- **Dataset Domains**: Mathematics, coding, reasoning, science, general knowledge, competitive exams, Indian context + law, multilingual (Hindi and Hinglish)
- **Synthetic Data Advantage**: LLM-generated data curated with multi-turn reasoning traces for STEM and coding, contributing a +15–20% performance boost in these domains
- **Training Strategy**: Multi-stage distillation → SFT → safety alignment
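For reference, the LoRA hyperparameters above correspond to a PEFT `LoraConfig` like the sketch below; `target_modules` is an assumption (a common choice for Qwen-style attention projections), not a confirmed training setting:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=16,      # LoRA alpha
    lora_dropout=0.05,  # LoRA dropout
    bias="none",
    task_type="CAUSAL_LM",
    # Assumed target modules (typical for Qwen-style models); not confirmed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```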
## 8. Environmental Impact

**Carbon Footprint**: We estimated the environmental impact of training Alpie Core (32B) on 8× NVIDIA H100-80GB GPUs by calculating carbon emissions from GPU energy consumption, using the formula:

CO₂e (kg) = Grid CO₂ Factor (kg/kWh) × Runtime (hours) × Power per GPU (kW) × Number of GPUs

**Training parameters:**
- Grid CO₂ factor (Azure average): 0.364 kg CO₂e per kWh
- Runtime: 408 hours
- GPUs: 8× H100-80GB

We report results under two assumption modes:
- **Realistic mode** (average training draw ≈ 250 W per GPU = 0.25 kWh/h): 0.364 × 408 × 0.25 × 8 ≈ 298 kg CO₂e
- **Conservative mode** (near TDP ≈ 700 W per GPU = 0.70 kWh/h): 0.364 × 408 × 0.70 × 8 ≈ 835 kg CO₂e

The total training footprint therefore ranges from ~298 kg CO₂e (realistic) to ~835 kg CO₂e (conservative worst case).
*This makes Alpie Core one of the most carbon-efficient reasoning models released to date.*
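The arithmetic behind these figures can be reproduced directly from the formula above (the results match the reported ~298 / ~835 kg up to rounding):

```python
def co2e_kg(grid_factor_kg_per_kwh, runtime_hours, power_per_gpu_kw, num_gpus):
    """CO2e (kg) = grid factor x runtime x power per GPU x GPU count."""
    return grid_factor_kg_per_kwh * runtime_hours * power_per_gpu_kw * num_gpus

print(co2e_kg(0.364, 408, 0.25, 8))  # realistic mode: ~297 kg CO2e
print(co2e_kg(0.364, 408, 0.70, 8))  # conservative mode: ~832 kg CO2e
```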
## 9. Use Cases
Best for **STEM**, **complex mathematical reasoning**, **coding**, and **Indian-context** tasks:
1. **STEM**: Excels at solving advanced problems in science, technology, engineering, and mathematics with high accuracy.
2. **Complex Mathematical Reasoning**: Handles multi-step logical and quantitative reasoning tasks with strong reliability.
3. **Coding**: Supports software development, debugging, algorithmic problem-solving, and structured reasoning in code.
4. **Indian Context**: Provides culturally aware insights, competitive exam assistance (JEE, NEET, UPSC), and multilingual support in Hindi/Hinglish.
5. **Research Assistance**: Handles long contexts (65K) for academic and legal research.
## 10. Safety and Limitations
### Enhanced Content Access
Unlike the base DeepSeek model, Alpie Core provides factual, balanced responses to geopolitically sensitive questions, offering global accessibility and factual accuracy on topics like Taiwan's status, Arunachal Pradesh sovereignty, and other sensitive geopolitical issues.
### Current Limitations
- Multilingual reasoning in Hindi/Hinglish shows room for improvement
- Fixed knowledge cutoff without real-time information retrieval
- Occasional struggles with complex multi-hop mathematical reasoning
- Potential hallucinations in factual question answering; as with all LLMs, outputs should not be relied on for medical or legal advice without expert oversight
- Biases: training on synthetic and curated datasets reduces bias, but some risks may persist
### Mitigations
- Safety classifiers and output filtering systems
- Model-assisted safety pipeline using RLHF
- Comprehensive adversarial testing by domain experts
## 11. How to Use
### Non-Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Switch to evaluation mode
model.eval()

# Sample inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=1000)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("Response:\n", response)
```
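For deployment paths that expect a single standalone checkpoint (such as the vLLM sketch in the next section), the adapter can be merged into the base weights first. A minimal sketch continuing from the snippet above, using PEFT's `merge_and_unload`; the output directory is an arbitrary placeholder:

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("alpie-core-merged")  # placeholder output path
tokenizer.save_pretrained("alpie-core-merged")
```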
### Streaming Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel, PeftConfig
import torch

# Load the LoRA adapter configuration to find the base model
peft_model_id = "169Pi/Alpie-Core"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Attach the LoRA weights
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Switch to evaluation mode
model.eval()

# Initialise the streamer (skips the prompt and special tokens in the printed output)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Sample streaming inference
prompt = "Solve the Riemann Hypothesis and provide a final answer?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print("Streaming Response:")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1000,
        streamer=streamer,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
```
### Deployment Options
- **Transformers**: Python, PyTorch integration
- **vLLM**: High-throughput inference (see the sketch below)
- **Ollama**: Easy local deployment and inference
- **Size**: 20GB
- **Requirements**: Minimum 20GB RAM/VRAM for local execution
- **Local Deployment**: Runs efficiently on local machines with sufficient resources
```bash
# Pull the model
ollama pull 169pi/alpie-core
# Run the model
ollama run 169pi/alpie-core
```
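For high-throughput serving, a minimal vLLM sketch is shown below. It assumes a merged standalone checkpoint (see the merge snippet in Section 11); the model path is a placeholder, and vLLM can alternatively serve the LoRA adapter directly via its LoRA options:

```python
from vllm import LLM, SamplingParams

# Placeholder path to a merged checkpoint (see the merge snippet above)
llm = LLM(model="alpie-core-merged", max_model_len=65536)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=1024)

outputs = llm.generate(["Explain the Chinese Remainder Theorem."], params)
print(outputs[0].outputs[0].text)
```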
## 12. Citation
```bibtex
@misc{169pi2025alpiecore,
  title  = {Alpie-Core: A 4-Bit Quantized Reasoning Model from India that Outperforms Full-Precision Models},
  author = {169Pi AI},
  year   = {2025},
  url    = {https://huggingface.co/169Pi/Alpie-Core}
}
```
## 13. Community & Contributions
This model is released under the Apache 2.0 license, and we warmly welcome the community to build, download, and extend it.
1. **Issues & Discussions:** Report bugs, suggest features, or start conversations on the Hugging Face model page.
2. **Contributions:** Pull requests are welcome for error fixes, performance improvements, and extended functionality.
3. **Fine-tuning Results:** Share your experiments, benchmarks, and downstream applications with the community.
4. **Collaboration:** We encourage researchers, developers, and organisations to join in shaping the future of this model.
Together, we can continue to improve accessibility, safety, and performance for real-world AI applications.
## 14. License
Apache 2.0 License – Permissive, allowing free use, modification, and distribution for both research and commercial purposes.
## 15. Acknowledgements / Credits
We would like to thank DeepSeek for their original model, which served as the foundation for this work. Our team fine-tuned the model and implemented 4-bit quantization, achieving improved efficiency and accuracy for downstream tasks. This model is built with respect to the contributions of the original authors and aims to provide a safe, high-performance solution for reasoning and inference.
We are also grateful to the Hugging Face ecosystem (Transformers, PEFT, vLLM, bitsandbytes), the open-source community datasets (MMLU, GSM8K, SWE-Bench, and others), and the support of various cloud providers. Finally, we acknowledge the broader AI research community and companies whose innovations and insights continue to inspire our work.
## 16. Contact
For technical inquiries and support: **contact@169pi.com**
---
Alpie Core represents a milestone for open-source AI from India, one of the first globally to show that 4-bit reasoning models can rival frontier-scale systems. We hope this release empowers developers, researchers, and organisations worldwide to build more efficient, inclusive, and impactful AI.
*For technical details, training methodology, and comprehensive evaluation results, please refer to our technical report.*