
VibeStudio/MiniMax-M2-THRIFT-55-v1

Targeted Reduction for Inference and Fine-Tuning — ~55% Expert Pruned

A lean, efficiency-first variant of MiniMax-M2 designed to reduce latency and VRAM usage while increasing throughput for local, on-prem, and edge deployments.

TL;DR

  • What: ~55% expert-pruned MoE with staged pruning + knowledge distillation.
  • Why: Push the efficiency frontier for compact, responsive deployments.
  • Now: Ready for experimentation with solid coverage across core evals and more on the way.

Why it’s useful

  • Lower latency: Fast, responsive interactions for interactive apps and tools.
  • Smaller memory footprint: Fits tighter VRAM budgets and increases node density.
  • Higher throughput: Serve more concurrent users on the same hardware.
  • Deployment-friendly: Smooth drop-in via SGLang with OpenAI-compatible API.
  • Adaptable: Plays well with light fine-tuning to match domain and style.
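Because the model serves through SGLang's OpenAI-compatible API, existing OpenAI-style clients can target it by just changing the base URL and model name. A minimal sketch of building such a request body is below; the localhost URL, port, and sampling parameters are assumptions to adjust for your deployment.

```python
import json

# Assumed local SGLang endpoint; SGLang's default port is 30000.
BASE_URL = "http://localhost:30000/v1"

def build_chat_request(prompt: str,
                       model: str = "VibeStudio/MiniMax-M2-THRIFT-55-v1") -> dict:
    """Build an OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,   # illustrative defaults, not tuned values
        "max_tokens": 256,
    }

body = build_chat_request("Summarize staged expert pruning in one sentence.")
print(json.dumps(body, indent=2))
```

Any HTTP client (or the official `openai` Python package pointed at `BASE_URL`) can POST this body to `/chat/completions` unchanged.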

Intended use

  • Local/air-gapped assistants and dev tools
  • Cost-sensitive batch and realtime services
  • Edge and on-prem deployments prioritizing efficiency

How Our Approach Works

Active research in progress — we continue to iterate and expand ablations.

  • Teacher–student setup: Start with MiniMax-M2 as teacher and a copy as student.

  • Gradual expert pruning: Remove ≈5% experts per stage over ~11 stages (≈55% total), guided by importance scores with a lightweight Leave-One-Expert-Out check to retain rare-but-important experts.

  • Distill after each prune: Retrain the student to imitate the teacher on:

    • Outputs (token probability distributions),
    • Hidden states, and
    • Router behavior over the surviving experts.
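The staged prune-then-distill loop above can be sketched in a few functions. The THRIFT pipeline itself is not published, so everything here is illustrative: the function names, the importance-score dictionary, the `protected` set standing in for the leave-one-expert-out check, and the simple cross-entropy distillation term are all assumptions.

```python
import math

def prune_schedule(n_experts: int, frac_per_stage: float = 0.05,
                   n_stages: int = 11) -> list:
    """Experts remaining after each stage: ~5% of the ORIGINAL count
    is removed per stage, for ~55% total over 11 stages."""
    per_stage = max(1, round(n_experts * frac_per_stage))
    keep = [n_experts]
    for _ in range(n_stages):
        keep.append(max(1, keep[-1] - per_stage))
    return keep

def select_pruned(importance: dict, per_stage: int, protected: set) -> list:
    """Drop the lowest-importance experts, skipping any that the
    leave-one-expert-out check flagged as rare-but-important."""
    candidates = sorted((e for e in importance if e not in protected),
                        key=importance.get)
    return candidates[:per_stage]

def distill_loss(student_logp: list, teacher_p: list) -> float:
    """Cross-entropy of the student against the teacher's token
    distribution (equals KL divergence up to the teacher's entropy).
    Hidden-state and router-matching terms would be added similarly."""
    return -sum(p * lq for p, lq in zip(teacher_p, student_logp))

print(prune_schedule(64))  # expert counts over the 11 stages
print(select_pruned({"e0": 0.1, "e1": 0.9, "e2": 0.2}, 1, {"e0"}))
```

In this toy schedule a 64-expert layer loses 3 experts per stage; between stages, the student would be retrained with `distill_loss` (plus the hidden-state and router terms) before the next prune.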

Run AI Coding Agents Fully Locally (Mac Studio, DGX Spark, AMD AI Max) https://github.com/latent-variable/minimax-agent-guide

Downloads last month: 92
Model size: 106B params (Safetensors)
Tensor types: U8 · U32 · F32 · BF16