
VibeStudio/MiniMax-M2-THRIFT-55-v1

Targeted Reduction for Inference and Fine-Tuning — ~55% Expert Pruned

A lean, efficiency-first variant of MiniMax-M2 designed to reduce latency and VRAM usage while increasing throughput for local, on-prem, and edge deployments.

TL;DR

  • What: ~55% expert-pruned MoE with staged pruning + knowledge distillation.
  • Why: Push the efficiency frontier for compact, responsive deployments.
  • Now: Ready for experimentation with solid coverage across core evals and more on the way.

Why it’s useful

  • Lower latency: Fast, responsive interactions for interactive apps and tools.
  • Smaller memory footprint: Fits tighter VRAM budgets and increases node density.
  • Higher throughput: Serve more concurrent users on the same hardware.
  • Deployment-friendly: Smooth drop-in via SGLang with OpenAI-compatible API.
  • Adaptable: Plays well with light fine-tuning to match domain and style.
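Because the model serves through SGLang's OpenAI-compatible API, existing OpenAI-style clients can target it by just changing the base URL and model name. A minimal sketch of building such a request body is below; the localhost URL, port, and sampling parameters are assumptions to adjust for your deployment.

```python
import json

# Assumed local SGLang endpoint; SGLang's default port is 30000.
BASE_URL = "http://localhost:30000/v1"

def build_chat_request(prompt: str,
                       model: str = "VibeStudio/MiniMax-M2-THRIFT-55-v1") -> dict:
    """Build an OpenAI-compatible /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,   # illustrative defaults, not tuned values
        "max_tokens": 256,
    }

body = build_chat_request("Summarize staged expert pruning in one sentence.")
print(json.dumps(body, indent=2))
```

Any HTTP client (or the official `openai` Python package pointed at `BASE_URL`) can POST this body to `/chat/completions` unchanged.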

Intended use

  • Local/air-gapped assistants and dev tools
  • Cost-sensitive batch and realtime services
  • Edge and on-prem deployments prioritizing efficiency

How Our Approach Works

Active research in progress — we continue to iterate and expand ablations.

  • Teacher–student setup: Start with MiniMax-M2 as teacher and a copy as student.

  • Gradual expert pruning: Remove ≈5% experts per stage over ~11 stages (≈55% total), guided by importance scores with a lightweight Leave-One-Expert-Out check to retain rare-but-important experts.

  • Distill after each prune: Retrain the student to imitate the teacher on:

    • Outputs (token probability distributions),
    • Hidden states, and
    • Router behavior over the surviving experts.
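The staged prune-then-distill loop above can be sketched in a few functions. The THRIFT pipeline itself is not published, so everything here is illustrative: the function names, the importance-score dictionary, the `protected` set standing in for the leave-one-expert-out check, and the simple cross-entropy distillation term are all assumptions.

```python
import math

def prune_schedule(n_experts: int, frac_per_stage: float = 0.05,
                   n_stages: int = 11) -> list:
    """Experts remaining after each stage: ~5% of the ORIGINAL count
    is removed per stage, for ~55% total over 11 stages."""
    per_stage = max(1, round(n_experts * frac_per_stage))
    keep = [n_experts]
    for _ in range(n_stages):
        keep.append(max(1, keep[-1] - per_stage))
    return keep

def select_pruned(importance: dict, per_stage: int, protected: set) -> list:
    """Drop the lowest-importance experts, skipping any that the
    leave-one-expert-out check flagged as rare-but-important."""
    candidates = sorted((e for e in importance if e not in protected),
                        key=importance.get)
    return candidates[:per_stage]

def distill_loss(student_logp: list, teacher_p: list) -> float:
    """Cross-entropy of the student against the teacher's token
    distribution (equals KL divergence up to the teacher's entropy).
    Hidden-state and router-matching terms would be added similarly."""
    return -sum(p * lq for p, lq in zip(teacher_p, student_logp))

print(prune_schedule(64))  # expert counts over the 11 stages
print(select_pruned({"e0": 0.1, "e1": 0.9, "e2": 0.2}, 1, {"e0"}))
```

In this toy schedule a 64-expert layer loses 3 experts per stage; between stages, the student would be retrained with `distill_loss` (plus the hidden-state and router terms) before the next prune.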

Run AI Coding Agents Fully Locally (Mac Studio, DGX Spark, AMD AI Max) https://github.com/latent-variable/minimax-agent-guide

Downloads last month: 92
Model size: 106B params (Safetensors)
Tensor types: U8 · U32 · F32 · BF16