Dynamic 3-bit DeepSeek V3.1 GGUF gets 75.6% on Aider Polyglot
Hey everyone, ever since we released Dynamic GGUFs we've received so much love from you all, and we know better benchmarking was a top request!
Previously, we benchmarked Gemma 3 and Llama 4 on 5-shot MMLU and KL Divergence. Now we're happy to showcase Aider Polyglot benchmarks for our DeepSeek-V3.1 GGUFs, and we were quite surprised by the results! Blogpost + details: https://docs.unsloth.ai/basics/unsloth-dynamic-ggufs-on-aider-polyglot
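For readers unfamiliar with the KL Divergence metric mentioned above: it measures how far a quantized model's next-token distribution drifts from the full-precision model's. A minimal sketch (the distributions below are toy values, not real model outputs):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    Roughly: how much information is lost when the quantized model's
    distribution q is used in place of the full-precision distribution p.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-token vocabulary (illustrative only)
full_precision = [0.70, 0.20, 0.05, 0.05]
quantized = [0.65, 0.22, 0.07, 0.06]

print(kl_divergence(full_precision, quantized))  # small positive number
```

A KL divergence near zero means the quantized model behaves almost identically to the original; larger values indicate the quantization is visibly distorting outputs.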
- Our 1-bit Unsloth Dynamic GGUF shrinks DeepSeek-V3.1 from 671GB → 192GB (~71% smaller), and in no-thinking mode it outperforms GPT-4.1 (Apr 2025), GPT-4.5, and DeepSeek-V3-0324.
- 3-bit Unsloth DeepSeek-V3.1 (thinking) GGUF: Outperforms Claude-4-Opus (thinking).
- 5-bit Unsloth DeepSeek-V3.1 (non-thinking) GGUF: Matches Claude-4-Opus (non-thinking) performance.
- Our Dynamic GGUFs perform consistently better than other (non-Unsloth) dynamic imatrix GGUFs.
- Other non-Unsloth 1-bit and 2-bit DeepSeek-V3.1 quantizations, as well as standard 1-bit quantization without selective layer quantization, either failed to load or produced gibberish and looping outputs.
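Note that a "1-bit" dynamic quant does not mean every weight is stored in 1 bit: selective layer quantization keeps the most sensitive layers at higher precision, which is why the file is larger than a literal 1 bit/weight would imply. A rough back-of-the-envelope check using the numbers above (treating GB as decimal gigabytes and the model as ~671B parameters):

```python
def effective_bits_per_weight(file_size_gb: float, n_params_billions: float) -> float:
    """Average bits per weight: file size in bits divided by parameter count.

    Approximation: ignores GGUF metadata overhead and treats GB/B as
    decimal units, so this is a rough estimate only.
    """
    return file_size_gb * 8 / n_params_billions

# 192GB "1-bit" dynamic GGUF of a ~671B-parameter model
print(effective_bits_per_weight(192, 671))  # ~2.29 bits/weight on average
```

So the "1-bit" label refers to the lowest quantization level used, while the average across all layers lands closer to 2.3 bits/weight.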
For our DeepSeek-V3.1 experiments, we compared different bits of Unsloth Dynamic GGUFs against:
- Full-precision, unquantized LLMs, including GPT-4.5, GPT-4.1, Claude-4-Opus, DeepSeek-V3-0324, etc.
- Other dynamic imatrix V3.1 GGUFs
- Semi-dynamic (some selective layer quantization) imatrix V3.1 GGUFs for ablation purposes.
Benchmark experiments were mainly conducted by David (neolithic5452 on the Aider Discord), a trusted community contributor to Aider Polyglot evaluations. Each test was run ~3 times and the median score taken, and Pass-2 accuracy is reported, as is convention.


