Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity
Abstract
Sparse-BitNet shows that 1.58-bit (ternary) quantization tolerates semi-structured N:M sparsity better than full-precision models, enabling stable sparse training and improved efficiency across model scales and training regimes.
Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that, for the first time, jointly applies 1.58-bit quantization and dynamic N:M sparsification while maintaining stable training. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels and can tolerate higher structured sparsity before accuracy collapse. Moreover, using our custom sparse tensor core, Sparse-BitNet achieves substantial speedups in both training and inference, reaching up to 1.30×. These results highlight that combining extremely low-bit quantization with semi-structured N:M sparsity is a promising direction for efficient LLMs. Code is available at https://github.com/AAzdi/Sparse-BitNet
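The abstract does not detail the quantizer or the masking rule, so the snippet below is only a minimal sketch of the two ingredients it names: BitNet b1.58-style absmean ternary quantization and magnitude-based 2:4 (N:M) masking. The function names, the per-tensor scale, and the order of masking before quantization are assumptions for illustration, not the authors' implementation.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Absmean ternary ("1.58-bit") quantization in the style of BitNet b1.58:
    scale weights by their mean absolute value, then round to {-1, 0, +1}."""
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

def nm_sparsify(w: torch.Tensor, n: int = 2, m: int = 4):
    """Dynamic N:M sparsification: in every group of m consecutive weights
    along the input dimension, keep only the n largest-magnitude entries."""
    out_features, in_features = w.shape
    groups = w.reshape(out_features, in_features // m, m)
    # Zero the (m - n) smallest-magnitude weights in each group.
    _, drop_idx = groups.abs().topk(m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)
    return (groups * mask).reshape(out_features, in_features)

# Hypothetical usage: apply the 2:4 mask first, then ternarize the survivors.
w = torch.randn(8, 16)
w_q, scale = ternary_quantize(nm_sparsify(w, n=2, m=4))
print(w_q.unique(), scale)        # ternary values and the per-tensor scale
print((w_q == 0).float().mean())  # at least 50% zeros from the 2:4 mask
```

In a training loop, the mask would typically be recomputed dynamically and the quantized/sparsified weights used only in the forward pass (straight-through estimation), but those details are not specified in the abstract.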
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity (2026)
- Sparsity Induction for Accurate Post-Training Pruning of Large Language Models (2026)
- To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training (2026)
- D$^2$Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs (2026)
- Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification (2026)
- RNM-TD3: N:M Semi-structured Sparse Reinforcement Learning From Scratch (2026)
- HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models (2026)