SOTA ternary-packed versions of 1.58-bit LLMs for efficient on-device inference with vlut.cpp.
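To illustrate what "ternary-packed 1.58-bit" means in general terms: each weight takes one of three values {-1, 0, +1}, so it carries log2(3) ≈ 1.58 bits of information, and five such values fit in one byte because 3^5 = 243 ≤ 256. The sketch below is a minimal, hypothetical base-3 packing in Python; it is not vlut.cpp's actual storage layout, and the function names are illustrative only.

```python
def pack_ternary(weights):
    """Pack ternary weights in {-1, 0, 1} into bytes, 5 per byte (base-3)."""
    packed = bytearray()
    for i in range(0, len(weights), 5):
        group = list(weights[i:i + 5])
        group += [0] * (5 - len(group))   # pad the final group with zeros
        value = 0
        for w in reversed(group):         # digit = w + 1, so digits are 0..2
            value = value * 3 + (w + 1)
        packed.append(value)              # 3^5 = 243, always fits in a byte
    return bytes(packed)

def unpack_ternary(packed, n):
    """Inverse of pack_ternary: recover the first n ternary weights."""
    weights = []
    for byte in packed:
        for _ in range(5):
            weights.append(byte % 3 - 1)  # undo the +1 offset
            byte //= 3
    return weights[:n]
```

This achieves 1.6 bits per weight, close to the 1.58-bit information-theoretic floor; real kernels typically pair such a packed layout with table lookups so the weights never need to be unpacked to full precision at inference time.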
Xiangyu Li
XXXXyu
AI & ML interests
On-device and physical AI
Recent Activity
authored a paper about 14 hours ago: OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism
commented on a paper about 20 hours ago: Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices
upvoted a paper about 20 hours ago: OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism
Organizations
None yet