Article: KV Caching Explained: Optimizing Transformer Inference Efficiency • Jan 30, 2025 • 243
TeichAI/GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill-GGUF 30B • Updated 23 days ago • 87.3k • 473
Llama 3.2 Collection Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 25 items • Updated 6 days ago • 68