Article: ⚡ nano-vLLM: Lightweight, Low-Latency LLM Inference from Scratch • Jun 28, 2025 • 29
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • 71B • Updated Apr 13, 2025 • 4.85k • 2.06k
RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a16 Text Generation • 3B • Updated Oct 23, 2024 • 2.24k • 12
sentence-transformers/all-MiniLM-L6-v2 Sentence Similarity • 22.7M • Updated Mar 6, 2025 • 144M • 4.29k