"Not all quantized models perform well." Serving frameworks: Ollama uses an NVIDIA GPU; llama.cpp uses the CPU with AVX & AMX.
v1k
xbruce22
Recent Activity
liked a model about 7 hours ago: Qwen/Qwen3.5-397B-A17B-FP8
liked a model about 12 hours ago: Qwen/Qwen3-ASR-1.7B
liked a model 4 days ago: jdopensource/JoyAI-LLM-Flash