Inference speed benchmarks
#5 opened by engrtipusultan
GLM Flash is a great model, but its inference speed takes a nosedive at higher context lengths. Since this model shares the same architecture, could you kindly share llama-bench results at higher contexts? Does this model behave the same way, or differently? This is where the Mamba2 and Qwen3-Next architectures are way better.
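As a sketch of the kind of benchmark being requested: recent llama.cpp builds of llama-bench accept a depth parameter that prefills a given number of context tokens before measuring, so generation speed can be compared across context fill levels. The model path and the exact flag names/values here are illustrative assumptions; check `llama-bench --help` on your build.

```
# Hypothetical invocation (model path is a placeholder):
# -p  prompt tokens processed per test
# -n  tokens generated per test
# -d  context depth prefilled before measuring (comma-separated sweep)
./llama-bench -m model.gguf -p 512 -n 128 -d 0,4096,16384,32768
```

Comparing the tg (tokens/s) column across the `-d` values would show whether decode throughput degrades with context depth, which is the behavior described above for GLM Flash.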