Inference speed benchmarks

#5 opened by engrtipusultan

GLM Flash is a great model, but its inference speed takes a nosedive at higher contexts. Since this model has the same architecture, could you kindly share llama-bench results at higher contexts? Does this model behave the same way, or differently? This is where the Mamba2 and Qwen3-Next architectures are way better.
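
For reference, something like the sketch below is the kind of run I mean. The model filename is just a placeholder, and the `-d` (depth) flag is an assumption that depends on having a recent llama.cpp build; `-p`, `-n`, and `-r` are the standard llama-bench options.

```sh
# Placeholder model path. -p sweeps prompt lengths (prompt-processing speed),
# -n is tokens generated per run, -r is repetitions per data point.
./llama-bench -m ./GLM-Flash-Q4_K_M.gguf -p 512,4096,16384,32768 -n 128 -r 3

# Recent llama.cpp builds also accept -d (depth) to measure generation speed
# with the KV cache pre-filled to a given length (assumed available here):
./llama-bench -m ./GLM-Flash-Q4_K_M.gguf -n 128 -d 0,8192,32768
```

The `-p` sweep shows how prompt processing scales with context size, while the `-d` runs show the token-generation slowdown once the context is already filled, which is where the nosedive usually shows up.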
