Inference speed benchmarks

#5 opened by engrtipusultan

GLM Flash is a great model, but its inference speed takes a nosedive at higher contexts. Since this model has the same architecture, could you kindly share llama-bench results at higher contexts? Does this model behave the same way, or differently? This is where the Mamba2 and Qwen3-Next architectures are way better.
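
For reference, something like the sketch below is the kind of run I mean. The model filename is just a placeholder, and the `-d` (depth) flag is an assumption that depends on having a recent llama.cpp build; `-p`, `-n`, and `-r` are the standard llama-bench options.

```sh
# Placeholder model path. -p sweeps prompt lengths (prompt-processing speed),
# -n is tokens generated per run, -r is repetitions per data point.
./llama-bench -m ./GLM-Flash-Q4_K_M.gguf -p 512,4096,16384,32768 -n 128 -r 3

# Recent llama.cpp builds also accept -d (depth) to measure generation speed
# with the KV cache pre-filled to a given length (assumed available here):
./llama-bench -m ./GLM-Flash-Q4_K_M.gguf -n 128 -d 0,8192,32768
```

The `-p` sweep shows how prompt processing scales with context size, while the `-d` runs show the token-generation slowdown once the context is already filled, which is where the nosedive usually shows up.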
