Long-context reasoning scores are low?
Hi,
Is there a reason that GLM 4.7 Flash has problems with long-context tasks?
Check out the AA-LCR (Long Context Reasoning) benchmark in the middle of this page:
https://artificialanalysis.ai/models/open-source/small
I ask because I wanted to use it for coding (as an agent), which requires long-context reasoning.
P.S.: This is meant as constructive feedback, because I really like GLM 4.7 Flash; it is my top choice. It has replaced gpt-oss-20b for all my use cases, since it is almost as fast.
Thanks for the interest in the model!
We looked into this and noticed that the provider Artificial Analysis used for the benchmark isn't the official one, and there appears to be an issue with that specific deployment setup. We suggest trying the official Z.ai API instead.
For reference, in our internal testing an AA-LCR score of around 45 is a reasonable and expected value.
Thank you for looking into it.
I appreciate it.