Long-context reasoning scores are low?
Hi,
Is there a reason that GLM 4.7 Flash has problems with long-context tasks?
Check out the AA-LCR (Long Context Reasoning) benchmark in the middle of this page:
https://artificialanalysis.ai/models/open-source/small
I ask because I wanted to use it for coding (as an agent), which requires long-context reasoning.
P.S.: This is meant as constructive feedback, because I really like GLM 4.7 Flash; it is my top choice. It has replaced gpt-oss-20b for all my use cases, since it is almost as fast.
Thanks for the interest in the model!
We looked into this and noticed that the provider Artificial Analysis used for the benchmark isn't the official one, and there appears to be an issue with that specific deployment setup. We suggest trying the official Z.ai API instead.
For reference, in our internal testing an AA-LCR score of around 45 is a reasonable and expected value.
Thank you for looking into it.
I appreciate it.