ResearchGym: Evaluating Language Model Agents on Real-World AI Research • Paper • 2602.15112 • Published 18 days ago • 20
Jais-2-Family Collection The 2nd generation of the Jais Large Language Models Family • 4 items • Updated 14 days ago • 13
Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments +3 22 days ago • 30
Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 about 1 month ago • 85