TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents Paper • 2602.07274 • Published 6 days ago • 23
Hi @naufalso! Lighteval now supports inspect-ai as a backend, so everything supported by inspect is integrated in lighteval 🔥
Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 9 days ago • 63