Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled Text Generation • 28B • Updated 7 days ago • 58.8k • 627
How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs Paper • 2501.10711 • Published Jan 18, 2025 • 1
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench Paper • 2506.09289 • Published Jun 10, 2025 • 2
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 3 days ago • 61