arxiv:2511.04460
QRQ
RichardQRQ
AI & ML interests
None yet
Recent Activity
upvoted a paper about 14 hours ago
WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
upvoted a paper 3 days ago
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
upvoted a paper 5 days ago
SWE-Explore: Benchmarking How Coding Agents Explore Repositories
Organizations
None yet