Perry the Platypus PRO

AgPerry

Recent Activity

upvoted a collection 2 days ago

ClawBench & Agentic Web Benchmarks

Collection

ClawBench and related papers and open datasets for real-world web and computer-use agent evaluation. • 360 items • Updated 2 days ago • 1

New activity in TIGER-Lab/ClawBench 3 days ago

ClawBench V2: reproducible real-world web-agent evaluation

#2 opened 3 days ago by AgPerry

New activity in huggingface/HuggingDiscussions 10 days ago

[FEEDBACK] Daily Papers

🔥❤️ 21

207

#32 opened about 2 years ago by kramp

commented a paper 15 days ago

Function-Aware Fill-in-the-Middle as Mid-Training for Coding Agent Foundation Models

Paper • 2607.12463 • Published 18 days ago • 108

upvoted 2 papers 16 days ago

Function-Aware Fill-in-the-Middle as Mid-Training for Coding Agent Foundation Models

Paper • 2607.12463 • Published 18 days ago • 108

Search Beyond What Can Be Taught: Evolving the Knowledge Boundary in Agentic Visual Generation

Paper • 2607.05382 • Published 23 days ago • 87

updated a dataset 21 days ago

AgPerry/rsi-bench

Viewer • Updated 21 days ago • 300 • 58

published a dataset 22 days ago

AgPerry/rsi-bench

Viewer • Updated 21 days ago • 300 • 58

upvoted a paper about 1 month ago

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Paper • 2606.14885 • Published Jun 12 • 12

upvoted 2 papers about 2 months ago

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Paper • 2605.26340 • Published May 25 • 37

Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback

Paper • 2606.06113 • Published Jun 4 • 16

updated a dataset about 2 months ago

TIGER-Lab/ClawBench

Viewer • Updated Jun 10 • 283 • 518 • 1

upvoted a paper about 2 months ago

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Paper • 2605.30288 • Published May 29 • 23

updated a Space 2 months ago

ClawBench Leaderboard

🦀

Can AI agents complete everyday online tasks?

updated 4 datasets 2 months ago

commented a paper 3 months ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

upvoted a paper 3 months ago