Snowflake/dare-bench
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents