arxiv:2605.25200

GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning

Published on May 24

Abstract

GroupTravelBench is a new benchmark for evaluating how well LLM agents handle multi-user travel planning, focusing on elicitation, coordination, and planning skills in complex group scenarios.

Travel planning is a realistic task for evaluating the planning and tool-use abilities of LLM agents. However, existing benchmarks typically assume only a single user, thereby avoiding one of the most challenging aspects of real-world scenarios: an agent's ability to identify and resolve conflicts among multiple users. To address this gap, we introduce GroupTravelBench, the first benchmark for multi-user, multi-turn travel planning. Based on real user profiles, POI data, and ticket price data, we synthesize 650 tasks and divide them into three difficulty levels. Beyond standard abilities in single-user itinerary planning, such as multi-step reasoning and tool use, our benchmark further evaluates three key capabilities required for travel agents: (i) elicitation -- proactively engaging in multi-turn dialogue to gather preferences from each user; (ii) coordination -- resolving conflicts among users through compromise or subgrouping strategies; and (iii) planning -- searching for travel plans that maximize overall group utility while maintaining fairness and feasibility. To simulate real-world conversational itinerary planning while enabling reliable tool use and offline evaluation, we build an interactive sandbox environment with cached real-world tool data. We evaluate a wide range of LLMs and find that even frontier models still show substantial weaknesses in preference coverage and group fairness. GroupTravelBench provides a practical and reproducible benchmark for advancing research on LLM agents for real-world travel planning.
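
The planning objective described in the abstract (maximize overall group utility while maintaining fairness and feasibility) can be illustrated with a small sketch. This is not the benchmark's scoring method, which the abstract does not specify. The `Plan` class, the `group_score` and `choose_plan` functions, the `fairness_weight` parameter, the per-person budget check, and all example data below are hypothetical, chosen only to show how a fairness term can change which plan a group should pick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate itinerary (hypothetical structure, not from the paper)."""
    name: str
    cost_per_person: float
    utilities: dict  # user name -> that user's utility for this plan, in [0, 1]


def group_score(plan, fairness_weight=0.5):
    """Blend average utility with the worst-off user's utility.

    fairness_weight=0 ranks plans by mean utility only;
    fairness_weight=1 ranks plans by the least satisfied user only.
    This blend is an assumed stand-in for a fairness-aware objective.
    """
    values = list(plan.utilities.values())
    mean_utility = sum(values) / len(values)
    worst_off = min(values)
    return (1 - fairness_weight) * mean_utility + fairness_weight * worst_off


def choose_plan(plans, budgets, fairness_weight=0.5):
    """Return the best-scoring plan every user can afford, or None."""
    feasible = [
        p for p in plans
        if all(p.cost_per_person <= budgets[user] for user in p.utilities)
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda p: group_score(p, fairness_weight))


# Made-up example data: three travelers with different budgets and tastes.
BUDGETS = {"ana": 500, "ben": 300, "chen": 400}
PLANS = [
    Plan("beach", 280, {"ana": 0.9, "ben": 0.9, "chen": 0.1}),
    Plan("city", 290, {"ana": 0.6, "ben": 0.6, "chen": 0.6}),
    Plan("luxury", 450, {"ana": 1.0, "ben": 1.0, "chen": 1.0}),
]

if __name__ == "__main__":
    fair_choice = choose_plan(PLANS, BUDGETS)
    utility_only_choice = choose_plan(PLANS, BUDGETS, fairness_weight=0.0)
    print("with fairness term:", fair_choice.name)
    print("mean utility only:", utility_only_choice.name)
```

In this toy data, the "luxury" plan is excluded because two travelers cannot afford it. Of the remaining plans, "beach" has the higher average utility but leaves one traveler nearly unsatisfied, so adding the fairness term shifts the choice to "city".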


