arxiv:2605.25200

GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning

Published on May 24

Abstract

A new benchmark called GroupTravelBench is introduced to evaluate multi-user travel planning capabilities of LLM agents, focusing on elicitation, coordination, and planning skills in complex group scenarios.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Travel planning is a realistic task for evaluating the planning and tool-use abilities of LLM agents. However, existing benchmarks typically assume only a single user, thereby avoiding one of the most challenging aspects of real-world scenarios: an agent's ability to identify and resolve conflicts among multiple users. To address this gap, we introduce GroupTravelBench, the first benchmark for multi-user, multi-turn travel planning. Based on real user profiles, POI data, and ticket price data, we synthesize 650 tasks and divide them into three difficulty levels. Beyond standard abilities in single-user itinerary planning, such as multi-step reasoning and tool use, our benchmark further evaluates three key capabilities required for travel agents: (i) elicitation -- proactively engaging in multi-turn dialogue to gather preferences from each user; (ii) coordination -- resolving conflicts among users through compromise or subgrouping strategies; and (iii) planning -- searching for travel plans that maximize overall group utility while maintaining fairness and feasibility. To simulate real-world conversational itinerary planning while enabling reliable tool use and offline evaluation, we build an interactive sandbox environment with cached real-world tool data. We evaluate a wide range of LLMs and find that even frontier models still show substantial weaknesses in preference coverage and group fairness. GroupTravelBench provides a practical and reproducible benchmark for advancing research on LLM agents for real-world travel planning.
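To make the planning objective in (iii) concrete, here is a minimal sketch of one way to trade off overall group utility against fairness when choosing among candidate plans. It is not the benchmark's actual metric: the function names, the `fairness_weight` parameter, the per-traveler utility numbers, and the use of the worst-off traveler's utility as the fairness term are all assumptions made for illustration.

```python
from statistics import mean


def score_plan(utilities: dict[str, float], fairness_weight: float = 0.5) -> float:
    """Blend the group's mean utility with the worst-off traveler's utility.

    Hypothetical scoring rule, not taken from the paper:
    fairness_weight = 0 ranks plans by mean utility only,
    fairness_weight = 1 ranks plans by the minimum utility only.
    """
    if not utilities:
        raise ValueError("need at least one traveler")
    values = list(utilities.values())
    return (1 - fairness_weight) * mean(values) + fairness_weight * min(values)


def pick_plan(candidates: dict[str, dict[str, float]], fairness_weight: float = 0.5) -> str:
    """Return the name of the candidate plan with the highest blended score."""
    return max(candidates, key=lambda name: score_plan(candidates[name], fairness_weight))


# Made-up utilities in [0, 1] for three travelers under two candidate plans.
candidates = {
    "museum_day": {"ana": 0.9, "ben": 0.2, "chen": 0.9},
    "mixed_day": {"ana": 0.6, "ben": 0.6, "chen": 0.7},
}

best_by_mean = pick_plan(candidates, fairness_weight=0.0)
best_with_fairness = pick_plan(candidates, fairness_weight=0.5)
```

With these made-up numbers, ranking by mean utility alone favors `museum_day` even though one traveler is poorly served, while giving weight to the worst-off traveler shifts the choice to `mixed_day`. This is the kind of tension between group utility and fairness that the abstract describes.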
