arxiv:2603.25750

Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

Published on Mar 20

Submitted by Kyudan Jung on Mar 30

KAIST AI


Authors:

Kyudan Jung, Jaegul Choo, Cheonbok Park

Abstract

Full-duplex speech language models require high-quality multi-speaker conversational data, which is scarce, necessitating a robust open-source data processing pipeline to address challenges in natural dialogue dynamics and system accuracy.

AI-generated summary

As the paradigm of AI shifts from text-based LLMs to Speech Language Models (SLMs), there is a growing demand for full-duplex systems capable of real-time, natural human-computer interaction. However, the development of such models is constrained by the scarcity of high-quality, multi-speaker conversational data, as existing large-scale resources are predominantly single-speaker or limited in volume. Addressing the complex dynamics of natural dialogue, such as overlapping speech and back-channeling, remains a challenge, with standard processing pipelines suffering from diarization errors and ASR hallucinations. To bridge this gap, we present a robust and scalable open-source data processing pipeline designed for full-duplex models.
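To make the terms "overlapping" and "back-channeling" concrete, here is a minimal sketch of how they could be read off diarization output. It is not the pipeline described in the paper. The `Segment` type, both function names, and the 1.0 s `max_duration` threshold are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarized speech turn (hypothetical structure, times in seconds)."""
    speaker: str
    start: float
    end: float


def find_overlaps(segments):
    """Return (first_speaker, second_speaker, start, end) for every
    region where two different speakers talk at the same time."""
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    found = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so nothing later can overlap a
            if a.speaker != b.speaker:
                found.append((a.speaker, b.speaker, b.start, min(a.end, b.end)))
    return found


def backchannel_candidates(segments, max_duration=1.0):
    """Flag short segments that lie entirely inside another speaker's turn.

    The duration threshold is an assumed heuristic, not a value from the paper.
    """
    candidates = []
    for s in segments:
        if s.end - s.start > max_duration:
            continue
        inside_other_turn = any(
            o.speaker != s.speaker and o.start <= s.start and s.end <= o.end
            for o in segments
        )
        if inside_other_turn:
            candidates.append(s)
    return candidates


# Toy two-speaker conversation: B says something brief while A is talking,
# then B starts a full turn slightly before A finishes.
segments = [
    Segment("A", 0.0, 5.0),
    Segment("B", 2.0, 2.5),
    Segment("B", 4.5, 7.0),
    Segment("A", 7.5, 9.0),
]
print(find_overlaps(segments))
print(backchannel_candidates(segments))
```

In this toy input, the short B segment inside A's turn is the kind of event a single-stream transcript tends to drop or misattribute, which is why full-duplex data needs per-speaker timing rather than a flattened turn sequence.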


Community

Kyudan

Paper author Paper submitter about 14 hours ago

A full-duplex system allows users to interrupt the LLM at any time, and the LLM can also naturally chime in and respond to what we say. This is an area currently being actively researched in the speech domain, and we expect it to expand into other fields in the future.
We have proposed a pipeline for pre-processing full-duplex data based on real-world datasets.
