arxiv:2602.23220

STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems

Published on Feb 26

Authors:

Abstract

STELLAR autonomously tunes high-performance parallel file systems using large language models and reinforcement learning techniques, achieving near-optimal configurations quickly and making complex I/O optimization accessible to domain scientists.

AI-generated summary

I/O performance is crucial to efficiency in data-intensive scientific computing; but tuning large-scale storage systems is complex, costly, and notoriously manpower-intensive, making it inaccessible for most domain scientists. To address this problem, we propose STELLAR, an autonomous tuner for high-performance parallel file systems. Our evaluations show that STELLAR almost always selects near-optimal parameter configurations for parallel file systems within the first five attempts, even for previously unseen applications. STELLAR differs fundamentally from traditional autotuning methods, which often require hundreds of thousands of iterations to converge. Powered by large language models (LLMs), STELLAR enables autonomous end-to-end agentic tuning by (1) accurately extracting tunable parameters from software manuals, (2) analyzing I/O trace logs generated by applications, (3) selecting initial tuning strategies, (4) rerunning applications on real systems and collecting I/O performance feedback, (5) adjusting tuning strategies and repeating the tuning cycle, and (6) reflecting on and summarizing tuning experiences into reusable knowledge for future optimizations. STELLAR integrates retrieval-augmented generation (RAG), tool execution, LLM-based reasoning, and a multiagent design to stabilize reasoning and combat hallucinations. We evaluate the impact of each component on optimization outcomes, providing design insights for similar systems in other optimization domains. STELLAR's architecture and empirical results highlight a promising approach to complex system optimization, especially for problems with large search spaces and high exploration costs, while making I/O tuning more accessible to domain scientists with minimal added resources.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.23220 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.23220 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.23220 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.