ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
Abstract
ParEVO synthesizes high-performance parallel algorithms for irregular data structures using specialized LLMs and evolutionary coding to achieve significant speedups over existing methods.
The transition from sequential to parallel computing is essential for modern high-performance applications but is hindered by the steep learning curve of concurrent programming. This challenge is magnified for irregular data structures (such as sparse graphs, unbalanced trees, and non-uniform meshes) where static scheduling fails and data dependencies are unpredictable. Current Large Language Models (LLMs) often fail catastrophically on these tasks, generating code plagued by subtle race conditions, deadlocks, and sub-optimal scaling. We bridge this gap with ParEVO, a framework designed to synthesize high-performance parallel algorithms for irregular data. Our contributions include: (1) The Parlay-Instruct Corpus, a curated dataset of 13,820 tasks synthesized via a "Critic-Refine" pipeline that explicitly filters for empirically performant algorithms that effectively utilize Work-Span parallel primitives; (2) specialized DeepSeek, Qwen, and Gemini models fine-tuned to align probabilistic generation with the rigorous semantics of the ParlayLib library; and (3) an Evolutionary Coding Agent (ECA) that improves the "last mile" of correctness by iteratively repairing code using feedback from compilers, dynamic race detectors, and performance profilers. On the ParEval benchmark, ParEVO achieves an average 106x speedup (with a maximum of 1103x) across the suite, and a robust 13.6x speedup specifically on complex irregular graph problems, outperforming state-of-the-art commercial models. Furthermore, our evolutionary approach matches state-of-the-art expert human baselines, achieving up to a 4.1x speedup on specific highly-irregular kernels. Source code and datasets are available at https://github.com/WildAlg/ParEVO.
Community
ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
TL;DR: While current LLMs excel at writing standard sequential code, they often fail catastrophically when attempting to write concurrent algorithms for complex data structures—generating code riddled with race conditions, deadlocks, and sub-optimal scaling. This paper introduces ParEVO, an end-to-end framework featuring a specialized dataset and an evolutionary agentic feedback loop that synthesizes highly optimized parallel code for irregular data.
🔑 Key Highlights:
- The Irregular Data Challenge: Parallelizing code for uniform data is straightforward, but irregular data structures (e.g., sparse graphs, unbalanced trees, and non-uniform meshes) are notoriously difficult because static scheduling fails and data dependencies are unpredictable. Current frontier LLMs struggle heavily in this high-friction domain.
- The Parlay-Instruct Corpus: To teach models parallel semantics, the authors curated a dataset of 13,820 tasks synthesized via a "Critic-Refine" pipeline. This dataset explicitly filters for empirically performant algorithms using Work-Span parallel primitives. DeepSeek, Qwen, and Gemini models were then fine-tuned on this corpus to align with the rigorous semantics of the
ParlayLibC++ library. - Evolutionary Coding Agent (ECA): To bridge the "last mile" of functional correctness, ParEVO uses an agentic loop that iteratively repairs and optimizes generated code by utilizing real-world, dynamic environmental feedback from compilers, race detectors, and performance profilers.
- Massive Performance Gains: On the ParEval benchmark, ParEVO achieves an astonishing 106x average speedup (peaking at 1103x) across the suite. For complex irregular graph problems specifically, it delivers a robust 13.6x speedup—outperforming state-of-the-art commercial models and even matching expert human baselines (up to 4.1x speedup on highly-irregular kernels).
💡 Why it matters:
The transition from sequential to parallel computing is essential for modern high-performance applications but is hindered by the steep learning curve of concurrent programming. By proving that AI agents can successfully synthesize and evolve highly optimized concurrent algorithms for the hardest irregular data setting, ParEVO takes a major step toward AI-assisted High-Performance Computing (HPC) and autonomous systems engineering.
🔗 Resources:
- 🌐 Project Page: quanquancliu.com/ParEVO
- 📄 Paper: arXiv:2603.02510
- 🐙 GitHub (Code & Datasets):
WildAlg/ParEVO
Models citing this paper 2
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper