arxiv:2603.02510

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Published on Mar 3

· Submitted by

Quanquan Liu on Mar 4

Yale University

Upvote

Authors:

Abstract

ParEVO synthesizes high-performance parallel algorithms for irregular data structures using specialized LLMs and evolutionary coding to achieve significant speedups over existing methods.

AI-generated summary

The transition from sequential to parallel computing is essential for modern high-performance applications but is hindered by the steep learning curve of concurrent programming. This challenge is magnified for irregular data structures (such as sparse graphs, unbalanced trees, and non-uniform meshes) where static scheduling fails and data dependencies are unpredictable. Current Large Language Models (LLMs) often fail catastrophically on these tasks, generating code plagued by subtle race conditions, deadlocks, and sub-optimal scaling. We bridge this gap with ParEVO, a framework designed to synthesize high-performance parallel algorithms for irregular data. Our contributions include: (1) The Parlay-Instruct Corpus, a curated dataset of 13,820 tasks synthesized via a "Critic-Refine" pipeline that explicitly filters for empirically performant algorithms that effectively utilize Work-Span parallel primitives; (2) specialized DeepSeek, Qwen, and Gemini models fine-tuned to align probabilistic generation with the rigorous semantics of the ParlayLib library; and (3) an Evolutionary Coding Agent (ECA) that improves the "last mile" of correctness by iteratively repairing code using feedback from compilers, dynamic race detectors, and performance profilers. On the ParEval benchmark, ParEVO achieves an average 106x speedup (with a maximum of 1103x) across the suite, and a robust 13.6x speedup specifically on complex irregular graph problems, outperforming state-of-the-art commercial models. Furthermore, our evolutionary approach matches state-of-the-art expert human baselines, achieving up to a 4.1x speedup on specific highly-irregular kernels. Source code and datasets are available at https://github.com/WildAlg/ParEVO.

View arXiv page View PDF Project page GitHub 0 Add to collection

Community

qqggez

Paper submitter about 4 hours ago

ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

TL;DR: While current LLMs excel at writing standard sequential code, they often fail catastrophically when attempting to write concurrent algorithms for complex data structures—generating code riddled with race conditions, deadlocks, and sub-optimal scaling. This paper introduces ParEVO, an end-to-end framework featuring a specialized dataset and an evolutionary agentic feedback loop that synthesizes highly optimized parallel code for irregular data.

🔑 Key Highlights:

The Irregular Data Challenge: Parallelizing code for uniform data is straightforward, but irregular data structures (e.g., sparse graphs, unbalanced trees, and non-uniform meshes) are notoriously difficult because static scheduling fails and data dependencies are unpredictable. Current frontier LLMs struggle heavily in this high-friction domain.
The Parlay-Instruct Corpus: To teach models parallel semantics, the authors curated a dataset of 13,820 tasks synthesized via a "Critic-Refine" pipeline. This dataset explicitly filters for empirically performant algorithms using Work-Span parallel primitives. DeepSeek, Qwen, and Gemini models were then fine-tuned on this corpus to align with the rigorous semantics of the ParlayLib C++ library.
Evolutionary Coding Agent (ECA): To bridge the "last mile" of functional correctness, ParEVO uses an agentic loop that iteratively repairs and optimizes generated code by utilizing real-world, dynamic environmental feedback from compilers, race detectors, and performance profilers.
Massive Performance Gains: On the ParEval benchmark, ParEVO achieves an astonishing 106x average speedup (peaking at 1103x) across the suite. For complex irregular graph problems specifically, it delivers a robust 13.6x speedup—outperforming state-of-the-art commercial models and even matching expert human baselines (up to 4.1x speedup on highly-irregular kernels).

💡 Why it matters:
The transition from sequential to parallel computing is essential for modern high-performance applications but is hindered by the steep learning curve of concurrent programming. By proving that AI agents can successfully synthesize and evolve highly optimized concurrent algorithms for the hardest irregular data setting, ParEVO takes a major step toward AI-assisted High-Performance Computing (HPC) and autonomous systems engineering.

🔗 Resources:

🌐 Project Page: quanquancliu.com/ParEVO
📄 Paper: arXiv:2603.02510
🐙 GitHub (Code & Datasets): WildAlg/ParEVO

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 2

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.02510 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.02510 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.