arxiv:2603.24624

ReSyn: A Generalized Recursive Regular Expression Synthesis Framework

Published on Jun 13

Submitted by Seongmin Kim on Jun 19

Authors:

Seongmin Kim ,

Abstract

ReSyn, a divide-and-conquer framework, improves regex synthesis accuracy by decomposing complex problems into smaller sub-problems. It is paired with Set2Regex, a parameter-efficient synthesizer that captures the permutation invariance of examples.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Existing Programming-By-Example (PBE) systems often rely on simplified benchmarks that fail to capture the high structural complexity of real-world regexes, such as deeper nesting and frequent use of union operations. To overcome the resulting performance drop, we propose ReSyn, a synthesizer-agnostic divide-and-conquer framework that decomposes a complex synthesis problem into manageable sub-problems. We also introduce Set2Regex, a parameter-efficient synthesizer that captures the permutation invariance of examples. Experimental results demonstrate that ReSyn significantly boosts accuracy across various synthesizers, and that its combination with Set2Regex establishes a new state of the art on a challenging real-world benchmark. The complete source code, datasets, and pre-trained model checkpoints are publicly available at https://github.com/mrseongminkim/ReSyn.
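To make the divide-and-conquer idea concrete, here is a minimal toy sketch in Python. It is not the ReSyn algorithm or the Set2Regex model: the names `signature`, `base_synthesize`, and `synthesize_union` are hypothetical, the partitioning rule (group examples by character-class shape) is an assumption chosen for brevity, and the base synthesizer is a hand-written heuristic rather than a learned model. It only illustrates the general pattern of splitting positive examples into sub-problems, solving each one independently, and joining the partial regexes with a union.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to a coarse regex token (toy heuristic, an assumption)."""
    if ch.isascii() and ch.isdigit():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[a-zA-Z]"
    return re.escape(ch)


def signature(example):
    """Shape of an example: its token sequence with repeated runs collapsed."""
    return tuple(tok for tok, _ in groupby(char_class(c) for c in example))


def base_synthesize(examples):
    """Toy base synthesizer for a sub-problem whose examples share one shape."""
    sig = signature(examples[0])
    # Every token stands for a run of one or more characters.
    return "".join(tok + "+" for tok in sig)


def synthesize_union(positives):
    """Divide: group by shape. Conquer: solve each group. Combine: union."""
    if not positives:
        raise ValueError("need at least one positive example")
    groups = {}
    for s in positives:
        groups.setdefault(signature(s), []).append(s)
    # Sorting the groups makes the result independent of example order.
    parts = [base_synthesize(g) for _, g in sorted(groups.items())]
    if len(parts) == 1:
        return parts[0]
    return "|".join(f"(?:{p})" for p in parts)


if __name__ == "__main__":
    examples = ["abc-123", "x-9", "2024", "77"]
    pattern = synthesize_union(examples)
    print(pattern)
    print(all(re.fullmatch(pattern, s) for s in examples))
```

Because the groups are sorted before being combined, shuffling the input examples yields the same regex. That is a simple, non-learned analogue of the permutation invariance the abstract attributes to Set2Regex.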

Community

mrseongminkim

Paper author Paper submitter 11 days ago

We've released everything to use and build on ReSyn from the Hub:

📚 Dataset: https://huggingface.co/datasets/mrseongminkim/ReSyn
🤖 Pre-trained components (loadable via from_pretrained): Set2Regex · Router · Partitioner · Segmenter
🤖 Prax baseline: ReSyn-byt5-small
💻 Code: https://github.com/mrseongminkim/ReSyn

noahml

10 days ago

Neat approach to regex synthesis. PBE tools usually struggle once things get nested or rely heavily on unions, so the divide-and-conquer strategy here sounds like a solid way to handle that complexity.

I'm curious how the Set2Regex component handles cases where the provided examples are ambiguous or don't fully define the intended pattern. Does the permutation invariance help prune the search space significantly when the input set is small?

I made a podcast on it with ResearchPod, which makes it easy to get the key concepts on the go:
https://researchpod.app/episode/0eb2f81d-649c-4cf5-bd64-d1823b2bc89e
