arxiv:2606.29331

Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees

Published on Jun 28

Authors:

Abstract

Symbolic regression's computational complexity is analyzed through PAC learning theory, showing that generalization risk remains manageable under certain Lipschitz conditions and arity bounds, with empirical results supporting the theoretical predictions.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Scientific discovery via symbolic regression is often viewed as statistically and computationally intractable because the hypothesis space of expressions grows combinatorially with depth. This paper revisits the statistical side through the lens of PAC learning, focusing on compositional function trees built from a finite vocabulary of smooth operators (e.g., {+,times,sin,exp} and affine maps). We prove that the relevant generalization quantity, Rademacher complexity, hence the excess risk, does not necessarily blow up exponentially with the number of distinct symbolic structures, but is controlled by (i) the depth d and (ii) the Lipschitz constants of the base operators along the composed computation graph. Concretely, under mild Lipschitz conditions on operators and bounded affine leaves, a finite-union bound over a vocabulary of size K=|H_{base}| together with Maurer-type vector contraction yields R_n(H_{comp}^{d}) leq (Kb2L)^{d-1}R_n(H_{comp}^{1}) with arity bound b; corresponding high-probability risk bounds scale as O(L^{d}/n) when K,b=O(1) and R_n(H_{comp}^{1})=O(n^{-1/2}). We complement the theory with a modular codebase that trains differentiable operator trees (not MLPs) on synthetic "physics-like" targets of controlled depth and shows that the empirical generalization gap correlates positively with the predicted complexity term (L^{d})/n.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2606.29331

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.29331 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2606.29331 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.29331 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.