# EML Trainability Study: Can We Turn Theoretical Universality Into Practical Training?

## Overview
This repository contains an empirical study of whether the EML operator `eml(x, y) = exp(x) − ln(y)` from arXiv:2603.21852 can be made practically trainable for symbolic regression via gradient descent.
## The Theoretical Discovery
The EML paper proved that every elementary mathematical function — addition, multiplication, trigonometry, logarithms, π, e, etc. — can be generated from just one binary operator and the constant 1:
eml(x, y) = exp(x) − ln(y)
This is analogous to how the NAND gate generates all Boolean logic. The grammar is trivially simple: S → 1 | eml(S, S).
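The operator is easy to sanity-check directly. A minimal sketch (the `eml` helper below is our own, not from the paper's code) recovers the constant `e` from the grammar's only terminal:

```python
import math

def eml(x: float, y: float) -> float:
    """The EML primitive: eml(x, y) = exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)

# The constant e falls out of the single terminal 1:
# eml(1, 1) = exp(1) - ln(1) = e - 0 = e
print(eml(1.0, 1.0))  # ≈ 2.718281828
```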
## The Practical Problem
While the operator is mathematically universal, naive implementations fail numerically. Stacking exponentials 3-4 levels deep in floating-point arithmetic makes values explode to infinity or collapse to zero. The paper itself reports:
- Depth 1-2: 100% recovery from random initialization
- Depth 3-4: ~25% recovery
- Depth 5+: <1% recovery
- Depth 6: 0% in 448 attempts
Yet paradoxically, when initialized near the correct solution, recovery is 100% even at depth 5-6. The basins of attraction exist; from random initialization they are simply needles in a haystack.
## Research Questions
- Which numerical stability techniques most improve deep EML tree training?
- What is the maximum recoverable tree depth with enhanced methods?
- Can EML-based SR recover real physics equations (Feynman benchmark)?
## Methods

### Stability Techniques Tested
| Method | Description | Source |
|---|---|---|
| Soft routing | Standard softmax input selection (baseline) | EML paper §4.3 |
| Gumbel-hard | Straight-through Gumbel-softmax: hard selection in the forward pass, soft gradients in the backward pass | Jang et al. 2017 |
| Bounded | `tanh(output/R) * R` normalization after each node | Inspired by NALU (Trask 2018) |
| Combined | Saturating linear output `x / (1 + \|x\|)` | |
### Key Innovations
**Hard routing prevents intermediate explosion:** Soft routing creates weighted mixtures of `{1, x, f}` that can produce arbitrary intermediate values. Hard selection ensures only one input is chosen per EML node, preventing the "exp of a mixture" problem.
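A minimal sketch of this straight-through selection, using PyTorch's `F.gumbel_softmax` (the `hard_select` helper and the shapes are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def hard_select(logits: torch.Tensor, candidates: torch.Tensor,
                tau: float = 1.0) -> torch.Tensor:
    """Straight-through Gumbel-softmax selection.

    candidates: (batch, n_options) stacked inputs, e.g. [1, x, f_child].
    The forward pass uses a hard one-hot choice, so each EML node sees
    exactly one input (no "exp of a mixture"); the backward pass uses
    the soft softmax gradients.
    """
    w = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot in forward
    return (candidates * w).sum(dim=-1)

# Batch of 4 samples, 3 candidate inputs per node (hypothetical shapes)
logits = torch.zeros(3, requires_grad=True)
cands = torch.randn(4, 3)
out = hard_select(logits, cands)  # exactly one candidate column per row
```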
**Multi-loss training:** MSE + correlation loss (captures function shape regardless of scale) + entropy regularization (encourages discrete routing decisions).
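A sketch of such a combined objective (the `multi_loss` helper and the loss weights are illustrative, not the exact values used in the experiments):

```python
import torch

def multi_loss(pred, target, routing_logits, lam_corr=0.1, lam_ent=0.01):
    """Combined objective: MSE + (1 - Pearson r) + routing entropy."""
    mse = torch.mean((pred - target) ** 2)
    # Correlation loss: scale-invariant shape matching
    p = pred - pred.mean()
    t = target - target.mean()
    corr = 1.0 - (p * t).sum() / (p.norm() * t.norm() + 1e-8)
    # Entropy of the routing distributions: low entropy = discrete choices
    probs = torch.softmax(routing_logits, dim=-1)
    ent = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    return mse + lam_corr * corr + lam_ent * ent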
**Temperature annealing:** Start with high temperature (smooth, exploratory) and anneal to near-zero (hard, discrete) over training.
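One possible schedule (a sketch; the `anneal_tau` helper and its endpoints are our illustrative choices, not the repo's actual hyperparameters):

```python
def anneal_tau(step: int, total_steps: int,
               tau_start: float = 5.0, tau_end: float = 0.1) -> float:
    """Exponential temperature schedule for Gumbel-softmax.

    High tau -> smooth, exploratory routing; low tau -> near-discrete.
    """
    frac = step / max(total_steps - 1, 1)
    return tau_start * (tau_end / tau_start) ** frac
```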
**Multi-restart search:** Since the basins are narrow, we run 20-30 random initializations per configuration and report the best result and the success rate.
### Architecture: The Master Formula
Following the paper's §4.3, we implement the EML master formula as a full binary tree:
- Leaf nodes select from `{1, x₁, ..., xₖ}` (the constant and the input variables)
- Internal nodes select from `{1, x₁, ..., xₖ, f_left, f_right}` (also including child outputs)
- Each selection is parameterized by learnable logits passed through Gumbel-softmax
- Output affine transform `a * eml(left, right) + b` per node
Total parameters: O(5 × 2ⁿ) for depth n (as stated in the paper).
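A single internal node along these lines might look like the following PyTorch sketch (the `EMLNode` class, the clamp bounds, and the shapes are our assumptions, not the paper's reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMLNode(nn.Module):
    """One internal node of the EML binary tree (illustrative sketch).

    Selects left/right inputs from the candidate pool via Gumbel-softmax,
    applies eml(l, r) = exp(l) - ln(r), then a learned affine a*out + b.
    """
    def __init__(self, n_candidates: int):
        super().__init__()
        self.left_logits = nn.Parameter(torch.zeros(n_candidates))
        self.right_logits = nn.Parameter(torch.zeros(n_candidates))
        self.a = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, candidates: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # candidates: (batch, n_candidates), e.g. [1, x1, ..., xk, f_left, f_right]
        wl = F.gumbel_softmax(self.left_logits, tau=tau, hard=True)
        wr = F.gumbel_softmax(self.right_logits, tau=tau, hard=True)
        left = (candidates * wl).sum(-1)
        right = (candidates * wr).sum(-1)
        # Clamps keep ln's argument positive and bound exp's input (stability)
        out = torch.exp(left.clamp(max=20.0)) - torch.log(right.clamp(min=1e-6))
        return self.a * out + self.b
```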
## Experimental Design

### Phase 1: Known EML Identities
Test recovery of functions with known EML decompositions:
| Function | EML Depth | EML Expression |
|---|---|---|
| `exp(x)` | 1 | `eml(x, 1)` |
| `e` (constant) | 1 | `eml(1, 1)` |
| `ln(x)` | 3 | `eml(1, eml(eml(1, x), 1))` |
| `-x` | 2 | Via composition |
| `1/x` | 3 | Via composition |
| `x + y` | 4 | Via exp/ln identities |
| `x × y` | 4+ | Via exp/ln identities |
| `x²` | 4 | `exp(2·ln(x))` |
| `√x` | 4 | `exp(0.5·ln(x))` |
| `sin(x)` | 5+ | Requires complex intermediates |
### Phase 2: Feynman Physics Equations
A curated set of physics equations from the SRSD-Feynman benchmark:
- Gaussian distribution: `exp(-θ²/2)/√(2π)`
- Euclidean distance: `√((x₂-x₁)² + (y₂-y₁)²)`
- Inverse square law: `F = q₁q₂/(4πε₀r²)`
- Relativistic mass: `m₀/√(1-v²/c²)`
- Harmonic oscillator: `E = ½kx²`
- And more...
### Phase 3: Depth Scaling Analysis
Systematic measurement of recovery rate vs. depth using EML-native targets.
## Key Literature References
| Topic | Paper | Key Insight |
|---|---|---|
| EML operator | 2603.21852 | Universal primitive for elementary functions |
| Gumbel-softmax | Jang et al. 2017 | Differentiable discrete selection |
| NALU | 1808.00508 | Stable exp-log arithmetic cells |
| NAU | 2001.05016 | Fixing NALU's gradient issues |
| Gradient clipping | 1211.5063 | Controlling exploding gradients |
| BFloat16 training | 2010.06192 | Kahan summation for precision |
| AutoNumerics-Zero | 2312.08472 | Range reduction for transcendentals |
| Numerical stability | 2501.04697 | Grokking at the edge of stability |
| Tropical geometry | 2505.17190 | Max-plus limit of log-sum-exp |
| AI Feynman | Udrescu & Tegmark 2020 | Physics equations benchmark |
| SRSD | 2206.10540 | Feynman benchmark with proper data |
| PySR | Cranmer 2023 | Evolutionary symbolic regression |
| TPSR | 2303.06833 | Transformer + MCTS for SR |
## Preliminary Results (CPU Validation)
From our CPU sandbox testing:
| Function | Depth | Best R² | Method | Notes |
|---|---|---|---|---|
| `exp(x)` | 1 | 0.9999 | Gumbel-hard | ✅ Trivially recovered |
| `e` (const) | 1 | 0.9999 | Gumbel-hard | ✅ Correct: `eml(1, 1)` |
| `ln(x)` | 3 | -0.08 | All methods | ❌ All 10 restarts fail |
| `x²` | 4 | TBD | - | Awaiting GPU results |
### Key Observation
The depth-3 barrier is real and severe. Even with hard routing (Gumbel-softmax), bounded normalization, curriculum learning, and multi-loss training, recovering ln(x) from random initialization fails consistently. This aligns with the paper's finding of ~25% success at depth 3-4 and suggests that:
- The loss landscape at depth 3+ has exponentially many local minima relative to the one correct basin
- Better optimization (second-order methods, population-based search) may help
- Informed initialization (starting near known decompositions) is likely required for practical use
## GPU Experiment Status
🔄 Running: Full experiment on T4 GPU with 3 phases and 4 stability methods.
Job: 69e7837acd8c002f31e00d75
Results will be uploaded to the results/ folder upon completion.
## How to Reproduce
```bash
# Install dependencies
pip install torch numpy huggingface_hub

# Run the full experiment
python code/eml_experiment.py
```
## Citation
If you use this work, please cite the original EML paper:
```bibtex
@article{eml2026,
  title={All elementary functions from a single operator},
  author={...},
  journal={arXiv preprint arXiv:2603.21852},
  year={2026}
}
```
## License
MIT