# EML Trainability Study: Can We Turn Theoretical Universality Into Practical Training?

## Overview
This repository contains an empirical study of whether the EML operator `eml(x, y) = exp(x) − ln(y)` from arXiv:2603.21852 can be made practically trainable for symbolic regression via gradient descent.
## The Theoretical Discovery
The EML paper proved that every elementary mathematical function — addition, multiplication, trigonometry, logarithms, π, e, etc. — can be generated from just one binary operator and the constant 1:
eml(x, y) = exp(x) − ln(y)
This is analogous to how the NAND gate generates all Boolean logic. The grammar is trivially simple: S → 1 | eml(S, S).
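The operator is easy to sanity-check directly. A minimal sketch (the `eml` helper below is our own, not from the paper's code) recovers the constant `e` from the grammar's only terminal:

```python
import math

def eml(x: float, y: float) -> float:
    """The EML primitive: eml(x, y) = exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)

# The constant e falls out of the single terminal 1:
# eml(1, 1) = exp(1) - ln(1) = e - 0 = e
print(eml(1.0, 1.0))  # ≈ 2.718281828
```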
## The Practical Problem
While the operator is mathematically universal, naive implementations fail numerically. Stacking exponentials 3-4 levels deep in floating-point arithmetic makes values explode to infinity or collapse to zero. The paper itself reports:
- Depth 1-2: 100% recovery from random initialization
- Depth 3-4: ~25% recovery
- Depth 5+: <1% recovery
- Depth 6: 0% in 448 attempts
Yet paradoxically, when initialized near the correct solution, recovery is 100% even at depth 5-6. The basins of attraction exist; from random initialization they are simply needles in a haystack.
## Research Questions
- Which numerical stability techniques most improve deep EML tree training?
- What is the maximum recoverable tree depth with enhanced methods?
- Can EML-based SR recover real physics equations (Feynman benchmark)?
## Methods

### Stability Techniques Tested
| Method | Description | Source |
|---|---|---|
| Soft routing | Standard softmax input selection (baseline) | EML paper §4.3 |
| Gumbel-hard | Straight-through Gumbel-softmax: hard selection in the forward pass, soft gradients in the backward pass | Jang et al. 2017 |
| Bounded | `tanh(output/R) * R` normalization after each node | Inspired by NALU (Trask 2018) |
| Combined | Saturating linear output `x / (1 + \|x\|)` | |
### Key Innovations
**Hard routing prevents intermediate explosion:** Soft routing creates weighted mixtures of `{1, x, f}` that can produce arbitrary intermediate values. Hard selection ensures only one input is chosen per EML node, preventing the "exp of a mixture" problem.
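A minimal sketch of this straight-through selection, using PyTorch's `F.gumbel_softmax` (the `hard_select` helper and the shapes are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def hard_select(logits: torch.Tensor, candidates: torch.Tensor,
                tau: float = 1.0) -> torch.Tensor:
    """Straight-through Gumbel-softmax selection.

    candidates: (batch, n_options) stacked inputs, e.g. [1, x, f_child].
    The forward pass uses a hard one-hot choice, so each EML node sees
    exactly one input (no "exp of a mixture"); the backward pass uses
    the soft softmax gradients.
    """
    w = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot in forward
    return (candidates * w).sum(dim=-1)

# Batch of 4 samples, 3 candidate inputs per node (hypothetical shapes)
logits = torch.zeros(3, requires_grad=True)
cands = torch.randn(4, 3)
out = hard_select(logits, cands)  # exactly one candidate column per row
```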
**Multi-loss training:** MSE + correlation loss (captures function shape regardless of scale) + entropy regularization (encourages discrete routing decisions).
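A sketch of such a combined objective (the `multi_loss` helper and the loss weights are illustrative, not the exact values used in the experiments):

```python
import torch

def multi_loss(pred, target, routing_logits, lam_corr=0.1, lam_ent=0.01):
    """Combined objective: MSE + (1 - Pearson r) + routing entropy."""
    mse = torch.mean((pred - target) ** 2)
    # Correlation loss: scale-invariant shape matching
    p = pred - pred.mean()
    t = target - target.mean()
    corr = 1.0 - (p * t).sum() / (p.norm() * t.norm() + 1e-8)
    # Entropy of the routing distributions: low entropy = discrete choices
    probs = torch.softmax(routing_logits, dim=-1)
    ent = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    return mse + lam_corr * corr + lam_ent * ent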
**Temperature annealing:** Start with high temperature (smooth, exploratory) and anneal to near-zero (hard, discrete) over training.
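One possible schedule (a sketch; the `anneal_tau` helper and its endpoints are our illustrative choices, not the repo's actual hyperparameters):

```python
def anneal_tau(step: int, total_steps: int,
               tau_start: float = 5.0, tau_end: float = 0.1) -> float:
    """Exponential temperature schedule for Gumbel-softmax.

    High tau -> smooth, exploratory routing; low tau -> near-discrete.
    """
    frac = step / max(total_steps - 1, 1)
    return tau_start * (tau_end / tau_start) ** frac
```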
**Multi-restart search:** Since the basins are narrow, we run 20-30 random initializations per configuration and report the best result and the success rate.
### Architecture: The Master Formula
Following the paper's §4.3, we implement the EML master formula as a full binary tree:
- Leaf nodes select from `{1, x₁, ..., xₖ}` (the constant and the input variables)
- Internal nodes select from `{1, x₁, ..., xₖ, f_left, f_right}` (also including child outputs)
- Each selection is parameterized by learnable logits passed through Gumbel-softmax
- Output affine transform `a * eml(left, right) + b` per node
Total parameters: O(5 × 2ⁿ) for depth n (as stated in the paper).
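A single internal node along these lines might look like the following PyTorch sketch (the `EMLNode` class, the clamp bounds, and the shapes are our assumptions, not the paper's reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMLNode(nn.Module):
    """One internal node of the EML binary tree (illustrative sketch).

    Selects left/right inputs from the candidate pool via Gumbel-softmax,
    applies eml(l, r) = exp(l) - ln(r), then a learned affine a*out + b.
    """
    def __init__(self, n_candidates: int):
        super().__init__()
        self.left_logits = nn.Parameter(torch.zeros(n_candidates))
        self.right_logits = nn.Parameter(torch.zeros(n_candidates))
        self.a = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, candidates: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # candidates: (batch, n_candidates), e.g. [1, x1, ..., xk, f_left, f_right]
        wl = F.gumbel_softmax(self.left_logits, tau=tau, hard=True)
        wr = F.gumbel_softmax(self.right_logits, tau=tau, hard=True)
        left = (candidates * wl).sum(-1)
        right = (candidates * wr).sum(-1)
        # Clamps keep ln's argument positive and bound exp's input (stability)
        out = torch.exp(left.clamp(max=20.0)) - torch.log(right.clamp(min=1e-6))
        return self.a * out + self.b
```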
## Experimental Design

### Phase 1: Known EML Identities
Test recovery of functions with known EML decompositions:
| Function | EML Depth | EML Expression |
|---|---|---|
| `exp(x)` | 1 | `eml(x, 1)` |
| `e` (constant) | 1 | `eml(1, 1)` |
| `ln(x)` | 3 | `eml(1, eml(eml(1, x), 1))` |
| `-x` | 2 | Via composition |
| `1/x` | 3 | Via composition |
| `x + y` | 4 | Via exp/ln identities |
| `x × y` | 4+ | Via exp/ln identities |
| `x²` | 4 | `exp(2·ln(x))` |
| `√x` | 4 | `exp(0.5·ln(x))` |
| `sin(x)` | 5+ | Requires complex intermediates |
### Phase 2: Feynman Physics Equations
A curated set of physics equations from the SRSD-Feynman benchmark:
- Gaussian distribution: `exp(-θ²/2)/√(2π)`
- Euclidean distance: `√((x₂-x₁)² + (y₂-y₁)²)`
- Inverse square law: `F = q₁q₂/(4πε₀r²)`
- Relativistic mass: `m₀/√(1-v²/c²)`
- Harmonic oscillator: `E = ½kx²`
- And more...
### Phase 3: Depth Scaling Analysis
Systematic measurement of recovery rate vs. depth using EML-native targets.
## Key Literature References
| Topic | Paper | Key Insight |
|---|---|---|
| EML operator | 2603.21852 | Universal primitive for elementary functions |
| Gumbel-softmax | Jang et al. 2017 | Differentiable discrete selection |
| NALU | 1808.00508 | Stable exp-log arithmetic cells |
| NAU | 2001.05016 | Fixing NALU's gradient issues |
| Gradient clipping | 1211.5063 | Controlling exploding gradients |
| BFloat16 training | 2010.06192 | Kahan summation for precision |
| AutoNumerics-Zero | 2312.08472 | Range reduction for transcendentals |
| Numerical stability | 2501.04697 | Grokking at the edge of stability |
| Tropical geometry | 2505.17190 | Max-plus limit of log-sum-exp |
| AI Feynman | Udrescu & Tegmark 2020 | Physics equations benchmark |
| SRSD | 2206.10540 | Feynman benchmark with proper data |
| PySR | Cranmer 2023 | Evolutionary symbolic regression |
| TPSR | 2303.06833 | Transformer + MCTS for SR |
## Preliminary Results (CPU Validation)
From our CPU sandbox testing:
| Function | Depth | Best R² | Method | Notes |
|---|---|---|---|---|
| `exp(x)` | 1 | 0.9999 | Gumbel-hard | ✅ Trivially recovered |
| `e` (const) | 1 | 0.9999 | Gumbel-hard | ✅ Correct: `eml(1, 1)` |
| `ln(x)` | 3 | -0.08 | All methods | ❌ All 10 restarts fail |
| `x²` | 4 | TBD | - | Awaiting GPU results |
### Key Observation
The depth-3 barrier is real and severe. Even with hard routing (Gumbel-softmax), bounded normalization, curriculum learning, and multi-loss training, recovering ln(x) from random initialization fails consistently. This aligns with the paper's finding of ~25% success at depth 3-4 and suggests that:
- The loss landscape at depth 3+ has exponentially many local minima relative to the one correct basin
- Better optimization (second-order methods, population-based search) may help
- Informed initialization (starting near known decompositions) is likely required for practical use
## GPU Experiment Status
🔄 Running: Full experiment on T4 GPU with 3 phases and 4 stability methods.
Job: 69e7837acd8c002f31e00d75
Results will be uploaded to the results/ folder upon completion.
## How to Reproduce
```bash
# Install dependencies
pip install torch numpy huggingface_hub

# Run the full experiment
python code/eml_experiment.py
```
## Citation
If you use this work, please cite the original EML paper:
```bibtex
@article{eml2026,
  title={All elementary functions from a single operator},
  author={...},
  journal={arXiv preprint arXiv:2603.21852},
  year={2026}
}
```
## License
MIT