
EML Trainability Study: Can We Turn Theoretical Universality Into Practical Training?

Overview

This repository contains an empirical study of whether the EML operator eml(x,y) = exp(x) - ln(y) from arXiv:2603.21852 can be made practically trainable for symbolic regression via gradient descent.

The Theoretical Discovery

The EML paper proved that every elementary mathematical function — addition, multiplication, trigonometry, logarithms, π, e, etc. — can be generated from just one binary operator and the constant 1:

eml(x, y) = exp(x) − ln(y)

This is analogous to how the NAND gate generates all Boolean logic. The grammar is trivially simple: S → 1 | eml(S, S).
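As a quick sanity check, the operator and the two depth-1 identities can be written in a few lines of Python (an illustrative sketch, not the paper's code):

```python
import math

def eml(x, y):
    """The single primitive: eml(x, y) = exp(x) - ln(y)."""
    return math.exp(x) - math.log(y)

# The constant e: eml(1, 1) = exp(1) - ln(1) = e
e_const = eml(1, 1)

# exp(x) for any x: eml(x, 1) = exp(x) - ln(1) = exp(x)
exp_2 = eml(2.0, 1)
```

Both values match `math.e` and `math.exp(2.0)` to machine precision, since `ln(1)` is exactly zero.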

The Practical Problem

While mathematically universal, the operator is numerically fragile in practice. Stacking exponentials 3-4 levels deep in floating-point arithmetic causes values to explode to infinity or collapse to zero. The paper itself reports:

  • Depth 1-2: 100% recovery from random initialization
  • Depth 3-4: ~25% recovery
  • Depth 5+: <1% recovery
  • Depth 6: 0% in 448 attempts

Yet paradoxically, when initialized near the correct solution, recovery is 100% even at depth 5-6. The basins of attraction exist — they're just needle-in-a-haystack from random init.
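The explosion is easy to reproduce: feeding each eml output back in as the next exponent builds a tower of exponentials that overflows float64 by depth 4 (a minimal illustration, not the paper's experiment):

```python
import math

def eml(x, y):
    return math.exp(x) - math.log(y)

v, overflow_depth = 1.0, None
for depth in range(1, 7):
    try:
        v = eml(v, 1.0)         # exp(previous output) at every level
    except OverflowError:
        overflow_depth = depth  # float64 tops out near exp(709)
        break
# v grows as e, then e**e (~15.2), then ~3.8e6; exp(3.8e6) overflows at depth 4
```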

Research Questions

  1. Which numerical stability techniques most improve deep EML tree training?
  2. What is the maximum recoverable tree depth with enhanced methods?
  3. Can EML-based SR recover real physics equations (Feynman benchmark)?

Methods

Stability Techniques Tested

| Method | Description | Source |
|---|---|---|
| Soft routing | Standard softmax input selection (baseline) | EML paper §4.3 |
| Gumbel-hard | Straight-through Gumbel-softmax: hard selection in forward, soft gradients in backward | Jang et al. 2017 |
| Bounded | `tanh(output/R) * R` normalization after each node | Inspired by NALU (Trask 2018) |
| Combined | Saturating linear: `x / (1 + \|x\|)` | - |
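The two bounding functions above are one-liners; this NumPy sketch (with an assumed range parameter `R`; illustrative, not the repo's code) shows how each maps unbounded node outputs back into a fixed interval:

```python
import numpy as np

def bounded_tanh(v, R=10.0):
    # tanh(v / R) * R squashes any output into [-R, R] while staying
    # nearly linear for |v| much smaller than R
    return np.tanh(v / R) * R

def saturating_linear(v):
    # x / (1 + |x|) maps the reals into (-1, 1), again near-linear at the origin
    return v / (1.0 + np.abs(v))
```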

Key Innovations

  1. Hard routing prevents intermediate explosion: Soft routing creates weighted mixtures of {1, x, f} that can produce arbitrary intermediate values. Hard selection ensures only one input is chosen per EML node, preventing the "exp of a mixture" problem.

  2. Multi-loss training: MSE + correlation loss (captures function shape regardless of scale) + entropy regularization (encourages discrete routing decisions).

  3. Temperature annealing: Start with high temperature (smooth, exploratory) and anneal to near-zero (hard, discrete) over training.

  4. Multi-restart search: Since basins are narrow, we run 20-30 random initializations per configuration and report best + success rates.
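The hard-routing mechanism can be sketched forward-pass-only in NumPy (gradients need a framework; the straight-through trick is noted in a comment):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_hard_select(logits, candidates, tau=1.0):
    # Sample Gumbel noise, then take a hard argmax in the forward pass.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y_soft = np.exp((logits + g) / tau)
    y_soft /= y_soft.sum()
    y_hard = np.zeros_like(y_soft)
    y_hard[np.argmax(y_soft)] = 1.0
    # Straight-through trick (PyTorch form): y = y_hard + (y_soft - y_soft.detach()),
    # so the forward value is discrete but gradients follow y_soft.
    return float(y_hard @ candidates)

# One node's candidate inputs {1, x, f}: hard routing always returns exactly
# one of them, never a weighted mixture, so exp() is never applied to an
# averaged (potentially huge) intermediate value.
choice = gumbel_hard_select(np.zeros(3), np.array([1.0, 3.0, 100.0]))
```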

Architecture: The Master Formula

Following the paper's §4.3, we implement the EML master formula as a full binary tree:

  • Leaf nodes select from {1, x₁, ..., xₖ} (constant and input variables)
  • Internal nodes select from {1, x₁, ..., xₖ, f_left, f_right} (also including child outputs)
  • Each selection is parameterized by learnable logits passed through Gumbel-softmax
  • Output affine transform a * eml(left, right) + b per node

Total parameters: O(5 × 2ⁿ) for depth n (as stated in the paper).
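Once routing has been discretized, the tree evaluates by simple recursion. A hard-routed sketch with a hypothetical tuple encoding of nodes (not the repo's actual implementation):

```python
import math

def eml(x, y):
    return math.exp(x) - math.log(y)

def eval_tree(node, x):
    # Leaves are ('leaf', '1' | 'x'); internal nodes are
    # ('eml', left, right, a, b) with the per-node affine a * eml(l, r) + b.
    if node[0] == 'leaf':
        return 1.0 if node[1] == '1' else x
    _, left, right, a, b = node
    return a * eml(eval_tree(left, x), eval_tree(right, x)) + b

# Depth-1 tree realizing exp(x): a * eml(x, 1) + b with a = 1, b = 0
exp_tree = ('eml', ('leaf', 'x'), ('leaf', '1'), 1.0, 0.0)
```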

Experimental Design

Phase 1: Known EML Identities

Test recovery of functions with known EML decompositions:

| Function | EML Depth | EML Expression |
|---|---|---|
| exp(x) | 1 | eml(x, 1) |
| e (constant) | 1 | eml(1, 1) |
| ln(x) | 3 | eml(1, eml(eml(1, x), 1)) |
| -x | 2 | Via composition |
| 1/x | 3 | Via composition |
| x + y | 4 | Via exp/ln identities |
| x × y | 4+ | Via exp/ln identities |
| x² | 4 | exp(2·ln(x)) |
| √x | 4 | exp(0.5·ln(x)) |
| sin(x) | 5+ | Requires complex intermediates |
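The depth-3 decomposition of ln(x) from the table can be verified numerically; the step-by-step algebra is in the comments:

```python
import math

def eml(x, y):
    return math.exp(x) - math.log(y)

def ln_via_eml(x):
    # eml(1, x)          = e - ln(x)
    # eml(e - ln(x), 1)  = exp(e - ln(x)) = e**e / x
    # eml(1, e**e / x)   = e - (e - ln(x)) = ln(x)
    return eml(1.0, eml(eml(1.0, x), 1.0))
```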

Phase 2: Feynman Physics Equations

A curated set of physics equations from the SRSD-Feynman benchmark:

  • Gaussian distribution: exp(-θ²/2)/√(2π)
  • Euclidean distance: √((x₂-x₁)² + (y₂-y₁)²)
  • Inverse square law: F = q₁q₂/(4πε₀r²)
  • Relativistic mass: m₀/√(1-v²/c²)
  • Harmonic oscillator: E = ½kx²
  • And more...

Phase 3: Depth Scaling Analysis

Systematic measurement of recovery rate vs. depth using EML-native targets.

Key Literature References

| Topic | Paper | Key Insight |
|---|---|---|
| EML operator | 2603.21852 | Universal primitive for elementary functions |
| Gumbel-softmax | Jang et al. 2017 | Differentiable discrete selection |
| NALU | 1808.00508 | Stable exp-log arithmetic cells |
| NAU | 2001.05016 | Fixing NALU's gradient issues |
| Gradient clipping | 1211.5063 | Controlling exploding gradients |
| BFloat16 training | 2010.06192 | Kahan summation for precision |
| AutoNumerics-Zero | 2312.08472 | Range reduction for transcendentals |
| Numerical stability | 2501.04697 | Grokking at the edge of stability |
| Tropical geometry | 2505.17190 | Max-plus limit of log-sum-exp |
| AI Feynman | Udrescu & Tegmark 2020 | Physics equations benchmark |
| SRSD | 2206.10540 | Feynman benchmark with proper data |
| PySR | Cranmer 2023 | Evolutionary symbolic regression |
| TPSR | 2303.06833 | Transformer + MCTS for SR |

Preliminary Results (CPU validation)

From our CPU sandbox testing:

| Function | Depth | Best R² | Method | Notes |
|---|---|---|---|---|
| exp(x) | 1 | 0.9999 | Gumbel-hard | ✅ Trivially recovered |
| e (const) | 1 | 0.9999 | Gumbel-hard | ✅ Correct: eml(1,1) |
| ln(x) | 3 | -0.08 | All methods | ❌ All 10 restarts fail |
| - | 4 | TBD | - | Awaiting GPU results |

Key Observation

The depth-3 barrier is real and severe. Even with hard routing (Gumbel-softmax), bounded normalization, curriculum learning, and multi-loss training, recovering ln(x) from random initialization fails consistently. This aligns with the paper's finding of ~25% success at depth 3-4 and suggests that:

  1. The loss landscape at depth 3+ has exponentially many local minima relative to the one correct basin
  2. Better optimization (second-order methods, population-based search) may help
  3. Informed initialization (starting near known decompositions) is likely required for practical use

GPU Experiment Status

🔄 Running: Full experiment on T4 GPU with 3 phases and 4 stability methods. Job: 69e7837acd8c002f31e00d75

Results will be uploaded to the results/ folder upon completion.

How to Reproduce

```bash
# Install dependencies
pip install torch numpy huggingface_hub

# Run the full experiment
python code/eml_experiment.py
```

Citation

If you use this work, please cite the original EML paper:

@article{eml2026,
  title={All elementary functions from a single operator},
  author={...},
  journal={arXiv preprint arXiv:2603.21852},
  year={2026}
}

License

MIT
