Abstract
NeuNeu predicts language model performance by framing scaling-law prediction as time-series extrapolation, achieving superior accuracy compared to traditional parametric approaches.
Neural scaling laws predict how language model performance improves with increased training inputs. While aggregate metrics like validation loss can follow smooth power-law curves, individual downstream tasks exhibit diverse scaling behaviors: some improve monotonically, others plateau, and some even degrade with scale. We argue that predicting downstream performance from validation loss suffers from two limitations: averaging token-level losses obscures signal, and no simple parametric family can capture the full spectrum of scaling behaviors. To address this, we propose Neural Neural Scaling Laws (NeuNeu), a neural network that frames scaling law prediction as time-series extrapolation. NeuNeu combines temporal context from observed accuracy trajectories with token-level validation losses, learning to predict future performance without the limitations inherent in assuming a specific functional form. Trained entirely on open-source model checkpoints from HuggingFace, NeuNeu achieves 1.99% mean absolute error in predicting model accuracy on 66 downstream tasks -- a 44% reduction compared to logistic scaling laws (3.56% MAE). Furthermore, NeuNeu generalizes zero-shot to unseen model families, architectures, parameter counts, and downstream tasks. Our work suggests that predicting downstream scaling directly from data outperforms parametric alternatives.
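The time-series framing in the abstract can be illustrated with a minimal sketch. The code below is not NeuNeu's architecture (the abstract does not specify one); it only shows the core idea of extrapolating an observed accuracy trajectory from a sliding window of past checkpoints, here with a simple least-squares autoregressor on a synthetic saturating curve. All data and window sizes are hypothetical.

```python
import numpy as np

def build_windows(traj, k):
    # Each training example: k consecutive observed accuracies (temporal
    # context); the target is the accuracy at the next checkpoint.
    X = np.array([traj[i:i + k] for i in range(len(traj) - k)])
    y = np.array(traj[k:])
    return X, y

# Hypothetical accuracy trajectory over 20 checkpoints: a saturating
# (logistic-shaped) downstream-task curve, for illustration only.
steps = np.arange(1, 21)
traj = 0.25 + 0.6 / (1 + np.exp(-(steps - 10) / 3))

k = 4
X, y = build_windows(traj, k)

# Fit a linear autoregressive extrapolator (with bias) by least squares.
# NeuNeu would use a learned neural network here instead.
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

# Extrapolate one checkpoint beyond the observed trajectory.
pred = float(np.r_[traj[-k:], 1.0] @ w)
print(f"predicted next-checkpoint accuracy: {pred:.3f}")
```

A real system in this spirit would replace the linear extrapolator with a network that also ingests token-level validation losses as conditioning features, rather than a single aggregate loss.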