Scaling Laws

Empirically observed relationships showing that model performance improves predictably as you increase compute, data, and parameter count.

Scaling laws are empirically observed relationships showing that AI model performance improves predictably as you increase compute, data, and parameter count. The improvement typically follows a power law: each time you multiply a resource by a fixed factor, the loss falls by a roughly constant factor. More training compute buys more capability, and you can model the curve in advance by fitting it on smaller runs.
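To make "model the curve in advance" concrete, here is a minimal sketch of fitting a power law to a handful of small runs and extrapolating to a larger budget. The function names (`fit_power_law`, `predict`) and all data points are hypothetical: the numbers are synthetic, generated from a known curve, and are not measurements from any paper. The sketch also assumes a pure power law; fits used in practice often add a constant term for irreducible loss.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx                      # slope of the line in log-log space
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope     # (a, b)

def predict(a, b, x):
    """Evaluate the fitted power law at x."""
    return a * x ** (-b)

# Synthetic example data (assumption): loss = 10 * compute**-0.05
compute = [1e18, 1e19, 1e20, 1e21]         # training FLOPs of small runs
loss = [10.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
extrapolated = predict(a, b, 1e23)         # forecast for a 100x larger run
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e23 FLOPs={extrapolated:.3f}")
```

On a log-log plot a power law is a straight line, which is why a simple linear fit on the logarithms is enough for this illustration.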

This is the thesis behind the current AI arms race. The landmark 2020 paper from Kaplan et al. at OpenAI reported trends spanning more than seven orders of magnitude with no visible plateau. If the curves hold, the path to better AI is straightforward: spend more on bigger training runs, and capability keeps improving along a predictable curve. Labs have largely acted on this belief, with the largest training runs now reportedly costing hundreds of millions of dollars.

The 2022 Chinchilla paper (Hoffmann et al.) complicated the picture usefully. It showed the original scaling approach was wasteful: labs were making models too large relative to their training data. Compute-optimal training scales data and parameters together, in roughly equal proportion. The result: smaller models trained on more data can match or beat oversized ones for the same training compute, and they are cheaper to run afterward. That's why model size stopped being a reliable proxy for quality.
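As a rough illustration of scaling data and parameters together, the sketch below splits a fixed compute budget between model size and training tokens. It rests on two assumptions that do not come from this entry: the common approximation that training cost is about 6 FLOPs per parameter per token, and the frequently quoted rule of thumb of about 20 training tokens per parameter. The function name `compute_optimal` is hypothetical, and the outputs are ballpark figures, not a reproduction of the paper's fitted results.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0   # assumption: C ~= 6 * N * D
TOKENS_PER_PARAM = 20.0       # assumption: rule of thumb D ~= 20 * N

def compute_optimal(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Return (parameters, tokens) that spend the budget at the given ratio.

    From C = 6 * N * D and D = r * N it follows that N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

budget = 5.88e23                          # illustrative budget in FLOPs
n, d = compute_optimal(budget)
n_big, d_big = compute_optimal(100 * budget)
print(f"{budget:.2e} FLOPs -> {n:.2e} params, {d:.2e} tokens")
print(f"{100 * budget:.2e} FLOPs -> {n_big:.2e} params, {d_big:.2e} tokens")
```

Under these assumptions, both parameters and tokens grow with the square root of compute, so a 100x larger budget implies a model about 10x larger trained on about 10x more data, rather than a 100x larger model on the same data.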

The unresolved question is whether the curves continue to hold, or whether we're approaching diminishing returns that require qualitatively different approaches. If they flatten, the game shifts from "who spends the most" to "who engineers the best" — where data quality, evaluation rigor, and domain expertise matter more than raw compute budget.

Either way, you don't need to pick sides. Build on the foundation models that exist today and stay architecture-flexible enough to adopt what comes next.

Related Terms