Scaling laws were supposed to be, well, laws. Mathematical laws. Pretty much the entire GenAI frenzy has been based on the idea that, with reasonable precision, you could predict performance from the amount of data, the number of parameters, and the amount of compute. As recently as a few weeks ago, Sam Altman was extolling scaling laws as if they were a religion, in an interview with the CEO of Y Combinator: "When we started, the core beliefs were that deep learning works and it gets better with scale… predictably… A religious-level belief… was… that that wasn't gonna stop… Then we got the scaling results… At some point you have to just look at the scaling laws and say we're going to keep doing this… There was something really fundamental going on. We had discovered a new square in the periodic table."

But things have been changing rapidly, and the great scaling retrenchment has already begun. Over the last few weeks, much of the field has quietly acknowledged that recent (not yet public) large-scale models aren't as powerful as the putative laws predicted. The new story is that there is not one scaling law but three: scaling with how long you pretrain a model (which isn't really holding anymore), scaling with how much you post-train a model, and scaling with how long you let a given model wrestle with a given problem (what Satya Nadella called scaling with "inference-time compute").
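For readers who want to see what "predicting performance from data, parameters, and compute" concretely means, here is a minimal sketch in Python of the best-known published version of the first law: the parametric loss fit from Hoffmann et al. (2022), the "Chinchilla" paper. The coefficients are theirs; this is an illustration of the genre, not OpenAI's internal model.

```python
# A minimal sketch of a pretraining scaling law, using the parametric
# loss fit published in Hoffmann et al. (2022, "Chinchilla"):
#   L(N, D) = E + A / N^alpha + B / D^beta
# where N is parameter count and D is training tokens. The fitted
# coefficients below are from that paper; treat them as illustrative.
E = 1.69                 # irreducible loss (roughly, the entropy of text)
A, ALPHA = 406.4, 0.34   # parameter-count term
B, BETA = 410.7, 0.28    # training-token term

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens, under the Chinchilla parametric fit."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Example: a 70B-parameter model trained on 1.4T tokens
# (Chinchilla's own configuration).
print(predicted_loss(70e9, 1.4e12))  # ~1.94
```

The point of the recent retrenchment is precisely that curves like this one, extrapolated to the newest frontier-scale training runs, are reportedly no longer delivering the gains they predict.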