
A knockout blow for LLMs?

Ha ha ha. But what's the fuss about? Apple has a new paper; it's pretty devastating to LLMs, a powerful follow-up to one from many of the same authors last year. There's actually an interesting weakness in the new argument (which I will get to below), but the overall force of the argument is undeniably powerful. So much so that LLM advocates are already partly conceding the blow while hinting at, or at least hoping for, happier futures ahead.

In fairness, the paper both Gary Marcus'd and Subbarao (Rao) Kambhampati'd LLMs. On the one hand, it echoes and amplifies the training-distribution argument that I have been making since 1998: neural networks of various kinds can generalize within the training distribution of data they are exposed to, but their generalizations tend to break down outside that distribution. That was the crux of my 1998 paper skewering multilayer perceptrons, the ancestors of current LLMs, by showing out-of-distribution failures on simple math and sentence-prediction tasks. It was also the crux in 2001 of my first book, The Algebraic Mind, which did the same in a broader way, and central to my first Science paper (a 1999 experiment which demonstrated that seven-month-old infants could extrapolate in a way that then-standard neural networks could not). It was the central motivation, too, of my 2018 Deep Learning: A Critical Appraisal and my 2022 Deep Learning is Hitting a Wall. I singled it out here last year as the single most important (and most important to understand) weakness in LLMs.
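The in-distribution-versus-out-of-distribution point can be made concrete with a toy sketch (my own illustration, not an experiment from the paper, and using a polynomial fit rather than a neural network): a model fit to samples of sin(x) inside a training range tracks the function closely there, yet diverges wildly once you query it outside that range.

```python
# Illustrative sketch (not from the Apple paper): a curve-fitter
# generalizes well inside its training distribution but breaks
# down badly outside it.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 200)   # training distribution: [-3, 3]
y_train = np.sin(x_train)

# Fit a flexible degree-9 polynomial to the training data.
coeffs = np.polyfit(x_train, y_train, deg=9)

x_in = np.linspace(-3, 3, 100)      # inside the training range
x_out = np.linspace(5, 8, 100)      # outside the training range

err_in = np.max(np.abs(np.polyval(coeffs, x_in) - np.sin(x_in)))
err_out = np.max(np.abs(np.polyval(coeffs, x_out) - np.sin(x_out)))

print(f"in-distribution max error:     {err_in:.4f}")
print(f"out-of-distribution max error: {err_out:.1f}")
```

The exact numbers depend on the seed and degree, but the pattern is robust: tiny error inside the training range, enormous error outside it.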

Full in-depth report: Apple researchers detail the limitations of top LLMs and large reasoning models, including on classic problems like the Tower of Hanoi, which AI solved as early as 1957.
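To see why the Tower of Hanoi result stings, recall that the puzzle has a tiny, well-known recursive solution (the textbook algorithm, not code from the paper): move n−1 disks aside, move the largest disk, then move the n−1 disks back on top, for 2^n − 1 moves total.

```python
# Classic recursive Tower of Hanoi (textbook algorithm): returns the
# list of (source, destination) moves to shift n disks from src to dst.
def hanoi(n, src="A", dst="C", aux="B", moves=None):
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, src, aux, dst, moves)  # clear n-1 disks onto the spare peg
        moves.append((src, dst))            # move the largest disk
        hanoi(n - 1, aux, dst, src, moves)  # stack the n-1 disks back on top
    return moves

print(len(hanoi(3)))   # 2**3 - 1 = 7 moves
print(len(hanoi(10)))  # 2**10 - 1 = 1023 moves
```

A dozen lines of guaranteed-correct recursion, which is exactly why failures on larger instances read as evidence about reasoning rather than about problem difficulty.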