The fundamental driver of the ongoing surge in semiconductor stocks is, of course, AI, particularly the realization that agents are going to need a lot of compute. What Cerebras represents, however, is something broader: while the compute story for AI has been largely about GPUs, particularly from Nvidia, the future is going to look increasingly heterogeneous.

The GPU Era
The story of how Graphics Processing Units became the center of AI is a well-trodden one, but in brief:
- Just as drawing pixels on a computer screen was a parallel process, meaning there was a direct connection between the number of processing units and graphics speed, AI-related calculations were a parallel process, meaning there was a direct connection between the number of processing units and calculation speed.
- Nvidia enabled this dual-usage by making its graphics processors programmable, and created an entire software ecosystem called CUDA to make this programming accessible.
- The big difference between graphics and AI has been the size of the problem being solved — models are a lot bigger than video game textures — which has led to a dramatic expansion in high-bandwidth memory (HBM) per GPU, and dramatic innovations in chip-to-chip networking that allow multiple chips to work together as one addressable system. Nvidia has been the leader in both.
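The parallelism in the first bullet can be sketched in plain code: the matrix multiplications at the heart of both graphics and AI decompose into independent pieces of work, so more processing units can each take a slice of the output. This is an illustrative sketch only, not GPU code; the splitting-by-row-blocks scheme here is a simplification for demonstration.

```python
# Illustrative sketch: a matrix multiply split across independent workers,
# one row-block each. Real GPUs apply the same idea with thousands of cores.
from concurrent.futures import ThreadPoolExecutor

def matmul_rows(A, B, rows):
    """Multiply the given rows of A by B (plain Python, for illustration)."""
    n = len(B[0])
    k = len(B)
    return [
        [sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
        for i in rows
    ]

def parallel_matmul(A, B, workers=4):
    """Split the output rows across `workers` independent tasks."""
    chunks = [range(i, len(A), workers) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda r: matmul_rows(A, B, r), chunks))
    # Reassemble the rows in their original order.
    out = [None] * len(A)
    for chunk, rows in zip(chunks, results):
        for row_idx, row in zip(chunk, rows):
            out[row_idx] = row
    return out
```

Because each output row depends only on the inputs, never on another output row, the work divides cleanly: that independence is exactly why adding processing units translates directly into speed for both pixels and model inference.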
Full analysis: Agentic inference is set to be different from today’s inference, and will change compute infrastructure because speed won’t matter when humans aren’t involved.