Google has launched Gemini 2.5 Flash, a major upgrade to its AI lineup that gives businesses and developers direct control over how much “thinking” their AI performs. The new model, released today in preview through Google AI Studio and Vertex AI, is a strategic effort to deliver improved reasoning capabilities while keeping pricing competitive in an increasingly crowded AI market.

The model introduces what Google calls a “thinking budget,” a mechanism that lets developers specify how much computational power should be allocated to reasoning through complex problems before generating a response. This approach aims to address a fundamental tension in today’s AI marketplace: more sophisticated reasoning typically comes at the cost of higher latency and higher prices. “We know cost and latency matter for a number of developer use cases, and so we want to offer developers the flexibility to adapt the amount of the thinking the model does, depending on their needs,” said Tulsee Doshi, Product Director for Gemini Models at Google DeepMind, in an exclusive interview with VentureBeat.

This flexibility reflects Google’s pragmatic approach to AI deployment as the technology becomes embedded in business applications where cost predictability is essential. By allowing the thinking capability to be turned on or off, Google has created what it calls its “first fully hybrid reasoning model.”
Full report: Google’s Gemini 2.5 Flash introduces ‘thinking budgets’ that cut AI costs by 600% when turned down.