Google is rolling out a new text-to-speech model based on Gemini 3.1 Flash. The company says it delivers the most natural and expressive voice output it has shipped to date. The headline feature is audio tags: simple text commands that let developers control the style, tempo, tone, and accent of the generated speech. The model supports more than 70 languages and can handle multi-speaker dialogue. On the Artificial Analysis leaderboard, the model reaches an Elo rating of 1,211 and stands out for its quality-to-price ratio. It beats ElevenLabs v3 in overall quality and sits just behind Inworld 1.5 Max.