
Notes on OpenAI’s new o1 chain-of-thought models

OpenAI released two major new preview models today: o1-preview and o1-mini (the mini one is also a preview, despite the name), previously rumored under the codename “strawberry”. There’s a lot to understand about these models: they’re not simply the next step up from GPT-4o; instead, they introduce some major trade-offs in cost and performance in exchange for improved “reasoning” capabilities.

  • Trained for chain of thought
  • Low-level details from the API documentation
  • Hidden reasoning tokens
  • Examples
  • What’s new in all of this

Trained for chain of thought

OpenAI’s elevator pitch is a good starting point:

“We’ve developed a new series of AI models designed to spend more time thinking before they respond.”

One way to think about these new models is as a specialized extension of the chain of thought prompting pattern: the “think step by step” trick that we’ve been exploring as a community for a couple of years now, first introduced in the paper Large Language Models are Zero-Shot Reasoners in May 2022. OpenAI’s article Learning to Reason with LLMs explains how the new models were trained:

“Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process. We have found that the performance of o1 consistently improves with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute). The constraints on scaling this approach differ substantially from those of LLM pretraining, and we are continuing to investigate them.”
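To make the distinction concrete, here is a minimal sketch using the OpenAI Python SDK (the question and prompt wording are illustrative, and API access to the o1-preview model is assumed). With a conventional model like GPT-4o you typically append the “think step by step” instruction yourself; o1-preview is trained to carry out that reasoning internally before it answers.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# Classic chain-of-thought prompting: explicitly ask the model to reason
# step by step before giving its final answer.
cot_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": question + "\nLet's think step by step."}
    ],
)
print(cot_response.choices[0].message.content)

# o1-preview performs that reasoning on its own: no special instruction is
# needed, and the reasoning tokens it spends are not returned in the output.
o1_response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": question}],
)
print(o1_response.choices[0].message.content)
```

The sketch is only meant to show where the pattern moves: the step-by-step reasoning shifts from an instruction in the prompt to behavior the model learned during training, carried out in reasoning tokens that are billed but not shown.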

Full opinion: OpenAI’s o1 models aren’t simply the next step up from GPT-4o; they introduce major cost and performance trade-offs in exchange for improved “reasoning.”