In 2022, Hunter Lightman joined OpenAI as a researcher and witnessed the launch of ChatGPT, one of the fastest-growing products ever. While his colleagues worked on ChatGPT, Lightman quietly focused on a team teaching OpenAI's models to solve problems from high school math competitions. Today, that team, known as MathGen, is instrumental in OpenAI's industry-leading effort to create AI reasoning models, the core technology behind AI agents that can perform tasks on a computer like a human would.
OpenAI's models are far from perfect, but their state-of-the-art models have significantly improved on mathematical reasoning. One of OpenAI's models recently won a gold medal at the International Math Olympiad, a math competition for the world's brightest high school students. OpenAI believes these reasoning capabilities will translate to other subjects, ultimately powering general-purpose agents that the company has always dreamed of building.
ChatGPT was a happy accident, a low-key research preview turned viral consumer business. OpenAI's agents, by contrast, are the product of a years-long, deliberate effort within the company. OpenAI CEO Sam Altman envisions a future where users can simply ask their computers for what they need, and the computers will complete the tasks for them. In the AI field, such systems are broadly referred to as agents, and their potential upside is seen as enormous.
OpenAI shocked the world with the release of its first AI reasoning model, o1, in the fall of 2024. Less than a year later, the 21 foundational researchers behind that breakthrough became the most highly sought-after talent in Silicon Valley. Mark Zuckerberg recruited five of the o1 researchers to work on Meta's new superintelligence-focused unit, offering compensation packages north of $100 million.
The rise of OpenAI's reasoning models and agents is tied to a machine learning training technique known as reinforcement learning (RL). RL gives an AI model feedback on whether its choices in a simulated environment were correct. The technique dates back decades; in 2016, Google DeepMind's AlphaGo, trained with RL, drew global attention after defeating a world champion at the board game Go.
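The feedback loop described above can be illustrated with a toy example. The sketch below is a minimal epsilon-greedy bandit, one of the simplest RL setups, and is purely illustrative: the environment, reward values, and function names are all hypothetical and bear no relation to OpenAI's actual training systems.

```python
import random

# Minimal reinforcement-learning sketch: an agent learns which of two
# actions yields more reward, purely from noisy feedback signals.
# Illustrative toy only -- not how frontier models are trained.

def train_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0] * len(true_rewards)  # estimated value of each action
    counts = [0] * len(true_rewards)
    for _ in range(steps):
        # Explore occasionally; otherwise exploit the best-known action.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rewards))
        else:
            action = max(range(len(true_rewards)), key=lambda a: values[a])
        # The "environment" returns noisy feedback on whether the choice was good.
        reward = true_rewards[action] + rng.gauss(0, 0.1)
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values

estimates = train_bandit([0.2, 0.8])
# After training, the agent's estimates reflect that action 1 pays more.
```

The same core idea, rewarding choices that lead to good outcomes, scales up (with vastly more machinery) to training language models.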
OpenAI launched its first large language model in the GPT series in 2018, pretrained on massive amounts of internet data using large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math. It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed "Q*" and then "Strawberry," by combining LLMs, RL, and a technique called test-time computation.
This allowed OpenAI's models to work through problems step by step before answering, an approach known as "chain-of-thought" (CoT), which improved the models' performance on math questions they hadn't seen before. OpenAI uniquely combined these techniques to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful for powering AI agents.
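One way test-time computation pairs with chain-of-thought is self-consistency: sample several reasoning chains and take a majority vote over their final answers. The sketch below is a hypothetical illustration; `sample_chain` is a stub standing in for an actual model query, and the hard-coded answers exist only to show the voting mechanics.

```python
from collections import Counter

# Test-time computation sketch: rather than accepting one answer, sample
# several chain-of-thought attempts and majority-vote on the final answer.
# sample_chain is a stub -- a real system would query a language model.

def sample_chain(question, seed):
    # Hypothetical stub returning (reasoning, answer). One of the five
    # canned attempts contains a mistake, which the vote filters out.
    answers = ["40 km/h", "40 km/h", "90 km/h", "40 km/h", "40 km/h"]
    return "step-by-step reasoning...", answers[seed % len(answers)]

def self_consistent_answer(question, n_samples=5):
    votes = Counter(sample_chain(question, s)[1] for s in range(n_samples))
    return votes.most_common(1)[0][0]

result = self_consistent_answer(
    "A train travels 60 km in 1.5 hours. What is its average speed?"
)
# result == "40 km/h": the majority vote discards the one wrong chain.
```

Spending more compute at inference time this way trades latency for accuracy, which is one reason reasoning models "think" before responding.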
Shortly after the 2023 Strawberry breakthrough, OpenAI spun up an "Agents" team led by OpenAI researcher Daniel Selsam to make further progress on this new paradigm. Although the team was called "Agents," OpenAI didn't initially differentiate between reasoning models and agents as we think of them today. The company just wanted to make AI systems capable of completing complex tasks.
OpenAI would have to divert precious resources, mainly talent and GPUs, to create o1. Throughout OpenAI's history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them. Some former employees say that the startup's mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models.
By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field's momentum comes from advances in reasoning models. A long-standing goal of AI research is to recreate human intelligence with computers, and since the launch of o1, ChatGPT's interface has taken on more human-sounding labels such as "thinking" and "reasoning."
OpenAI's researchers note that people may disagree with their nomenclature or definitions of reasoning, but they argue the terminology matters less than what their models can do. Other AI researchers tend to agree. AI reasoning models are not well understood today, and more research is needed; it may be too early to confidently claim what exactly is going on inside them.
The AI agents on the market today work best for well-defined, verifiable domains such as coding. OpenAI's Codex agent aims to help software engineers offload simple coding tasks. However, general-purpose AI agents like OpenAI's ChatGPT Agent and Perplexity's Comet struggle with many of the complex, subjective tasks people want to automate.
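What makes coding a "verifiable domain" is that a candidate solution can be scored automatically, which yields a clean reward signal for RL. The sketch below is a hypothetical minimal verifier, not any lab's actual pipeline; the function name and test-case format are invented for illustration.

```python
# Sketch of verifiable rewards: a model-proposed solution is scored by
# running it against test cases, producing an unambiguous 0/1 signal.
# Hypothetical minimal verifier for illustration only.

def verify(candidate_fn, test_cases):
    """Return 1.0 if the candidate passes every test case, else 0.0."""
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) != expected:
                return 0.0
        except Exception:
            return 0.0
    return 1.0

# A model-proposed solution for "add two numbers":
candidate = lambda a, b: a + b
reward = verify(candidate, [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)])
# reward == 1.0: every test passed.
```

Subjective tasks, by contrast, offer no such crisp pass/fail check, which is why they remain harder to train for.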
Researchers must first figure out how to better train the underlying models to complete tasks that are more subjective. Noam Brown, an OpenAI researcher who helped create the IMO model and o1, told TechCrunch that OpenAI has new general-purpose RL techniques that let it teach AI models skills that aren't easily verified. That is how the company built the model that achieved a gold medal at the IMO.
These techniques may help OpenAI's models become more performant, gains that could show up in the company's upcoming GPT-5 model. OpenAI hopes to assert its dominance over competitors with the launch of GPT-5, ideally offering the best AI model to power agents for developers and consumers. The company also wants to make its products simpler to use, building AI systems that understand when to call up certain tools and how long to reason for.
While OpenAI undoubtedly led the AI industry a few years ago, the company now faces a field of worthy competitors. The question is no longer just whether OpenAI can deliver on its agentic future, but whether it can do so before Google, Anthropic, xAI, or Meta get there first.