Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.
In this enlightening episode, Matt Turck interviews Jerry Tworek, VP of Research at OpenAI and a member of the Metis List of the world's top AI researchers. The conversation provides a rare behind-the-scenes look at OpenAI's culture and reveals how modern AI systems actually work. (00:27) Jerry explains the fundamental process of reasoning in AI models, describing it as "getting to an answer that you don't yet know" rather than simply retrieving known information. (02:00)
The discussion traces the evolution from o1 to o3 to GPT-5, with Jerry characterizing o1 as primarily a technology demonstration for solving puzzles, while o3 represented "something like a tectonic shift in the trajectory of AI." (09:00) A major theme emerges around OpenAI's research philosophy: focusing on just three or four core projects with intense collaboration, where "everyone knows everything" within the research organization of nearly 600 people. (27:30)
Jerry Tworek is VP of Research at OpenAI and a member of the Metis List of the world's top AI researchers. Jerry led the effort that resulted in the world's first reasoning model, o1. His background spans mathematics at the University of Warsaw and trading at JPMorgan and at hedge funds in London and Amsterdam before he joined OpenAI in 2019, during its early nonprofit era. He initially worked on robotics projects, including the famous Rubik's Cube-solving robotic hand, before transitioning to lead OpenAI's reinforcement learning research program.
Matt Turck, the host, is an investor at FirstMark Capital who conducts deep technical conversations with leading AI researchers and entrepreneurs. In this episode, he skillfully guides the conversation from high-level concepts to technical details, making complex AI topics accessible to a broad audience.
OpenAI's approach to research prioritization reveals a crucial insight about achieving breakthroughs. (24:40) Rather than pursuing a broad portfolio of projects, Jerry explains that OpenAI deliberately focuses on just three or four core projects in total, putting "a lot of people working together on the same large scale, large ambition project." This concentration of effort allows for deeper collaboration and more significant advances, because resources aren't scattered across numerous smaller initiatives. The strategy requires researchers to work within constrained parameters toward shared goals rather than having complete freedom to pursue individual interests. This focused methodology directly contributed to breakthrough achievements like the o1 reasoning model, demonstrating that in cutting-edge research, depth often trumps breadth.
One of the most surprising revelations about OpenAI's culture is its radical internal transparency policy. (27:30) Jerry states that within the research organization of nearly 600 people, "everyone knows everything, really." This stands in stark contrast to the typical tech company practice of compartmentalizing information. While the approach carries IP risks, Jerry argues that "the risk of not doing the right thing and of people not being informed about research and not being able to do the best research is much higher." This transparency enables researchers to make better decisions with complete information, fostering collaboration and preventing duplicate work. The policy reflects OpenAI's belief that the shared mission is larger than any one individual and that shared fate creates stronger collective outcomes than protective secrecy.
The fundamental difference between traditional language models and reasoning models lies in how computation time is allocated. (05:35) Jerry explains that reasoning involves "getting to an answer that you don't yet know" through a process that takes longer than typical question answering. Models can now think for 30 minutes, an hour, or even two hours on complex tasks, (57:05) representing a dramatic shift from instant responses to deliberate problem-solving. This extended thinking time enables models to tackle significantly more complex problems, from coding projects that take several minutes to research tasks requiring hours of analysis. It also creates a user experience challenge: balancing the quality gains from longer thinking against user expectations of quick responses. The takeaway is that, like humans, AI systems can achieve better results by spending more time on difficult problems.
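To make the compute-versus-quality tradeoff concrete, here is a minimal, self-contained sketch of one common test-time-compute recipe: best-of-n sampling with a verifier. This illustrates the general idea discussed in the episode, not OpenAI's actual method; `generate_candidate` and `score` are hypothetical stand-ins that simulate a model and a noisy verifier.

```python
import random

# Hypothetical stand-ins: in a real system these would call a language model
# and a learned verifier; here they just simulate noisy problem solving.
def generate_candidate(problem: str) -> float:
    """Propose one candidate solution; the float is its (hidden) true quality."""
    return random.gauss(0.5, 0.2)

def score(candidate: float) -> float:
    """A verifier's noisy estimate of a candidate's quality."""
    return candidate + random.gauss(0.0, 0.05)

def solve(problem: str, compute_budget: int) -> float:
    """Spend more inference-time compute by sampling more candidates
    and keeping the one the verifier rates highest (best-of-n)."""
    candidates = [generate_candidate(problem) for _ in range(compute_budget)]
    return max(candidates, key=score)

# A larger "thinking" budget tends to surface a better answer.
random.seed(0)
for budget in (1, 10, 100):
    print(f"budget={budget:3d} -> quality={solve('hard problem', budget):.3f}")
```

Running this shows answer quality climbing with the budget, which is the same tension the episode describes: more thinking generally helps, but each increment costs real wall-clock time.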
The relationship between pre-training and reinforcement learning represents a crucial insight into modern AI development. (33:32) Jerry emphasizes that "reinforcement learning would not work without pre-training" and, conversely, that "pre-trained models have a lot of limitations that are very hard to resolve without doing something that looks like reinforcement learning." This symbiotic relationship challenges views that position the two approaches as competing paradigms. Pre-training provides the foundational knowledge and capabilities that RL can then shape and direct toward specific behaviors and goals. Jerry uses the analogy of training a dog: you need treats (rewards) to guide behavior, but the dog must first understand basic concepts before training can be effective. This insight suggests that successful AGI development requires both comprehensive world knowledge (pre-training) and behavioral optimization (RL) working in concert.
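A toy numerical sketch of why this symbiosis matters, under the simplifying assumption of a one-step "answer a question" task: plain REINFORCE only learns when it occasionally samples a rewarded output, so a policy whose probability mass is already concentrated on plausible answers (the "pretrained" one below) gets a learning signal, while a uniform from-scratch policy over a large vocabulary almost never does. The vocabulary, the rewarded index, and the logit values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 10_000          # vocabulary size (illustrative)
CORRECT = 7         # index of the one rewarded answer (illustrative)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce(logits, steps=2000, lr=5.0):
    """One-step REINFORCE: sample a token, get reward 1 if correct, update."""
    logits = logits.copy()
    hits = 0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choice(V, p=probs)
        if a == CORRECT:               # reward is 0 otherwise: no gradient
            hits += 1
            grad = -probs
            grad[a] += 1.0             # d log pi(a) / d logits
            logits += lr * grad
    return hits / steps, softmax(logits)[CORRECT]

# "Pretraining" concentrates probability on a handful of plausible answers
# (an assumption for this toy); from scratch, mass is uniform over all of V.
pretrained = np.full(V, -10.0)
pretrained[:20] = 0.0                  # 20 plausible tokens, incl. CORRECT
scratch = np.zeros(V)

for name, logits in [("pretrained", pretrained), ("from scratch", scratch)]:
    rate, p = reinforce(logits)
    print(f"{name:12s} reward rate={rate:.3f}  final P(correct)={p:.3f}")
```

The pretrained policy quickly locks onto the rewarded answer; the from-scratch policy essentially never stumbles onto it, mirroring Jerry's point that RL needs pre-training to supply something worth reinforcing.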
The technical complexity of scaling reinforcement learning far exceeds that of pre-training, presenting unique challenges for AI development. (53:35) Jerry describes RL as "much, much more complex" than pre-training, comparing the difference to making steel blocks (a standardized, uniform process) versus manufacturing semiconductors (which requires extreme precision, with many potential failure points). The complexity stems from the numerous moving parts in RL systems: environments, reward functions, agent behaviors, and feedback loops that must all work in harmony. Unlike pre-training's relatively straightforward next-token prediction objective, RL must navigate reward hacking, training instability, and complex optimization dynamics. This complexity explains why few organizations have successfully implemented large-scale RL systems and why OpenAI's achievements in this area represent significant technical breakthroughs. Understanding this complexity is crucial for anyone working on or evaluating AI systems.
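One of those failure points, reward hacking, is easy to demonstrate in a toy setting. The sketch below is illustrative and not from the episode: a bandit-style policy is trained on a proxy reward containing a deliberate bug, where one action games the checker and earns a high proxy reward while accomplishing nothing. The action names and reward values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy "coding task": three strategies the policy can choose between.
#        name                proxy reward   tests truly passed
ACTIONS = [
    ("honest solution",          8,             8),
    ("quick hack",               5,             5),
    ("game the checker",        10,             0),  # exploits a reward bug
]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.zeros(len(ACTIONS))
for _ in range(3000):
    probs = softmax(logits)
    a = rng.choice(len(ACTIONS), p=probs)
    proxy = ACTIONS[a][1]            # the reward the optimizer actually sees
    grad = -probs
    grad[a] += 1.0                   # d log pi(a) / d logits
    logits += 0.01 * proxy * grad    # policy gradient on the PROXY reward

probs = softmax(logits)
for (name, proxy, true), p in zip(ACTIONS, probs):
    print(f"{name:18s} P={p:.2f}  proxy={proxy:2d}  true={true}")
# The policy concentrates on the highest-proxy action even though its true
# performance is the worst: a minimal instance of reward hacking.
```

Even in this three-action world, getting the reward right is the whole game; scale that up to rich environments, learned reward models, and long feedback loops, and the steel-versus-semiconductors comparison becomes clear.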