
The MAD Podcast with Matt Turck • November 26, 2025

GPT-5.1 & the AI Frontier — Łukasz Kaiser (OpenAI, Transformer Co-Author)

In a wide-ranging interview, Łukasz Kaiser, a key architect of modern AI, explains why AI progress continues on a smooth trajectory, highlighting the shift from pre-training to reasoning models and the potential of multimodal AI, robotics, and better generalization.
Topics: AI & Machine Learning, Tech Policy & Ethics, Developer Culture, Data Science & Analytics
People: Ilya Sutskever, Ray Kurzweil, Matt Turck, Łukasz Kaiser

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.


Podcast Summary

In this fascinating conversation, Łukasz Kaiser, co-author of the groundbreaking "Attention Is All You Need" paper and current OpenAI research scientist, explains why the narrative of AI progress slowing down is fundamentally wrong. (02:29) From inside the labs, AI progress continues as a smooth exponential curve, driven by both the maturation of pre-training and the emergence of reasoning models—a paradigm shift that began just three years ago but is delivering extraordinary capabilities.

  • Main Theme: The episode explores the dual paradigms powering modern AI: pre-training at scale and reinforcement learning-based reasoning models, revealing how these complement each other to create increasingly capable systems that can think, use tools, and solve complex problems across domains like mathematics and coding.

Speakers

Łukasz Kaiser

Łukasz Kaiser is a leading research scientist at OpenAI and one of the co-authors of the seminal "Attention Is All You Need" paper that introduced the Transformer architecture powering modern LLMs. He previously worked at Google Brain under Ray Kurzweil and Ilya Sutskever, bringing a background in theoretical computer science and mathematics from his academic work in Poland, Germany, and France.

Matt Turck

Matt Turck is Managing Director at FirstMark Capital and host of the MAD podcast. He focuses on data and AI investments and regularly interviews leading figures in the AI space about the latest developments in the field.

Key Takeaways

AI Progress Follows Moore's Law Pattern

Despite narratives of slowdown, AI progress continues as a smooth exponential increase in capabilities, similar to Moore's Law. (02:49) Just as Moore's Law persisted through multiple underlying technologies over decades, AI advancement continues through different paradigms—first transformers, now reasoning models. From inside the labs, there's never been reason to believe this trend isn't continuing. The perception of slowdown often comes from outside observers who miss the technical transitions happening beneath the surface improvements.

Reasoning Models Represent a New Paradigm

Reasoning models fundamentally differ from base LLMs by generating "thinking" tokens before providing answers, and they are trained with reinforcement learning rather than supervised next-token prediction alone. (11:48) This approach allows models to use tools, browse the web, and verify their work during the thinking process. The key insight is treating this thinking process as trainable through RL, which is particularly effective in verifiable domains like mathematics and coding, where you can determine whether an answer is correct.
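As a purely illustrative sketch (the `<think>` tag convention, helper names, and reward rule below are assumptions for this example, not anything from OpenAI's actual stack), the idea can be pictured as a completion that carries hidden thinking tokens before the final answer, with a programmatic checker supplying the reward signal in a verifiable domain like arithmetic:

```python
# Illustrative only: the tag convention and reward rule are assumptions,
# not a description of how any production reasoning model is trained.

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def split_completion(completion: str) -> tuple[str, str]:
    """Separate the hidden chain-of-thought from the user-visible answer."""
    if THINK_OPEN in completion and THINK_CLOSE in completion:
        body = completion.split(THINK_OPEN, 1)[1]
        thinking, answer = body.split(THINK_CLOSE, 1)
        return thinking.strip(), answer.strip()
    return "", completion.strip()

def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the final answer matches the known result."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

completion = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>408"
thinking, answer = split_completion(completion)
print(thinking)                          # the model's scratch work
print(verifiable_reward(answer, "408"))  # 1.0
```

The point of the checker is that in math or coding the reward can be computed mechanically, which is what makes the thinking process trainable at scale.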

Massive Low-Hanging Fruit Remains in AI Development

AI labs still have an enormous backlog of obvious improvements to implement, spanning engineering infrastructure, RL training optimization, and better data curation. (08:37) Much of the progress comes from fixing bugs in complex distributed systems, improving synthetic data generation, and enhancing multimodal capabilities that still lag behind text performance. These aren't mysterious breakthroughs but methodical engineering work that requires significant time and resources to implement properly.

Chain-of-Thought Emerges from Reinforcement Learning

Models learn to think step-by-step through RL training where they generate multiple reasoning attempts, with successful approaches reinforced. (20:28) This process teaches models to verify and correct their own mistakes—a crucial thinking strategy that emerges naturally from the training. The visible chain-of-thought users see is actually a cleaned summary; the full reasoning process is typically messier but more comprehensive.
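A toy sketch of that loop follows; the stub "model", the sample problem, and the rejection-sampling-style selection are stand-ins for the actual RL machinery, which the episode does not spell out:

```python
import random

def sample_attempt(problem: str) -> tuple[str, str]:
    """Stub standing in for a model: returns (chain_of_thought, final_answer)."""
    candidates = [
        ("6 * 7 = 42, and 42 + 5 = 47", "47"),
        ("6 + 7 = 13, and 13 + 5 = 18", "18"),   # a flawed attempt
        ("6 * 7 = 42; adding 5 gives 47", "47"),
    ]
    return random.choice(candidates)

def verify(answer: str, ground_truth: str) -> bool:
    """Verifiable-domain check: exact match against the known result."""
    return answer == ground_truth

def collect_reinforced_traces(problem: str, ground_truth: str, k: int = 8):
    """Sample k reasoning attempts and keep only those that reach the right answer."""
    kept = []
    for _ in range(k):
        thought, answer = sample_attempt(problem)
        if verify(answer, ground_truth):
            kept.append((problem, thought, answer))
    return kept  # in a real pipeline these traces would drive the policy update

traces = collect_reinforced_traces("Compute 6 * 7 + 5", "47")
print(f"{len(traces)} of 8 sampled attempts would be reinforced")
```

Reinforcing only the attempts that verify is what, over many problems, pushes the model toward reasoning strategies such as checking and correcting its own work.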

Jagged Capabilities Define Modern AI

Today's frontier models exhibit "jagged" abilities—excelling at Mathematical Olympiad problems while failing simple puzzles a five-year-old can solve. (47:28) This reflects reasoning models' current limitation to science-based domains and weak multimodal reasoning. Kaiser demonstrates this with dot-counting puzzles from his daughter's math book that stump GPT-5.1, highlighting the need for better generalization and multimodal integration in future systems.

Statistics & Facts

  1. OpenAI has grown from a few dozen people in its early days to serving over a billion ChatGPT users, creating massive GPU allocation challenges between training and inference. (34:30)
  2. The human brain contains approximately 100 trillion synapses, while current AI models haven't reached 100 trillion parameters yet, suggesting significant room for scaling. (35:25)
  3. Google Brain expanded from around 40 people when Kaiser joined to 3,000–4,000 people across multiple offices by the time he left, illustrating the massive scaling of AI research teams. (29:35)

