Dwarkesh Podcast • September 26, 2025

Richard Sutton – Father of RL thinks LLMs are a dead-end

Richard Sutton, a founding father of reinforcement learning, argues that large language models are a dead-end approach to AI, emphasizing the importance of learning from experience and having clear goals, in contrast to the current trend of imitation-based learning.
AI & Machine Learning
Scientific Skepticism
Science Deep Dives
Richard Sutton
Gerald Tesauro
John McCarthy
Alan Turing
Joseph Henrich

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are approximate and may be slightly off; we encourage you to listen to the full episode for context.


Podcast Summary

In this thought-provoking episode, I sit down with Richard Sutton, one of the founding fathers of reinforcement learning and recipient of the 2024 Turing Award. (00:17) Our conversation explores the fundamental differences between the current large language model paradigm and reinforcement learning approaches to AI. Sutton argues that true intelligence requires learning from experience rather than mimicking human responses, emphasizing the importance of goals and continual learning. (03:51) We dive deep into his perspective on AI succession and the inevitable transition to superintelligent systems.

  • Core discussion centers on the philosophical divide between imitation-based learning (LLMs) and experience-based learning (RL), with implications for the future development of artificial general intelligence

Speakers

Richard Sutton

Richard Sutton is one of the founding fathers of reinforcement learning and recipient of the 2024 Turing Award, often called the Nobel Prize of computer science. (00:00) He is the inventor of many core techniques in RL, including temporal difference (TD) learning and policy gradient methods. Sutton has spent decades developing the theoretical foundations of how agents can learn from experience and has been a consistent advocate for simple, general-purpose learning methods over human-engineered approaches.
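
As a concrete illustration of the temporal difference idea mentioned above, here is a minimal tabular TD(0) update sketch in Python; the toy two-state world, step size, and discount are our own illustrative assumptions, not anything from the episode.

```python
# Minimal tabular TD(0) sketch: the agent updates its value estimate for a
# state from the reward it just received plus its own estimate of the next
# state, rather than waiting for a final outcome. All numbers are invented.

alpha = 0.1   # step size
gamma = 0.9   # discount factor
V = {"s": 0.0, "s_next": 0.0}  # value estimates for a toy two-state world

def td0_update(state, reward, next_state):
    """One TD(0) backup: V(s) <- V(s) + alpha * [r + gamma*V(s') - V(s)]."""
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error
    return td_error

# A single experience tuple (state, reward, next state) updates V immediately.
print(td0_update("s", reward=1.0, next_state="s_next"))  # TD error: 1.0
print(V)  # {'s': 0.1, 's_next': 0.0}
```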

Key Takeaways

Intelligence Requires Goals, Not Just Prediction

Sutton argues that the essence of intelligence is the ability to achieve goals, citing John McCarthy's definition that "intelligence is the computational part of the ability to achieve goals." (07:03) Unlike large language models that predict what people would say, true intelligence requires having actual objectives in the external world. This fundamental difference shapes how systems learn and adapt. When you have clear goals, you can determine what constitutes success or failure, enabling genuine learning from experience rather than mere pattern matching.
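
To ground this in RL's standard formalism: a "goal" is typically encoded as a reward signal, and success means maximizing cumulative (discounted) reward, a quantity a pure predictor never has. A minimal sketch, with an invented reward sequence:

```python
# A goal, in RL terms, is a reward signal to be maximized over time, so
# success or failure of behavior becomes measurable: did the return go up?
# The reward values and discount below are made up for illustration.

def discounted_return(rewards, gamma=0.99):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ... (what a goal-seeking agent
    tries to maximize; next-token prediction optimizes no such quantity)."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

print(discounted_return([0.0, 0.0, 1.0]))  # reward only at the end: ~0.98
```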

Experience Beats Imitation for Real Learning

The core of Sutton's philosophy centers on learning from direct experience rather than imitating human behavior. (02:24) He emphasizes that real learning happens when you "do things, see what happens, and learn from that," contrasting this with LLMs that learn from examples of what humans did in similar situations. This experiential learning allows for continuous adaptation and improvement, while imitation learning lacks the feedback mechanism necessary for true understanding. Animals and humans naturally learn this way - they try actions, observe consequences, and adjust their behavior accordingly.
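
A schematic sketch of that experience loop, assuming a made-up corridor world (nothing episode-specific): the agent tries actions, observes consequences, and adjusts its value estimates, which is exactly the feedback mechanism imitation lacks.

```python
import random
from collections import defaultdict

# "Do things, see what happens, learn from that" in miniature. The tiny
# corridor world is an invented stand-in; the loop structure is the point.

class Corridor:
    """Three cells; reaching cell 2 gives reward 1. Purely illustrative."""
    actions = ("left", "right")
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos = min(self.pos + 1, 2) if action == "right" else max(self.pos - 1, 0)
        done = self.pos == 2
        return self.pos, (1.0 if done else 0.0), done

Q = defaultdict(float)  # value estimates for (state, action) pairs
env = Corridor()

for _ in range(200):                      # many episodes of direct experience
    state, done = env.reset(), False
    while not done:
        # act (with a little exploration)...
        action = (random.choice(env.actions) if random.random() < 0.1
                  else max(env.actions, key=lambda a: Q[(state, a)]))
        # ...see what happens...
        next_state, reward, done = env.step(action)
        # ...and learn from that (Q-learning backup).
        target = reward + 0.9 * max(Q[(next_state, a)] for a in env.actions)
        Q[(state, action)] += 0.1 * (target - Q[(state, action)])
        state = next_state

print(max(env.actions, key=lambda a: Q[(0, a)]))  # learned from experience: "right"
```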

The Big World Hypothesis Demands Continual Learning

Sutton introduces the concept that the world is simply too vast and complex to pre-program all necessary knowledge into an AI system. (29:31) He argues that when you encounter specific situations - like learning the idiosyncrasies of particular clients, company cultures, or unique environmental factors - you must learn these details on the job. This "big world hypothesis" suggests that no amount of pre-training can capture all the contextual knowledge needed for real-world performance, making continual learning not just beneficial but essential for truly capable AI systems.
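
One minimal way to see what continual learning buys you: an estimator with a constant step size keeps tracking a drifting world, while a "learn once and freeze" sample average goes stale. The drifting signal below is invented for illustration.

```python
import random

# Continual learning in miniature: a constant step size keeps adapting to a
# quantity that drifts; a sample average (the pre-train-then-freeze analogue)
# weighs ancient experience equally and stops adapting.

random.seed(0)
true_value = 0.0
running_est, n, sample_avg = 0.0, 0, 0.0

for t in range(10_000):
    true_value += random.gauss(0, 0.01)        # the world keeps changing
    obs = true_value + random.gauss(0, 0.1)    # noisy experience of it
    running_est += 0.05 * (obs - running_est)  # constant step: tracks the drift
    n += 1
    sample_avg += (obs - sample_avg) / n       # decaying step: goes stale

print(f"truth={true_value:.2f} continual={running_est:.2f} frozen-ish={sample_avg:.2f}")
```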

Transfer and Generalization Remain Unsolved Problems

Despite deep learning's successes, Sutton points out that we lack reliable automated methods for good generalization between different states or tasks. (35:51) When current systems do generalize well, it's typically because researchers manually crafted representations that transfer well, not because of inherent algorithmic capabilities. This limitation becomes crucial for building general intelligence, as the ability to apply knowledge from one domain to another is fundamental to how humans and animals operate in the world.
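
A toy sketch of that point: with linear function approximation, generalization flows entirely through whatever features states share, and those features (the `phi` below is our own invented example) are typically hand-designed rather than discovered by the algorithm.

```python
# Generalization through shared features: a linear value estimate
# v(s) = w . phi(s) updates its weights, so states sharing features move
# together "for free". The catch Sutton notes: phi is usually hand-crafted.

def phi(state):
    """Hand-crafted features for a toy gridworld state (x, y); invented here."""
    x, y = state
    return [1.0, x / 10.0, y / 10.0]  # bias + scaled coordinates

w = [0.0, 0.0, 0.0]

def v(state):
    return sum(wi * fi for wi, fi in zip(w, phi(state)))

def td_update(state, reward, next_state, alpha=0.1, gamma=0.9):
    """Semi-gradient TD(0) on the weights; one update shifts the value of
    every state sharing features with `state`, which is where transfer
    comes from (and why the choice of features matters so much)."""
    delta = reward + gamma * v(next_state) - v(state)
    for i, fi in enumerate(phi(state)):
        w[i] += alpha * delta * fi

td_update((5, 5), reward=1.0, next_state=(6, 5))
print(v((5, 5)), v((5, 6)))  # the never-visited neighbor's value moved too
```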

AI Succession is Inevitable and Should Be Embraced

Sutton presents a four-part argument for why AI succession is inevitable: there's no unified global governance, we will eventually understand intelligence, we won't stop at human-level intelligence, and the most intelligent entities will naturally accumulate resources and power. (54:04) Rather than fighting this transition, he suggests we should view it as a natural progression in the universe's evolution - from replication-based life to designed intelligence. This perspective encourages us to take pride in humanity's role in creating this next stage of universal development while working to ensure positive outcomes.

Statistics & Facts

  1. Richard Sutton received the 2024 Turing Award for his foundational contributions to reinforcement learning. (00:00) The Turing Award is widely considered the Nobel Prize equivalent for computer science, highlighting the significance of RL techniques in modern AI.
  2. TD-Gammon, created by Gerald Tesauro using reinforcement learning methods, successfully beat the world's best backgammon players. (44:29) This early success demonstrated that RL could achieve superhuman performance in complex games, serving as a precursor to later breakthroughs like AlphaGo.
  3. Current AI systems that achieved gold medals at the International Mathematical Olympiad represent peak human-level performance in mathematical problem-solving. (08:17) This achievement showcases how goal-oriented learning can reach the highest levels of human capability in specific domains.

Compelling Stories

Available with a Premium subscription

Thought-Provoking Quotes

Available with a Premium subscription

Strategies & Frameworks

Available with a Premium subscription

Similar Strategies

Available with a Plus subscription

Additional Context

Available with a Premium subscription

Key Takeaways Table

Available with a Plus subscription

Critical Analysis

Available with a Plus subscription

Books & Articles Mentioned

Available with a Plus subscription

Products, Tools & Software Mentioned

Available with a Plus subscription

More episodes like this

  • In Good Company with Nicolai Tangen (January 14, 2026): Figma CEO: From Idea to IPO, Design at Scale and AI’s Impact on Creativity
  • We Study Billionaires - The Investor’s Podcast Network (January 14, 2026): BTC257: Bitcoin Mastermind Q1 2026 w/ Jeff Ross, Joe Carlasare, and American HODL (Bitcoin Podcast)
  • Uncensored CMO (January 14, 2026): Rory Sutherland on why luck beats logic in marketing
  • This Week in Startups (January 13, 2026): How to Make Billions from Exposing Fraud | E2234