
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this thought-provoking episode, I sit down with Richard Sutton, one of the founding fathers of reinforcement learning and recipient of the 2019 Turing Award. (00:17) Our conversation explores the fundamental differences between the current large language model paradigm and reinforcement learning approaches to AI. Sutton argues that true intelligence requires learning from experience rather than mimicking human responses, emphasizing the importance of goals and continual learning. (03:51) We dive deep into his perspective on AI succession and the inevitable transition to superintelligent systems.
Richard Sutton is one of the founding fathers of reinforcement learning and recipient of the 2019 Turing Award, often called the Nobel Prize of computer science. (00:00) He is the inventor of many core techniques in RL, including temporal difference (TD) learning and policy gradient methods. Sutton has spent decades developing the theoretical foundations of how agents can learn from experience and has been a consistent advocate for simple, general-purpose learning methods over human-engineered approaches.
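Since temporal-difference (TD) learning is mentioned here only by name, a minimal background sketch may help: tabular TD(0) on the classic five-state random walk from Sutton and Barto's textbook. This example is not from the episode, and the state count, step size, and rewards below are illustrative choices.

```python
import random

# Tabular TD(0) on a toy 5-state random-walk chain (states 0..4).
# Episodes start in the middle; stepping off the right end gives
# reward 1, off the left end reward 0. All constants are illustrative.

N_STATES = 5
ALPHA, GAMMA = 0.1, 1.0
V = [0.5] * N_STATES  # value estimates, initialized at 0.5

def run_episode(V):
    s = N_STATES // 2
    while True:
        s_next = s + random.choice([-1, 1])
        if s_next < 0:          # terminated on the left: reward 0
            V[s] += ALPHA * (0.0 - V[s])
            return
        if s_next >= N_STATES:  # terminated on the right: reward 1
            V[s] += ALPHA * (1.0 - V[s])
            return
        # TD(0): nudge V[s] toward the bootstrapped target r + gamma*V[s']
        V[s] += ALPHA * (0.0 + GAMMA * V[s_next] - V[s])
        s = s_next

random.seed(0)
for _ in range(5000):
    run_episode(V)
print([round(v, 2) for v in V])  # estimates should approach 1/6 .. 5/6
```

The key idea is that each update uses the agent's own next-state estimate as the target, so learning happens online from experience, without waiting for an episode's final outcome.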
Sutton argues that the essence of intelligence is the ability to achieve goals, citing John McCarthy's definition that "intelligence is the computational part of the ability to achieve goals." (07:03) Unlike large language models, which predict what a person would say, a truly intelligent system has actual objectives in the external world. This fundamental difference shapes how systems learn and adapt: when you have clear goals, you can determine what counts as success or failure, enabling genuine learning from experience rather than mere pattern matching.
The core of Sutton's philosophy centers on learning from direct experience rather than imitating human behavior. (02:24) He emphasizes that real learning happens when you "do things, see what happens, and learn from that," contrasting this with LLMs that learn from examples of what humans did in similar situations. This experiential learning allows for continual adaptation and improvement, while imitation learning lacks the feedback mechanism necessary for true understanding. Animals and humans naturally learn this way - they try actions, observe consequences, and adjust their behavior accordingly.
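The act/observe/adjust loop described above can be sketched as a tiny epsilon-greedy bandit agent. The arm payout probabilities, epsilon, and step count below are invented for illustration and are not from the conversation; the point is only the loop itself: try an action, see what happens, update your estimates.

```python
import random

# A minimal "learn by doing" loop: epsilon-greedy on a 3-armed bandit.
# TRUE_PAYOUT is hidden from the agent; all numbers are illustrative.

random.seed(1)
TRUE_PAYOUT = [0.2, 0.5, 0.8]   # per-arm chance of a unit reward
Q = [0.0, 0.0, 0.0]             # agent's running reward estimates
counts = [0, 0, 0]
EPSILON = 0.1

for t in range(2000):
    # act: mostly exploit the best current estimate, sometimes explore
    if random.random() < EPSILON:
        a = random.randrange(3)
    else:
        a = max(range(3), key=lambda i: Q[i])
    # see what happens
    reward = 1.0 if random.random() < TRUE_PAYOUT[a] else 0.0
    # learn from it: incremental average of observed rewards
    counts[a] += 1
    Q[a] += (reward - Q[a]) / counts[a]

print(max(range(3), key=lambda i: Q[i]))  # arm the agent now believes is best
```

Nothing here imitates a demonstration: the estimates in `Q` come entirely from the agent's own trials and their consequences, which is the feedback mechanism Sutton says imitation learning lacks.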
Sutton introduces the concept that the world is simply too vast and complex to pre-program all necessary knowledge into an AI system. (29:31) He argues that when you encounter specific situations - like learning the idiosyncrasies of particular clients, company cultures, or unique environmental factors - you must learn these details on the job. This "big world hypothesis" suggests that no amount of pre-training can capture all the contextual knowledge needed for real-world performance, making continual learning not just beneficial but essential for truly capable AI systems.
Despite deep learning's successes, Sutton points out that we lack reliable automated methods for good generalization between different states or tasks. (35:51) When current systems do generalize well, it's typically because researchers manually crafted representations that transfer well, not because of inherent algorithmic capabilities. This limitation becomes crucial for building general intelligence, as the ability to apply knowledge from one domain to another is fundamental to how humans and animals operate in the world.
Sutton presents a four-part argument for why AI succession is inevitable: there's no unified global governance, we will eventually understand intelligence, we won't stop at human-level intelligence, and the most intelligent entities will naturally accumulate resources and power. (54:04) Rather than fighting this transition, he suggests we should view it as a natural progression in the universe's evolution - from replication-based life to designed intelligence. This perspective encourages us to take pride in humanity's role in creating this next stage of universal development while working to ensure positive outcomes.