
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this enlightening episode, Julian Schrittwieser, a top AI researcher at Anthropic and former contributor to DeepMind's legendary AlphaGo Zero and MuZero projects, unpacks his viral blog post "Failing to Understand the Exponential, Again" with host Matt Turck. Julian explains why AI bubble discussions seem disconnected from the realities inside frontier labs, where consistent exponential progress continues unabated. (01:00) The conversation explores how task length is doubling every 3-4 months, suggesting AI agents capable of working autonomously for full days by 2026 and achieving expert-level breadth across multiple professions by 2027.
Julian is a leading AI researcher at Anthropic, previously at Google DeepMind where he was second author on AlphaGo Zero and lead author on MuZero. He contributed to some of the most groundbreaking AI projects in history, including AlphaGo, AlphaZero, AlphaCode, and AlphaTensor. Julian's work has fundamentally shaped our understanding of reinforcement learning and AI agents.
Matt is Managing Director at FirstMark, a leading venture capital firm focused on enterprise technology and AI investments. He hosts the MAD Podcast and writes extensively about the AI landscape and emerging technologies on his blog.
Julian emphasizes that the ability of AI models to work independently for extended periods is the key unlock for delegation and economic impact. (04:56) Current models can handle tasks lasting a few hours, but exponential growth suggests full-day autonomous work by 2026. This metric matters because it determines what you can actually delegate to AI: frequent human intervention limits practical utility, while agents that can work for hours enable real productivity multiplication across entire teams.
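As a rough illustration of the doubling arithmetic, here is a back-of-the-envelope sketch; the starting task length, starting date, and exact doubling period are illustrative assumptions, not figures from the episode beyond the 3-4 month doubling Julian cites:

```python
from datetime import date, timedelta

# Rough extrapolation of autonomous task length under a fixed doubling time.
# All numbers are illustrative assumptions: ~2-hour tasks as the starting
# point, a 3.5-month doubling period (midpoint of the cited 3-4 months),
# and late 2025 as the starting date.
task_hours = 2.0
doubling_months = 3.5
current = date(2025, 10, 1)

while task_hours < 8.0:  # a full working day of autonomous work
    task_hours *= 2
    current += timedelta(days=int(doubling_months * 30.4))

print(f"~{task_hours:.0f}-hour tasks reached around {current:%B %Y}")
# With these assumptions the 8-hour mark lands in 2026, which is the kind of
# back-of-the-envelope reasoning behind the "full-day agents by 2026" claim.
```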
The combination of pre-training on vast human knowledge with reinforcement learning creates the most capable AI systems. (38:37) Pre-training provides an implicit world model similar to evolutionary encoding in animals, while RL teaches agents to correct their own errors and learn from their actual behavior distribution. This approach is more practical than training from scratch because pre-training brings immense value and creates agents with human-aligned values from the start.
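To make the "learn from their actual behavior distribution" point concrete, here is a toy sketch of on-policy updating, where the policy trains on actions it sampled itself rather than on a fixed human dataset; the three-action task and its rewards are entirely made up for illustration:

```python
import numpy as np

# Toy illustration of on-policy learning: the policy samples its own actions,
# observes the outcome, and corrects itself, instead of only imitating data.
rng = np.random.default_rng(0)
logits = np.zeros(3)                       # "pre-trained" starting policy
reward = np.array([0.1, 0.5, 1.0])         # hidden reward per action (made up)
baseline = 0.0                             # running-average reward baseline

for _ in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(3, p=probs)        # sample from the model's own policy
    r = reward[action]
    grad = -probs
    grad[action] += 1.0                    # d log pi(action) / d logits
    logits += 0.1 * (r - baseline) * grad  # REINFORCE update
    baseline += 0.05 * (r - baseline)      # update the baseline

print(np.round(np.exp(logits) / np.exp(logits).sum(), 2))
# Probability mass shifts onto the highest-reward action: the policy improves
# by acting and learning from its own behavior distribution.
```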
High-quality training data is crucial for stable reinforcement learning, as demonstrated by AlphaZero's success. (47:07) AlphaZero spent significant computation on planning and search to generate exceptional training data, resulting in incredibly stable RL training that could run across continents. Modern language model RL is less stable because the gap between the quality of the generated training data and the model's current capability is much smaller, suggesting that improving reasoning to generate higher-quality training data is a key scaling direction.
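A schematic sketch of that data-generation idea is below; the network and the search are stubbed out as placeholders (this is not AlphaZero code, just an illustration of using search output as the training target):

```python
import numpy as np

# Schematic version of the loop described above: spend compute on search at
# every position, then use the search's visit counts (a stronger policy than
# the raw network) as the training target for the network.
# `network_policy` and `run_search` are stand-ins, not real AlphaZero code.
rng = np.random.default_rng(0)

def network_policy(position, n_moves):
    """Stand-in for the neural network's prior over moves."""
    return np.full(n_moves, 1.0 / n_moves)

def run_search(position, prior, simulations=800):
    """Stand-in for MCTS; in the real system this is where the compute goes."""
    return rng.multinomial(simulations, prior)   # fake visit counts

training_data = []
position, n_moves = "start", 5                   # toy stand-in for a game state
for ply in range(10):                            # one short self-play "game"
    prior = network_policy(position, n_moves)
    visits = run_search(position, prior)
    target = visits / visits.sum()               # search-improved policy target
    training_data.append((position, target))
    position = f"{position}->{int(np.argmax(visits))}"  # play most-visited move

print(len(training_data), "search-improved training examples from one game")
```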
Goodhart's Law applies heavily to AI benchmarks: any measure that becomes a target stops being a good measure. (54:29) Public benchmarks get gamed as teams optimize specifically for them, leading to misleading performance indicators. The solution is creating private, held-out evaluations that truly represent your use case. Companies should develop internal benchmarks based on their actual tasks rather than relying on public leaderboards for model selection.
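A minimal sketch of what such a private evaluation harness could look like; `call_model`, the tasks, and the pass/fail checks are hypothetical placeholders for your own model client and prompts drawn from your real workload:

```python
# Private, held-out evaluation harness sketch. Keep the prompts out of any
# public dataset so the benchmark stays un-gameable.
def call_model(prompt: str) -> str:
    """Placeholder for whatever model or API you are comparing."""
    raise NotImplementedError

internal_tasks = [
    # (prompt from your actual workload, checker for an acceptable answer)
    ("Summarize this support ticket in one sentence: ...", lambda out: len(out) < 200),
    ("Extract the invoice total from this email: ...", lambda out: "$" in out),
]

def evaluate(model_fn, tasks) -> float:
    """Fraction of private tasks the model passes."""
    passed = sum(1 for prompt, check in tasks if check(model_fn(prompt)))
    return passed / len(tasks)

# score = evaluate(call_model, internal_tasks)
```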
Julian argues that AI will create complementary relationships rather than one-for-one job replacement, following the economic principle of comparative advantage. (63:57) AI excels at certain tasks while humans remain superior at others, leading to gradual productivity improvements rather than sudden displacement. This pattern mirrors chess and Go, where AI tools enhanced human play rather than eliminating human players, making the games more accessible and popular.
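A tiny worked example of comparative advantage, using made-up productivity numbers, shows why specialization can raise total output even when the AI is absolutely better at both tasks:

```python
# Comparative advantage with illustrative numbers: the AI is better at both
# tasks, yet total output is highest when the human takes the task where the
# AI's edge is smallest.
hours = 8
ai    = {"drafting": 10, "review": 4}   # units per hour (illustrative)
human = {"drafting": 2,  "review": 3}

# Both split their time evenly across the two tasks.
even_split = sum(hours / 2 * rate
                 for rates in (ai, human) for rate in rates.values())

# Specialize by comparative advantage: AI drafts, human reviews.
specialized = hours * ai["drafting"] + hours * human["review"]

print(f"even split: {even_split:.0f} units, specialized: {specialized:.0f} units")
# -> 76 vs 104 units: specialization wins despite the AI's absolute advantage.
```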