Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this special episode of the MAD podcast, host Matt Turck interviews Sholto Douglas, a leading AI researcher at Anthropic, following the release of Claude Sonnet 4.5. The conversation traces Sholto's journey from competitive fencing in Australia to becoming a key figure in AI research, including his move from Google's Gemini program to Anthropic. The discussion delves into the accelerating pace of AI progress, with models now capable of autonomous coding for up to 30 hours straight (23:01).
Sholto Douglas is a leading AI researcher at Anthropic who previously worked at Google on the Gemini program. He started at Google just a month before ChatGPT's release and played a crucial role in developing Gemini's inference stack, which saved hundreds of millions of dollars. Before his AI career, Sholto was a world-class fencer, ranked 43rd globally, and studied computer science and robotics in Australia.
Matt Turck is a partner at FirstMark and hosts the MAD podcast. He conducts in-depth interviews with leading figures in technology and AI research, focusing on making complex technical concepts accessible to a broader audience.
Douglas emphasizes that despite monthly claims of hitting plateaus, AI progress continues exponentially across all measurable domains. He observes that current AI training pipelines are primitive and held together by "duct tape and best efforts," indicating massive room for improvement (49:23). This suggests entrepreneurs and professionals should position themselves to capitalize on capabilities that will exist six months from now, not just current limitations.
The breakthrough allowing AI agents to work autonomously for 30 hours represents a fundamental shift from requiring supervision every 30 seconds to only every 10-20 minutes. Douglas explains this enables building "working software rather than demos": complete applications like functional Slack-like systems rather than simple prototypes (42:01). This extended operational capability opens entirely new categories of AI applications and business models.
Douglas explains that pretraining is like "skim reading every textbook" while RL is like "doing worked problems and getting feedback." Certain capabilities like learning to say "I don't know" can only emerge through RL, not pretraining (48:14). The combination of sufficient base model quality, adequate compute for RL, and simple approaches finally made RL work effectively with large language models in 2024.
Anthropic's focus on coding stems from it being uniquely tractable - you can verify when code works, run tests in parallel, and iterate rapidly without real-world consequences. Unlike self-driving cars that must work perfectly the first time, coding agents can fail 100 times as long as they succeed once (30:15). This makes coding the fastest path to both economic impact and advancing toward more general AI capabilities.
Douglas predicts that individuals will soon manage teams of AI agents working 24/7, dramatically amplifying personal productivity and impact. He currently uses two coding agents to double his work output and expects this to scale significantly (66:04). This increased leverage should be channeled toward solving humanity's major challenges in health, housing, poverty, and other critical areas where the world remains "imperfect in so many ways."