Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this episode of Decoder, guest host Hayden Field interviews David Hershey, who leads the Applied AI team at Anthropic. The conversation centers on Anthropic's newest release, Claude Sonnet 4.5, which represents a significant breakthrough in autonomous AI agents, particularly for coding applications. (02:58) Hershey explains that this model can work autonomously for up to 30 hours straight on complex tasks like building software applications from scratch, marking a major step forward from previous AI capabilities.
Hayden Field is a senior AI reporter at The Verge and serves as the Thursday episode guest host for Decoder. She covers developments in the AI industry, focusing on the practical applications and limitations of generative AI technologies. Her reporting expertise includes analyzing AI company strategies and testing emerging AI tools.
David Hershey leads the Applied AI team at Anthropic, where he works directly with startups and enterprises to help them implement Anthropic's AI technology effectively. He spends significant time testing new AI models to understand their capabilities and limitations, and he's known for creating viral experiments like having Claude play Pokemon. His role bridges the gap between cutting-edge AI research and real-world business applications.
The current generation of AI agents shows remarkable capability in software development tasks but falls short in areas requiring spatial awareness. (05:23) Hershey explains that while Claude Sonnet 4.5 can autonomously build complex applications over many hours, it still struggles with basic spatial concepts like understanding chess board positions or navigating game environments. This highlights how AI development isn't uniform: models can perform PhD-level mathematics while failing at seemingly simple spatial reasoning tasks that humans take for granted.
One of the most surprising sectors to quickly adopt AI technology has been the legal industry, traditionally known for being slow to embrace technological change. (09:58) Legal professionals have found AI particularly valuable for synthesizing large volumes of case law and legal documents. However, successful implementation requires having lawyers directly involved in the product development process, with companies hiring legal professionals as full-time staff to help build and refine AI-powered legal tools.
Claude Sonnet 4.5 demonstrates a more mature approach to handling complex projects by breaking them into manageable chunks rather than attempting everything at once. (28:38) Instead of getting overly ambitious and meandering across multiple tasks simultaneously, the new model works pragmatically: it focuses on one specific component at a time, tests it thoroughly, then moves to the next piece. This methodical style mirrors how effective human collaborators work and makes the AI more reliable for long-term autonomous projects.
The success of AI coding tools depends heavily on interface design that evolves alongside model capabilities. (47:52) Hershey traces the evolution from simple code completion tools like Copilot to more sophisticated integrated development environments like Cursor. He suggests that while Sonnet 4.5 may be capable enough for anyone to build production-ready applications, we still need better interfaces that make this power accessible to non-technical users without requiring them to navigate complex setup processes.
Creating effective AI agents for specialized industries requires direct collaboration with experts from those fields, not just additional training data. (13:24) Rather than simply feeding more domain-specific data into models, Anthropic focuses on incorporating the expertise of professionals like accountants, lawyers, and other specialists directly into the development process. The approach depends on ongoing human feedback loops, not just larger datasets.