Decoder with Nilay Patel • October 2, 2025

The good, bad, and future of AI agents

A deep dive into the current state and potential future of AI agents, focusing on Anthropic's new Claude Sonnet 4.5 model and its impressive ability to autonomously code complex applications for up to 30 hours.

AI & Machine Learning

Indie Hackers & SaaS Builders

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.

Podcast Summary

In this episode of Decoder, guest host Hayden Field interviews David Hershey, who leads the Applied AI team at Anthropic. The conversation centers on Anthropic's newest release, Claude Sonnet 4.5, which represents a significant advance in autonomous AI agents, particularly for coding applications. (02:58) Hershey explains that this model can work autonomously for up to 30 hours straight on complex tasks like building software applications from scratch, marking a major step forward from previous AI capabilities.

The main theme focuses on the current state and future potential of AI agents, examining where these technologies excel today (primarily in coding) and where they still struggle (spatial awareness, UI navigation), while discussing the path toward more autonomous AI systems that could eventually handle complex, multi-day tasks across various industries.

Speakers

Hayden Field

Hayden Field is a senior AI reporter at The Verge and serves as the Thursday episode guest host for Decoder. She covers developments in the AI industry, focusing on the practical applications and limitations of generative AI technologies. Her reporting includes analyzing AI company strategies and testing emerging AI tools.

David Hershey

David Hershey leads the Applied AI team at Anthropic, where he works directly with startups and enterprises to help them implement Anthropic's AI technology effectively. He spends significant time testing new AI models to understand their capabilities and limitations, and he is known for viral experiments such as having Claude play Pokémon. His role bridges the gap between cutting-edge AI research and real-world business applications.

Key Takeaways

AI Agents Excel in Coding But Struggle with Spatial Tasks

The current generation of AI agents shows remarkable capability in software development tasks but falls short in areas requiring spatial awareness. (05:23) Hershey explains that while Claude Sonnet 4.5 can autonomously build complex applications over many hours, it still struggles with basic spatial concepts like understanding chess board positions or navigating game environments. This highlights how uneven AI progress is: models can perform PhD-level mathematics while failing at seemingly simple spatial reasoning tasks that humans take for granted.

The Legal Industry Has Rapidly Embraced AI Agents

One of the most surprising sectors to quickly adopt AI technology has been the legal industry, traditionally known for being slow to embrace technological change. (09:58) Legal professionals have found AI particularly valuable for synthesizing large volumes of case law and legal documents. However, successful implementation requires having lawyers directly involved in the product development process, with companies hiring legal professionals as full-time staff to help build and refine AI-powered legal tools.

Pragmatic Task Decomposition Makes AI More Effective

Claude Sonnet 4.5 demonstrates a more mature approach to handling complex projects by breaking them into manageable chunks rather than attempting everything at once. (28:38) Instead of getting overly ambitious and meandering across multiple tasks simultaneously, the new model takes a pragmatic approach: it focuses on one specific component at a time, tests it thoroughly, then moves to the next piece. This methodical approach mirrors how effective human collaborators work and makes the AI more reliable for long-running autonomous projects.
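
The build-one-piece, test-it, then-move-on pattern described above can be illustrated with a minimal Python sketch. This is not how Claude Sonnet 4.5 works internally, which the episode does not detail; the function `build_incrementally`, the `Component` type, and the example components are all hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical shape of a unit of work: a name, a build step, and a test.
Component = Tuple[str, Callable[[], None], Callable[[], bool]]


def build_incrementally(components: List[Component], max_attempts: int = 3) -> List[str]:
    """Build components one at a time, testing each before starting the next."""
    completed: List[str] = []
    for name, build, test in components:
        for _ in range(max_attempts):
            build()
            if test():
                completed.append(name)
                break
        else:
            # Stop instead of stacking new work on an untested component.
            raise RuntimeError(f"{name} failed its test after {max_attempts} attempts")
    return completed


# Illustrative usage: "search" only passes its test on the second attempt.
log: List[str] = []
flaky = {"attempts": 0}


def build_auth() -> None:
    log.append("build auth")


def build_search() -> None:
    flaky["attempts"] += 1
    log.append("build search")


done = build_incrementally([
    ("auth", build_auth, lambda: True),
    ("search", build_search, lambda: flaky["attempts"] >= 2),
])
```

In this sketch, a component that keeps failing halts the run rather than letting later pieces build on it, which is one simple way to express the "test thoroughly, then move on" idea.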

Interface Evolution Is Critical for AI Adoption

The success of AI coding tools depends heavily on interface design that evolves alongside model capabilities. (47:52) Hershey traces the evolution from simple code completion tools like Copilot to more sophisticated integrated development environments like Cursor. He suggests that while Sonnet 4.5 may be capable enough for anyone to build production-ready applications, we still need better interfaces that make this power accessible to non-technical users without requiring them to navigate complex setup processes.

AI Models Need Domain Expert Integration

Creating effective AI agents for specialized industries requires direct collaboration with experts from those fields, not just additional training data. (13:24) Rather than simply feeding more domain-specific data into models, Anthropic focuses on incorporating the intelligence and expertise of professionals like accountants, lawyers, and other specialists directly into the development process. This approach recognizes that building effective AI for specialized domains requires ongoing human expertise and feedback loops, not just larger datasets.

Statistics & Facts

Claude Sonnet 4.5 can work autonomously for up to 30 hours straight on coding tasks without human intervention. (02:58) This represents a significant leap in AI agent capabilities, allowing more than a full day of uninterrupted autonomous work on complex software development projects.

The 30-hour coding project resulted in an 11,000-line chat application similar to Slack or Teams, complete with DMs, threads, channels, search functionality, image uploads, and multi-user authentication. (26:45) This demonstrates the complexity of applications that AI can now build independently.

Anthropic's headquarters building is filled primarily with software engineers, which Hershey cites as a key reason why the company's models excel at coding tasks. (14:40) This highlights how the composition of AI development teams directly influences model capabilities.