Decoder with Nilay Patel • October 2, 2025

The good, bad, and future of AI agents

A deep dive into the current state and potential future of AI agents, exploring the capabilities of Anthropic's new Claude Sonnet 4.5 model in coding, software development, and autonomous task completion.

AI & Machine Learning

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.

Podcast Summary

In this episode of Decoder, guest host Hayden Fields interviews David Hershey, who leads the applied AI team at Anthropic, about the company's latest model, Claude Sonnet 4.5. (02:17) The conversation explores how the new model represents a significant leap in autonomous coding, with the ability to work continuously for up to 30 hours on complex software development tasks without human intervention. (02:58) David shares insights from testing the model, including how it recreated Anthropic's entire Claude.ai application from scratch overnight and built a sophisticated Slack-like chat application with 11,000 lines of code. The discussion also covers the current state of AI agents across different industries, their limitations, and what the technology means for the future of work, particularly in software development.

Main Theme: The evolution of AI agents from prototype stage to production-ready tools, with a focus on autonomous coding capabilities and their impact across various industries.

Speakers

Hayden Fields

Senior AI reporter at The Verge and guest host for this episode of Decoder. Hayden covers the AI industry, reporting on its developments, challenges, and breakthroughs.

David Hershey

Leads the applied AI team at Anthropic, where he works directly with startups and enterprises to help them implement Anthropic's AI technology effectively. His role involves extensive testing of new models to understand their capabilities and limitations, and as a trained software engineer he is well placed to assess AI coding capabilities.

Key Takeaways

AI Agents Excel in Coding But Struggle with Spatial Awareness

The current generation of AI agents, particularly Claude Sonnet 4.5, has achieved remarkable capabilities in software development while still failing at surprisingly simple tasks. (05:23) David explains that while the model can spend 30 hours autonomously building complex applications, it still struggles with basic spatial awareness, such as telling left from right or navigating game boards. This paradox shows that AI progress is uneven: models can master PhD-level mathematics while failing to grasp basic spatial relationships. (37:16) For professionals, this means treating AI agents as powerful tools in specific domains that still need human oversight on tasks involving spatial reasoning or complex visual interpretation.

Industry-Specific AI Adoption Follows Unexpected Patterns

Legal services emerged as one of the most surprising sectors for rapid AI agent adoption, despite being traditionally conservative about technology. (09:58) David observed that law firms are hiring large teams of lawyers specifically to help build AI products, creating a unique feedback loop where domain expertise directly informs AI development. This suggests that successful AI implementation requires deep collaboration between technologists and domain experts. The legal field's rapid adoption shows that industries with high volumes of information synthesis work are prime candidates for AI augmentation, regardless of their historical technology adoption patterns.

Autonomous Development Requires Pragmatic Task Decomposition

Claude Sonnet 4.5's success in long-term autonomous coding comes from its ability to break down complex projects into manageable chunks rather than attempting grand, ambitious implementations. (28:58) David noted that unlike previous models that would "meander everywhere trying to do miraculous work," this model approaches development pragmatically – testing image uploads, committing code to version control, and completing one feature at a time. (29:06) This mirrors effective human collaboration and suggests that the most successful AI agents will be those that work methodically rather than trying to solve everything at once. Professionals can apply this principle by structuring their own AI interactions with clear, sequential objectives.
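The incremental pattern David describes (build one feature, test it, commit it, then move on) can be sketched as a simple loop. This is a minimal, hypothetical illustration, not Anthropic's implementation or a description of how Claude Sonnet 4.5 works internally: `Task`, `run_incrementally`, and the simulated tasks are invented for the example, and appending a string to a list stands in for a real version-control commit. The feature names are borrowed from the chat application described in the episode.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """One small, self-contained unit of work (hypothetical structure)."""
    name: str
    implement: Callable[[], None]      # writes the code for this feature
    passes_tests: Callable[[], bool]   # True if this feature's checks pass


def run_incrementally(tasks: List[Task]) -> List[str]:
    """Complete tasks one at a time, committing each only after its tests pass.

    Stops at the first failing task instead of pressing on, and returns
    the list of commit messages recorded so far.
    """
    commits: List[str] = []
    for task in tasks:
        task.implement()
        if not task.passes_tests():
            break  # hand back for review or retry rather than building on a broken base
        commits.append(f"feat: {task.name}")  # stand-in for a version-control commit
    return commits


if __name__ == "__main__":
    built = set()
    tasks = [
        Task("channels", lambda: built.add("channels"), lambda: "channels" in built),
        Task("image uploads", lambda: built.add("uploads"), lambda: "uploads" in built),
        Task("search", lambda: None, lambda: "search" in built),  # simulated failure
        Task("threads", lambda: built.add("threads"), lambda: "threads" in built),
    ]
    print(run_incrementally(tasks))
```

Run as written, the loop commits channels and image uploads, stops when the simulated search feature fails its check, and never starts threads. Halting at the first failure, rather than attempting everything at once, is the design choice that keeps each committed step in a known-good state.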

The Interface Problem Limits AI Agent Accessibility

Despite AI models becoming increasingly capable, the tools and interfaces for interacting with them haven't kept pace, creating a barrier to widespread adoption. (48:32) David explains that while Claude Sonnet 4.5 is "the first model that could be that thing where anybody could build a production ready application," current interfaces like coding assistants and chat windows aren't sufficient for non-technical users. He predicts that breakthrough consumer adoption will require "one more interface" beyond current tools. (49:06) This insight suggests that businesses should focus not just on AI capabilities but on creating intuitive interfaces that make powerful AI accessible to their teams without technical expertise.

AI Agents Represent Collaborative Enhancement Rather Than Direct Replacement

The future of AI agents lies in augmenting human capabilities rather than replacing workers entirely, though this dynamic is evolving rapidly. (31:11) David describes current AI as "a collaborator" that "accelerates me" and "makes our whole team better and faster," while acknowledging that watching an AI work for 30 hours straight raises questions about future roles. (31:59) He believes there's "a ton of room to make us better, to make better software for users" through this technology. For professionals, this suggests focusing on developing skills that complement AI capabilities – strategic thinking, quality assessment, and creative problem-solving – while leveraging AI for execution and analysis.

Statistics & Facts

Claude Sonnet 4.5 can run autonomously for up to 30 hours straight without human intervention while working on complex coding tasks. (02:58) This represents a significant breakthrough in AI agent endurance and reliability.
The model successfully built a chat application similar to Slack or Teams with 11,000 lines of code in a 30-hour autonomous coding session. (25:32) This application included features like DMs, threads, channels, search functionality, image uploads, and multi-user authentication.