
Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.
In this deep-dive conversation with Eoghan McCabe (CEO) and Fergal Reid (Chief AI Officer) of Intercom, we explore how the company has built Fin, one of the most successful AI customer service agents in the market. (01:54) The discussion reveals that most customer service teams are currently underwater, so Fin is helping companies support more customers rather than replacing human agents outright. (30:48) Perhaps most surprisingly, Fergal explains that intelligence is no longer the limiting factor for customer service automation - GPT-4 was already intelligent enough for most tasks, and the majority of Fin's 30+ percentage point improvement in resolution rate has come from better context engineering rather than raw model improvements.
Eoghan McCabe is the CEO of Intercom, leading the 1,100-person distributed organization behind the AI customer service platform. He studied computer science and specialized in AI and machine learning in college (2004-2006), giving him a technical foundation for understanding current developments. McCabe has been instrumental in pioneering outcome-based pricing in the AI space and building one of the most successful AI customer service platforms in the market.
Fergal Reid serves as Chief AI Officer at Intercom, leading a 50+ person AI group that includes scientists and researchers. He has been with the company for years and was involved in early AI product development, including Fin's predecessor, Resolution Bot. Reid oversees the technical architecture and continuous optimization that has driven Fin's resolution rate from 35% at launch to the mid-60s today, improving by roughly one percentage point per month.
Fergal Reid revealed that GPT-4, available two years ago, was already intelligent enough for the vast majority of customer service work. (40:48) The 30+ percentage point improvement in Fin's resolution rate since launch has primarily come from better context engineering - including optimization of retrieval, reranking, prompting, and workflow design - rather than raw model intelligence improvements. This insight challenges the common focus on frontier model capabilities and suggests that many AI applications may be bottlenecked by implementation rather than intelligence.
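To make those "context engineering" knobs concrete, here is a minimal sketch of a retrieval-augmented support pipeline. The function names and structure are illustrative assumptions, not Intercom's actual architecture; the point is that retrieval, reranking, prompting, and workflow can each be measured and tuned independently of the underlying model.

```python
# Illustrative sketch only: where context-engineering knobs live in a support-agent
# pipeline. The index, scorer, and llm objects are injected stand-ins.
from dataclasses import dataclass

@dataclass
class Doc:
    id: str
    text: str
    score: float = 0.0

def retrieve(query: str, index, k: int = 50) -> list[Doc]:
    # First-stage recall: cast a wide net (lexical, vector, or hybrid search).
    return index.search(query, k=k)

def rerank(query: str, docs: list[Doc], scorer, k: int = 8) -> list[Doc]:
    # Second-stage precision: rescore each (query, doc) pair and keep only the top-k.
    for d in docs:
        d.score = scorer(query, d.text)
    return sorted(docs, key=lambda d: d.score, reverse=True)[:k]

def build_prompt(query: str, docs: list[Doc]) -> str:
    # Prompting: how the surviving context is framed matters as much as what is in it.
    context = "\n\n".join(f"[{d.id}] {d.text}" for d in docs)
    return (
        "Answer the customer's question using only the sources below. "
        "If the sources do not cover it, hand off to a human agent.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def answer(query: str, index, scorer, llm) -> str:
    # Workflow design: the same model resolves more queries when each stage above
    # is evaluated and improved separately.
    docs = rerank(query, retrieve(query, index), scorer)
    return llm(build_prompt(query, docs))
```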
Reid noted that most support teams are "underwater by about 30%" versus the capacity they wish they had. (30:00) When Fin resolves 30-50% of queries on day one, it typically takes teams from being underwater to roughly at parity, allowing human agents to move up the value chain rather than being replaced. The primary exception is BPOs (business process outsourcing firms), whose customers frequently deploy Fin and eliminate the outsourced tier-one support entirely.
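A rough sanity check of that arithmetic, using invented round numbers (only the quoted percentages come from the episode):

```python
# Back-of-the-envelope: why resolving ~30-50% of queries takes a team from
# underwater to roughly parity. Volumes below are made up for illustration.
demand = 130          # incoming queries, ~30% above what the team can handle
capacity = 100        # queries the human team can actually work

fin_resolution_rate = 0.35                        # day-one rate in the quoted 30-50% range
resolved_by_fin = demand * fin_resolution_rate    # ~45 queries handled automatically
left_for_humans = demand - resolved_by_fin        # ~85 queries remain

print(left_for_humans <= capacity)                # True: roughly back at parity
```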
While Intercom has sophisticated backtesting frameworks, Fergal emphasized that "the real world of humans and the real messiness of human communication is so messy that you can't build a perfect eval for it." (11:12) They run massively overpowered A/B tests in production and can detect resolution rate changes as small as a tenth of a percentage point. This approach has proven essential because offline evaluations consistently fail to predict real-world performance in customer service scenarios.
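For a sense of scale, a standard two-proportion power calculation shows why that sensitivity requires production-level traffic. The 65% baseline and the alpha/power settings below are assumptions for illustration, not figures from the episode.

```python
# Sketch: sample size needed to detect a 0.1 percentage point shift in resolution rate.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.650                      # assumed current resolution rate
treatment = 0.651                     # a 0.1 percentage point improvement

effect = proportion_effectsize(treatment, baseline)   # Cohen's h for two proportions
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"~{n_per_arm:,.0f} conversations per arm")      # on the order of millions
```

With numbers like these, only a high-volume production A/B test has the statistical power to see a tenth-of-a-point change, which is consistent with the "massively overpowered" framing.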
Intercom's pioneering $0.99 per resolution pricing model creates beautiful alignment between the company and its customers. (75:15) Eoghan explained that charging per query would remove their incentive to improve Fin's effectiveness, whereas charging per resolution means every 1% improvement in resolution rate makes both customers and Intercom's CFO happier. Despite initially being unprofitable at a cost of $1.21 per resolution, the model became profitable as resolution rates improved and inference costs decreased.
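A rough sketch of the unit economics: the per-query cost and query volume below are invented for illustration, and only the $0.99 price and the $1.21 initial cost per resolution come from the conversation.

```python
# Illustrative per-resolution pricing arithmetic: costs scale with queries processed,
# revenue scales with resolutions, so a higher resolution rate flips the margin.
price_per_resolution = 0.99

def unit_economics(queries: int, resolution_rate: float, cost_per_query: float):
    resolutions = queries * resolution_rate
    revenue = resolutions * price_per_resolution
    cost = queries * cost_per_query
    return revenue, cost, cost / resolutions      # last value: cost per resolution

# At launch: a low resolution rate spreads query costs over few resolutions.
print(unit_economics(100_000, 0.35, 0.42))   # cost per resolution ~= $1.20, unprofitable
# Later: a higher resolution rate and cheaper inference make each resolution profitable.
print(unit_economics(100_000, 0.65, 0.30))   # cost per resolution ~= $0.46, profitable
```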
Intercom has begun training custom models for specific components of Fin, including a custom re-ranker that outperformed Cohere's top models. (40:16) For tasks like query summarization, they replaced GPT-3.5 Turbo with a combination of a proprietary encoder-decoder model and fine-tuned Qwen 3, achieving better cost, latency, predictability, and quality than third-party LLMs. This approach suggests that for production applications at scale, custom models may offer better trade-offs than general-purpose frontier models.
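As an illustration of the pattern rather than Intercom's implementation, a re-ranker can sit behind a small scoring interface so that a hosted model (such as Cohere's) can later be swapped for a custom or locally hosted one without touching the rest of the pipeline. The public cross-encoder below is just a stand-in, not their proprietary model.

```python
# Sketch: a drop-in local re-ranker behind a stable interface, using an
# off-the-shelf cross-encoder from sentence-transformers as a placeholder.
from sentence_transformers import CrossEncoder

class LocalReranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rank(self, query: str, passages: list[str], top_k: int = 8) -> list[str]:
        # Score every (query, passage) pair, then keep the top_k highest-scoring passages.
        scores = self.model.predict([(query, p) for p in passages])
        order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
        return [passages[i] for i in order[:top_k]]

# Usage: reranker = LocalReranker(); best = reranker.rank("How do I reset my password?", docs)
```

Keeping the interface stable is what makes it practical to A/B test a custom model against a third-party one and adopt whichever wins on cost, latency, and quality.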