
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
Nathan Labenz hosts an in-depth conversation with Eoghan McCabe (CEO) and Fergal Reid (Chief AI Officer) of Intercom, the creators of Fin, the AI customer service agent that has been a market leader since its launch two and a half years ago. (00:29) With over 400,000 businesses using their platform and the ability to measure resolution rate differences as small as a tenth of a percentage point, Fin represents one of the most intensively tested large language model applications in the market today.
Eoghan McCabe is the CEO of Intercom, leading the 1,100-person distributed organization. He studied computer science, specializing in AI and machine learning, in 2004, giving him foundational knowledge that has proved valuable in the current AI era. McCabe is one of Intercom's co-founders and has been instrumental in pioneering outcome-based pricing with Fin's well-known $0.99-per-resolution model.
Fergal Reid serves as Intercom's Chief AI Officer, leading an AI team of more than 50 people, including scientists and researchers. He has been with Intercom for several years and was involved in creating the first brief for what became Resolution Bot, Fin's predecessor. Reid brings deep technical expertise to the role and oversees the constant experimentation and optimization that has driven Fin's consistent 1% monthly improvement in resolution rates.
Perhaps the most striking insight from this conversation is Fergal Reid's assessment that GPT-4-level intelligence was already sufficient for the vast majority of customer service work. (01:44) While Fin's resolution rate has increased from 35% to 65% over two years, only a few percentage points of that improvement came from better models. Nearly all of the gains came from better context engineering: optimizing retrieval, reranking, prompting, and workflow design. This suggests that for many AI applications, the bottleneck isn't raw intelligence but the orchestration layer around the models.
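The "context engineering" loop described above can be sketched as a retrieve → rerank → prompt pipeline. This is a minimal illustration, not Intercom's implementation: the lexical-overlap retriever and term-density re-ranker below are toy stand-ins for the embedding retrieval and trained re-rankers a production system would use.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens, so punctuation doesn't break matching."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    """First pass: cheap lexical-overlap score over the knowledge base."""
    q = set(tokens(query))
    ranked = sorted(docs, key=lambda d: len(q & set(tokens(d))), reverse=True)
    return [d for d in ranked[:k] if q & set(tokens(d))]

def rerank(query: str, candidates: list[str], k: int = 2) -> list[str]:
    """Second pass: a toy term-density score stands in for a trained
    cross-encoder re-ranker."""
    q = set(tokens(query))
    def density(d: str) -> float:
        words = tokens(d)
        return sum(w in q for w in words) / max(len(words), 1)
    return sorted(candidates, key=density, reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the grounded prompt ultimately sent to the LLM."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

kb = [
    "To reset your password, open Settings and choose Reset Password.",
    "Refunds are processed within 5 business days.",
    "Password resets require access to your registered email.",
]
query = "How do I reset my password?"
prompt = build_prompt(query, rerank(query, retrieve(query, kb)))
```

Each stage here is independently tunable, which is the point of the argument: resolution-rate gains can come from improving retrieval, reranking, or prompt assembly without touching the underlying model.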
Intercom's experience reveals a critical truth about AI development: no matter how sophisticated your offline evaluations, the messiness of real human interaction means there's no substitute for large-scale A/B tests in production. (12:12) Fergal Reid emphasizes the team's skepticism of offline evals, noting that many promising techniques have performed well in backtests only to underperform in production. Intercom's ability to detect resolution-rate changes as small as 0.1 percentage points through massive A/B tests gives them a competitive advantage that smaller competitors simply cannot replicate.
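To see why detecting a 0.1-percentage-point shift demands Intercom's scale, a standard two-proportion power calculation gives the order of magnitude. This is a back-of-envelope sketch, not Intercom's methodology; the baseline rate of 65% comes from the resolution figures mentioned in the conversation.

```python
import math

def samples_per_arm(p: float, delta: float) -> int:
    """Approximate n per arm for a two-sided two-proportion z-test,
    alpha = 0.05, power = 80% (hence the hard-coded z values)."""
    z_alpha = 1.96  # z for alpha = 0.05, two-sided
    z_beta = 0.84   # z for 80% power
    variance = p * (1 - p) + (p + delta) * (1 - p - delta)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Detecting a 65.0% -> 65.1% change in resolution rate:
n = samples_per_arm(0.65, 0.001)  # roughly 3.6 million conversations per arm
```

At millions of conversations per experiment arm, a vendor with a few thousand monthly conversations simply cannot resolve effects of this size, which is the moat being described.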
Contrary to the Klarna-style narratives about massive job displacement, Intercom's data shows that AI is primarily serving previously unmet demand. (27:26) McCabe notes that most support teams are "underwater" by about 30% relative to their desired capacity. When Fin resolves 30-50% of queries, it typically brings teams from underwater to parity rather than enabling layoffs. The exception is Business Process Outsourcing (BPO) arrangements, where companies do frequently eliminate outsourced tier-one support after deploying Fin.
Intercom has begun training custom models for specific components of their AI stack, including a custom re-ranker that outperforms Cohere's top models. (40:16) For tasks where model intelligence is sufficient to saturate performance, they're replacing third-party LLMs with smaller, custom-trained models that offer better cost, latency, predictability, and reliability. This hybrid approach uses frontier models for the hardest reasoning tasks while deploying custom models for more routine but critical functions like query summarization.
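The hybrid approach described above amounts to routing each task to the cheapest model that saturates it. The sketch below illustrates the idea only; the task labels, model names, and routing table are hypothetical, not Intercom's actual stack.

```python
# Tasks where a small custom-trained model already saturates quality,
# mapped to (hypothetical) custom model names. Everything else falls
# through to a frontier model for open-ended reasoning.
FRONTIER_MODEL = "frontier-llm"        # placeholder for a third-party frontier LLM
CUSTOM_MODELS = {
    "rerank": "custom-reranker-v2",    # e.g. a trained cross-encoder re-ranker
    "summarize_query": "custom-summarizer-v1",
}

def pick_model(task: str) -> str:
    """Route routine, saturated tasks to cheap, low-latency custom models;
    send everything else to the frontier model."""
    return CUSTOM_MODELS.get(task, FRONTIER_MODEL)
```

The payoff of this split is exactly the one named in the conversation: custom models buy cost, latency, and predictability on routine tasks, while frontier models are reserved for the reasoning-heavy remainder.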
Intercom's pioneering $0.99-per-resolution pricing model, while initially unprofitable, has created strong alignment between the company and its customers. (70:52) McCabe explains that charging per resolution means every improvement in Fin's effectiveness makes both customers and Intercom's CFO happier. This contrasts with per-conversation or per-token pricing, which could actually incentivize a vendor to tolerate lower resolution rates. The model has since become profitable due to improved success rates and lower inference costs.