
Timestamps are approximate and may be slightly off; we encourage you to listen to the full episode for context.
Sarah Catanzaro from Amplify Partners offers a veteran data investor's perspective on AI's evolution and the intersection of data infrastructure and artificial intelligence. (01:02) She frames the DBT-Fivetran merger as a strategic IPO preparation move rather than the death of the modern data stack, revealing how major AI labs are adopting these same data tools for training data management and agent analytics. The conversation also explores the concerning phenomenon of $100M+ seed rounds raised without clear six-month roadmaps, the overhyped but underspecified world models category, and her thesis that personalization, built on memory management and continual learning, will be the key to solving AI applications' retention and churn problems in 2026.
• Main Theme: The symbiotic relationship between data infrastructure and AI, with a critical examination of current funding trends and emerging opportunities in personalization and memory management for AI applications.

Sarah is a partner at Amplify Partners, where she focuses on AI infrastructure and applications after previously investing through the modern data stack era with companies like DBT. She started her career in symbolic AI systems before transitioning to data infrastructure, driven by a desire to understand what happens when SQL queries are executed. Sarah has been at the intersection of data, compute, and intelligence for years, watching categories emerge, merge, and evolve from the analytics explosion to today's AI frontier.
The DBT-Fivetran merger signals strategic consolidation for IPO preparation rather than category failure. (01:02) Both companies were beating revenue targets and growing healthily, but needed to reach the new IPO threshold of $600M+ combined revenue. Major AI labs are actually heavy users of both DBT and Fivetran for training data curation and agent analytics, proving the tools remain relevant in the AI era. The merger represents the natural evolution of category winners positioning for liquidity rather than a retreat from market demand.
Frontier AI companies are paying careful attention to their data stacks, from data discoverability to efficient GPU data loading. (08:42) Sarah notes that training datasets need management, user interactions with agents require complex analytics, and GPU idle time from inefficient data loading creates significant cost implications. Surprisingly, much existing data infrastructure has scaled elegantly to meet AI use cases, though new challenges around ad hoc workloads and transactional database requirements (like OpenAI using RocksDB) are emerging.
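Sarah's point about GPU idle time is concrete enough to illustrate. Below is a minimal, hypothetical PyTorch sketch (not something discussed in the episode) of the standard levers for keeping accelerators fed: parallel workers, pinned memory, prefetching, and non-blocking copies. The dataset, model, and all sizes are placeholders.

```python
# Hypothetical sketch of the GPU idle-time problem: if the input pipeline
# can't keep up, the GPU stalls between steps and burns money doing nothing.
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Synthetic stand-in for a curated training set.
    data = TensorDataset(torch.randn(10_000, 512), torch.randint(0, 10, (10_000,)))
    loader = DataLoader(
        data,
        batch_size=256,
        num_workers=4,        # load/decode batches in parallel, off the training thread
        pin_memory=True,      # page-locked host memory enables async host-to-device copies
        prefetch_factor=2,    # each worker keeps 2 batches staged ahead of the GPU
        persistent_workers=True,
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(512, 10).to(device)
    for x, y in loader:
        # non_blocking=True overlaps the copy with compute, shrinking
        # the idle gap between optimizer steps.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        loss = torch.nn.functional.cross_entropy(model(x), y)

if __name__ == "__main__":  # required because num_workers spawns subprocesses
    main()
```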
The current funding environment features companies raising massive seed rounds ($100M+) at billion-dollar valuations without clear six-month roadmaps. (10:13) Founders are optimizing for signal and prestige rather than partnership or dilution discipline, creating seven-day decision windows that prevent proper due diligence. This dynamic makes it impossible to assess whether teams can execute on their long-term vision, as investors lack time to build conviction about founders' capabilities while founders focus on transactional relationships rather than strategic partnerships.
AI application companies suffer from low retention and high churn because their products lack meaningful personalization. (19:01) The solution lies in memory management and continual learning systems that don't just store facts but learn new skills from user interactions and adapt as the world changes, a distinction sketched below. This represents the consumerization of AI: moving beyond magical but static experiences to products that become more valuable over time through personalization, much as consumer-grade enterprise tools disrupted traditional software a decade ago.
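To make the store-facts vs. learn-skills distinction concrete, here is a toy Python sketch. The UserMemory class, its methods, and the update rule are all invented for illustration; real continual-learning systems are far more involved.

```python
# Toy sketch: a memory layer that records facts *and* adapts behavior from
# user feedback. All names here are illustrative, not from any real product.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    facts: dict = field(default_factory=dict)
    skill_scores: dict = field(default_factory=lambda: defaultdict(float))

    def remember(self, key: str, value: str) -> None:
        """Fact storage: the part most 'memory' features stop at."""
        self.facts[key] = value

    def reinforce(self, skill: str, reward: float, lr: float = 0.3) -> None:
        """Continual learning (toy version): nudge a skill estimate toward
        observed feedback, so behavior adapts as interactions accumulate."""
        self.skill_scores[skill] += lr * (reward - self.skill_scores[skill])

    def preferred(self, candidates: list[str]) -> str:
        """Personalization: pick the approach this user has responded to best."""
        return max(candidates, key=lambda s: self.skill_scores[s])

mem = UserMemory()
mem.remember("sql_dialect", "duckdb")
mem.reinforce("terse_answers", reward=1.0)   # user accepted a terse reply
mem.reinforce("long_answers", reward=0.0)    # user rewrote a verbose one
print(mem.preferred(["terse_answers", "long_answers"]))  # -> terse_answers
```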
Despite labs paying 7-8 figures for synthetic RL environments, the best environment is the real world itself. (23:37) Real user logs, traces, and activity data (like Cursor uses) are richer, cheaper, and more generalizable than expensive synthetic clones. While some aspects of environment design remain relevant (rubrics, task definition), building app clones represents misallocated resources when authentic user behavior data provides superior training signals for improving AI systems.
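As a rough illustration of mining real-world traces instead of building synthetic clones, here is a hypothetical Python sketch that turns accept/reject events from an invented log schema into preference pairs for training. It is not how Cursor or any lab actually does this; the schema and function are assumptions for illustration.

```python
# Hypothetical: convert real user interaction logs (accepted vs. rejected
# completions) into (prompt, chosen, rejected) triples for preference training.
import json

raw_logs = [
    {"prompt": "fix the off-by-one", "completion": "i -= 1", "accepted": True},
    {"prompt": "fix the off-by-one", "completion": "i += 1", "accepted": False},
]

def logs_to_preference_pairs(logs):
    """Group events by prompt, then pair each accepted output with each
    rejected one, yielding training triples mined from authentic behavior."""
    by_prompt = {}
    for event in logs:
        by_prompt.setdefault(event["prompt"], {"chosen": [], "rejected": []})
        bucket = "chosen" if event["accepted"] else "rejected"
        by_prompt[event["prompt"]][bucket].append(event["completion"])
    for prompt, outs in by_prompt.items():
        for chosen in outs["chosen"]:
            for rejected in outs["rejected"]:
                yield {"prompt": prompt, "chosen": chosen, "rejected": rejected}

for pair in logs_to_preference_pairs(raw_logs):
    print(json.dumps(pair))
```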