Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this episode, Lenny interviews Chip Huyen, a core developer of NVIDIA's NeMo platform and author of the bestselling book "AI Engineering." The conversation dives deep into the technical foundations of building successful AI products, contrasting what people think improves AI apps with what actually works. (04:35)
About the guest: Chip Huyen is a core developer of NVIDIA's NeMo platform, a former AI researcher at Netflix, and has taught machine learning at Stanford University. She is a two-time founder and the author of two widely read books on AI, including "AI Engineering," which has been the most-read book on the O'Reilly platform since its launch. Unlike many AI commentators, Chip has built multiple successful AI products and platforms and works directly with enterprises on their AI strategies.
About the host: Lenny writes Lenny's Newsletter and hosts the podcast, focused on helping ambitious professionals master product management, growth, and building successful companies. He brings deep experience in product strategy and has interviewed hundreds of successful entrepreneurs and operators.
Key themes covered include:
Chip emphasizes that successful AI apps improve through talking to users and understanding their needs rather than chasing the latest AI news or technologies. (05:35) She challenges the common practice of spending excessive time debating between similar technologies when the performance difference between them is minimal. Instead, she advocates building reliable platforms, preparing better data, and writing better prompts. This approach delivers more tangible improvements than constantly switching to the newest frameworks or models.
In Retrieval Augmented Generation (RAG) systems, data preparation significantly outweighs the choice of vector database for quality improvements. (34:39) Chip explains that effective data preparation includes proper chunking strategies, adding contextual information like summaries and metadata, and even rewriting content in question-answer format. She shares examples of companies getting major performance gains by restructuring their documentation specifically for AI consumption, adding annotation layers that provide context AI models typically lack.
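The data-preparation ideas above can be sketched in a few lines. This is a minimal, illustrative example (not code from the episode): it splits a document into chunks and attaches contextual metadata, including a crude one-line summary, before the chunks would be indexed. The function name `prepare_chunks` and the metadata fields are assumptions for demonstration.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def prepare_chunks(doc_title: str, body: str, chunk_size: int = 500) -> list[Chunk]:
    """Split a document into fixed-size chunks and enrich each with context."""
    chunks = []
    for start in range(0, len(body), chunk_size):
        piece = body[start:start + chunk_size]
        chunks.append(Chunk(
            text=piece,
            metadata={
                "source": doc_title,               # where the chunk came from
                "position": start // chunk_size,   # order within the document
                # A short summary gives the retriever and the model context
                # the raw chunk lacks; here, naively, the first sentence.
                "summary": body.split(".")[0][:120],
            },
        ))
    return chunks

chunks = prepare_chunks("Billing FAQ", "Refunds are processed in 5 days. " * 40)
print(len(chunks), chunks[0].metadata["source"])
```

In practice the summaries and annotations Chip describes would come from a human or a model pass over the corpus rather than a first-sentence heuristic, but the shape is the same: every chunk carries the context needed to interpret it in isolation.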
While pre-training establishes general model capabilities, post-training through techniques like supervised fine-tuning and reinforcement learning is where companies can differentiate their AI products. (14:05) Chip notes that frontier labs are focusing heavily on post-training because pre-training data is becoming limited and similar across companies. The real innovation happens in reinforcement learning from human feedback (RLHF), where domain experts provide examples and feedback to train models for specific use cases and behaviors.
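To make the supervised fine-tuning step concrete: training data is typically a file of expert-written demonstration conversations, one JSON object per line. The record below uses the widely used "messages" chat schema as an illustration; the exact field names vary by provider, and the content here is invented.

```python
import json

# One supervised fine-tuning record: a demonstration conversation
# where the assistant turn is written by a domain expert.
sft_example = {
    "messages": [
        {"role": "system", "content": "You are a support agent for Acme Bank."},
        {"role": "user", "content": "How do I dispute a charge?"},
        {"role": "assistant",  # the expert-authored target behavior
         "content": "Open the app, select the transaction, and tap 'Dispute'."},
    ]
}

# Training files are usually JSON Lines: one object like this per line.
line = json.dumps(sft_example)
print(json.loads(line)["messages"][-1]["role"])
```

Collecting and curating records like this, and the preference comparisons used for RLHF, is exactly where the domain experts Chip mentions enter the pipeline.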
Through analyzing productivity gains from AI coding tools, Chip discovered that senior, high-performing engineers typically see the largest productivity boosts from AI assistance. (46:06) She shares a study in which a company divided its engineering team into three performance tiers and gave half of each group access to Cursor. The highest-performing engineers gained the most because they are proactive problem-solvers who know which problems to hand to AI, while lower performers often lack the context to use these tools effectively.
AI evaluations (evals) are crucial for products operating at scale or where failures have catastrophic consequences, but they don't need to be implemented for every feature immediately. (22:38) Chip advocates for a pragmatic approach: build evals that help uncover opportunities where products are performing poorly, focusing on the most critical user paths. She suggests that the goal of evals is to guide product development by identifying specific segments or use cases that need improvement, rather than achieving perfect metrics across all features.
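Chip's framing of evals as an opportunity-finder, not a scoreboard, can be sketched as a small harness: run checks grouped by user segment and surface the weakest one. This is a toy illustration, assuming `model` is any callable from prompt to response; the stand-in model and checks below are invented for demonstration.

```python
from collections import defaultdict

def run_evals(model, cases):
    """cases: list of (segment, prompt, check) where check(output) -> bool."""
    results = defaultdict(list)
    for segment, prompt, check in cases:
        results[segment].append(check(model(prompt)))
    # Per-segment pass rates highlight where the product underperforms,
    # rather than hiding weak spots inside one global metric.
    return {seg: sum(r) / len(r) for seg, r in results.items()}

# Toy stand-in model: only answers refund questions sensibly.
def model(prompt):
    return "refunds take 5 business days" if "refund" in prompt else "sorry"

cases = [
    ("billing", "How long do refunds take?", lambda o: "5" in o),
    ("billing", "Can I get a refund?", lambda o: "refund" in o),
    ("onboarding", "How do I create an account?", lambda o: "account" in o),
]
scores = run_evals(model, cases)
print(min(scores, key=scores.get))  # the segment most in need of attention
```

Starting with a handful of cases on the most critical user paths, as here, is enough to point product work at the right segment; coverage can grow later.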