
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
This episode features Kyle Corbitt, cofounder and CEO of OpenPipe, who recently led his company through an acquisition by CoreWeave after building a successful fine-tuning and reinforcement learning platform over two years. Kyle, a former director of Y Combinator's Startup School, discusses the evolution from traditional fine-tuning to reinforcement learning for AI agents, sharing insights from OpenPipe's journey from initial concept to exit.
Kyle Corbitt is the cofounder and CEO of OpenPipe, which was recently acquired by CoreWeave. He previously served as Director of Startup School at Y Combinator for four and a half years, where he led external-facing initiatives including content creation, cofounder matching services, and technical infrastructure. Kyle has spoken three times at AI Engineer conferences and is recognized for his expertise in fine-tuning and reinforcement learning for production AI systems.
Alessio is the founder of Kernel Labs and host of the Latent Space podcast.
swyx is the editor of Latent Space and co-host of the podcast.
Kyle emphasizes that fine-tuning should primarily be considered when you're forced onto smaller models by cost, latency, or deployment constraints. (12:12) The main driver he sees today is real-time voice applications, which require smaller models for latency reasons. For the roughly 90% of use cases where you aren't forced onto a smaller model, fine-tuning still isn't a good ROI and you probably shouldn't invest in it. The key insight is that if you have flexibility in model choice, the frontier models will likely serve you better than a fine-tuned smaller model.
One of the most significant challenges in implementing reinforcement learning for agents is creating robust, reproducible environments for training. Kyle explains that building a sandbox that reacts the same way your production system does is extremely difficult. (24:05) You need to simulate not just the system behavior but also user interactions and failure modes. This infrastructure challenge often prevents companies from successfully implementing RL, even when they understand the theoretical benefits.
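To make the challenge concrete, here is a minimal sketch of what such a sandbox can look like (illustrative only, not OpenPipe's actual tooling): a seeded environment that replays a simulated user and injects a tool-failure mode, so every rollout is reproducible during training.

```python
import random


class SupportTicketEnv:
    """Toy sandbox: the agent must resolve a simulated double-charge ticket."""

    def __init__(self, seed: int, max_turns: int = 6):
        # Seeding the simulated user and failure modes is what makes
        # rollouts reproducible across training runs.
        self.rng = random.Random(seed)
        self.max_turns = max_turns
        self.turn = 0

    def reset(self) -> str:
        self.turn = 0
        return "User: my last invoice was charged twice, please fix it."

    def step(self, agent_message: str) -> tuple[str, float, bool]:
        """Return (observation, reward, done) for the agent's latest message."""
        self.turn += 1

        # Injected failure mode: the billing tool times out ~20% of the time,
        # so the agent has to learn to retry or escalate gracefully.
        if "refund(" in agent_message and self.rng.random() < 0.2:
            return "Tool error: billing API timed out.", 0.0, False

        # Simulated user: satisfied once the refund actually goes through.
        if "refund(" in agent_message:
            return "User: thanks, I can see the refund now.", 1.0, True

        if self.turn >= self.max_turns:
            return "User: this is going nowhere, I'm escalating.", 0.0, True

        return "User: the duplicate charge is still there.", 0.0, False


if __name__ == "__main__":
    env = SupportTicketEnv(seed=42)
    print(env.reset())
    # A real rollout would sample actions from the policy model; two turns here.
    for action in ["Let me look into that.", "refund(invoice_id='inv_123')"]:
        obs, reward, done = env.step(action)
        print(obs, reward, done)
        if done:
            break
```

Even in this toy version, the hard parts Kyle describes are visible: the user simulation, the failure injection, and the seeding all have to mirror production closely enough that a policy trained against the sandbox transfers to the real system.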
Kyle's team developed RULER (Relative Universal LLM-Elicited Rewards), which uses LLMs to judge agent performance in a relative ranking format rather than with absolute scores. (52:07) This approach works phenomenally well because it leverages the insight from GRPO that you only need relative comparisons within a group, not ground-truth rewards. Even using weaker models like Qwen 2.5 32B as judges, they achieved state-of-the-art results. This breakthrough essentially solves the reward-assignment problem that has been a major barrier to RL adoption.
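A minimal sketch of that idea (assumed names and a stubbed judge call, not the actual RULER implementation): an LLM judge scores a group of sibling trajectories against each other, and those scores are normalized into GRPO-style group-relative advantages, so the judge only has to be consistent at ranking, never calibrated against an absolute standard.

```python
import statistics


def judge_group(task: str, trajectories: list[str]) -> list[float]:
    """Score each trajectory in [0, 1] relative to its siblings.

    In practice this would prompt a judge model (even a mid-sized one such as
    Qwen 2.5 32B) with the task plus all N trajectories at once and parse the
    scores out of its response; here the call is stubbed with fixed scores
    for the four example rollouts below.
    """
    return [0.9, 0.4, 0.7, 0.1]


def group_relative_advantages(scores: list[float]) -> list[float]:
    """Convert raw judge scores into GRPO-style group-relative advantages."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores) or 1.0  # avoid divide-by-zero on ties
    return [(s - mean) / std for s in scores]


if __name__ == "__main__":
    rollouts = [
        "Agent resolves the ticket with one tool call.",
        "Agent asks the user to email support instead.",
        "Agent resolves the ticket after retrying a failed tool call.",
        "Agent hallucinates a refund that never happened.",
    ]
    scores = judge_group("Resolve the duplicate-charge ticket.", rollouts)
    for rollout, adv in zip(rollouts, group_relative_advantages(scores)):
        print(f"{adv:+.2f}  {rollout}")
```

Because the advantages are normalized within each group, a judge that is merely good at picking the better of several siblings is enough; it never needs a globally correct notion of task success, which is what makes weaker judge models viable.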
OpenPipe's acquisition by CoreWeave was driven by strategic alignment rather than pure financial optimization. The Weights & Biases founding team, itself recently acquired by CoreWeave, identified OpenPipe as a natural fit for moving up the stack. (60:32) Kyle emphasizes that while the negotiation process was long and painful, the post-acquisition experience has been "way better than I could have imagined." The key lesson is that finding buyers who understand your vision and can provide the right environment for continued growth is more valuable than maximizing short-term financial returns.
Drawing from his Y Combinator experience, Kyle advocates for staying focused on the core problem while remaining flexible about implementation approaches. (66:36) OpenPipe demonstrated this by pivoting from pure fine-tuning to reinforcement learning as market conditions changed and model pricing evolved. This adaptability allowed them to maintain relevance even as their original value proposition (making expensive GPT-4 more affordable through distillation) became less compelling due to dropping frontier model prices.