
Latent Space: The AI Engineer Podcast • October 16, 2025

Why Fine-Tuning Lost and RL Won

A deep dive into the evolution of OpenPipe from fine-tuning to reinforcement learning, culminating in its acquisition by CoreWeave, exploring challenges in AI model training, reward functions, and the future of continual learning for AI agents.
AI & Machine Learning
Indie Hackers & SaaS Builders
Developer Culture
Sam Altman
Kyle Corbitt
Lucas Chu
Sean Kim
OpenAI

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.


Podcast Summary

This episode features Kyle Corbitt, cofounder and CEO of OpenPipe, who recently led his company through an acquisition by CoreWeave after building a successful fine-tuning and reinforcement learning platform over two years. Kyle, a former director of Y Combinator's Startup School, discusses the evolution from traditional fine-tuning to reinforcement learning for AI agents, sharing insights from OpenPipe's journey from initial concept to exit.

  • Main themes include the transition from supervised fine-tuning (SFT) to reinforcement learning (RL) in AI model training, the challenges of building production-ready AI agents, and the strategic considerations behind selling a growing startup.

Speakers

Kyle Corbitt

Kyle Corbitt is the cofounder and CEO of OpenPipe, which was recently acquired by CoreWeave. He previously served as Director of Startup School at Y Combinator for four and a half years, where he led external-facing initiatives including content creation, cofounder matching services, and technical infrastructure. Kyle has spoken three times at AI Engineer conferences and is recognized for his expertise in fine-tuning and reinforcement learning for production AI systems.

Alessio Fanelli

Alessio Fanelli is the founder of Kernel Labs and host of the Latent Space podcast.

swyx

swyx (Shawn Wang) is the editor of Latent Space and co-host of the podcast.

Key Takeaways

Fine-tuning is Only Valuable When Forced to Smaller Models

Kyle emphasizes that fine-tuning should primarily be considered when you're forced onto smaller models by cost, latency, or deployment constraints. (12:12) The main driver he sees today is real-time voice applications, which require smaller models for latency reasons. For the 90% of use cases where you aren't forced onto a smaller model, fine-tuning still doesn't deliver a good ROI, and you probably shouldn't invest in it. The key insight is that if you have flexibility in model choice, frontier models will likely serve you better than a fine-tuned smaller model.

The Environment Problem is the Biggest Bottleneck in RL Training

One of the most significant challenges in implementing reinforcement learning for agents is creating robust, reproducible environments for training. Kyle explains that building a sandbox that reacts the same way your production system does is extremely difficult. (24:05) You need to simulate not just the system behavior but also user interactions and failure modes. This infrastructure challenge often prevents companies from successfully implementing RL, even when they understand the theoretical benefits.
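The reproducibility requirement above can be illustrated with a tiny seeded environment. This is a hedged sketch of the general idea, not anything from the episode or from OpenPipe's stack: `SandboxEnv`, its `failure_rate` parameter, and the timeout message are all illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class SandboxEnv:
    """A toy agent environment that is exactly replayable given its seed."""
    seed: int
    failure_rate: float = 0.1  # injected failure mode, mimicking production flakiness

    def __post_init__(self):
        # One seeded RNG per rollout: the same seed reproduces the same
        # observations *and* the same injected failures, which is what makes
        # an RL training run debuggable.
        self._rng = random.Random(self.seed)

    def step(self, action: str) -> tuple[str, bool]:
        if self._rng.random() < self.failure_rate:
            return "error: upstream timeout", False
        return f"ok: {action}", True
```

Two environments built with the same seed produce identical trajectories for the same action sequence, so a surprising training result can always be replayed. The hard part Kyle describes is making such a sandbox faithful to a real production system, not the seeding itself.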

LLM-as-Judge Solves the Reward Problem for RL

Kyle's team developed RULER (Relative Universal LLM-Elicited Rewards), which uses LLMs to judge agent performance in a relative ranking format rather than with absolute scores. (52:07) This approach works phenomenally well because it leverages the insight from GRPO that you only need relative comparisons within a group, not an absolute ground-truth score. Even using weaker models like Qwen 2.5 32B as judges, they achieved state-of-the-art results. This breakthrough essentially solves the reward-assignment problem that has been a major barrier to RL adoption.
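The relative-reward idea can be sketched in a few lines. This is an illustrative toy, not OpenPipe's RULER implementation: the judge's ranking is assumed as input (in practice an LLM produces it), and both function names are hypothetical.

```python
def ranking_to_scores(ranking: list[int]) -> list[float]:
    """Convert a best-first ranking of trajectory indices to scores in [0, 1]."""
    n = len(ranking)
    scores = [0.0] * n
    for pos, idx in enumerate(ranking):
        scores[idx] = 1.0 if n == 1 else (n - 1 - pos) / (n - 1)
    return scores


def group_relative_advantages(scores: list[float]) -> list[float]:
    """GRPO-style normalization: advantage = (score - group mean) / group std.

    Only a trajectory's *relative* standing within its group matters, which is
    why a judge that can merely rank outputs (with no calibrated absolute
    scale) is enough to drive training.
    """
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    if std == 0:
        return [0.0] * len(scores)
    return [(s - mean) / std for s in scores]
```

For example, a judge ranking trajectory 2 best, then 0, then 1 yields scores `[0.5, 0.0, 1.0]` and zero-mean advantages that reward trajectory 2 and penalize trajectory 1; any monotone rank-to-score mapping would produce the same ordering of advantages.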

Acquisition Strategy Should Focus on Strategic Fit Over Financial Terms

OpenPipe's acquisition by CoreWeave was driven by strategic alignment rather than pure financial optimization. The Weights & Biases founding team, itself recently acquired by CoreWeave, identified OpenPipe as a natural fit for moving up the stack. (60:32) Kyle emphasizes that while the negotiation process was long and painful, the post-acquisition experience has been "way better than I could have imagined." The key lesson is that finding buyers who understand your vision and can provide the right environment for continued growth is more valuable than maximizing short-term financial returns.

Hold Your Problem Tight and Your Solution Loosely

Drawing from his Y Combinator experience, Kyle advocates for staying focused on the core problem while remaining flexible about implementation approaches. (66:36) OpenPipe demonstrated this by pivoting from pure fine-tuning to reinforcement learning as market conditions changed and model pricing evolved. This adaptability allowed them to maintain relevance even as their original value proposition (making expensive GPT-4 more affordable through distillation) became less compelling due to dropping frontier model prices.

Statistics & Facts

  1. OpenPipe reached $1 million in ARR over an 8-month period following their August 2023 launch, demonstrating strong initial market validation for fine-tuning services. (05:01)
  2. Kyle estimates there's a 55-60% chance that everyone deploying agents at scale should be doing reinforcement learning, either as part of pre-deployment or continuously during deployment. (18:33)
  3. Anthropic's gross margins are reportedly around 6%, highlighting the challenging economics of frontier AI model providers despite massive revenue growth from $1B to $5B. (46:36)

