
Timestamps are approximate and may be slightly off. We encourage you to listen for the full context.
Former OpenAI researcher Ashwin discusses his transition from robotics to language models, his work on the o1 reasoning team, and his move to Cursor. The episode explores the development of reinforcement learning for language models, the scaling paradigm, and the future of AI-assisted programming. (01:00)
Former OpenAI researcher who worked on the o1 reasoning team and recently joined Cursor. He completed a PhD at Berkeley under Sergey Levine, focusing on reinforcement learning for robotics, and interned at OpenAI in 2017 on robotics projects. At OpenAI, he specialized in hyperparameter scaling research and contributed to the development of reasoning models.
Ashwin explains that RL applied to language models is "kind of a weird, funny tool" that can completely dominate within its training distribution but struggles to generalize beyond it. (12:28) This insight suggests that to make RL effective for economically useful tasks, we need to bring the world of useful work into the training distribution rather than expecting the models to generalize to completely new domains. The practical implication is that AI products should capture the full context of what users want to accomplish.
A key insight from Ashwin's transition to Cursor is the importance of co-designing the product and the AI model together. (32:38) At large organizations like OpenAI, product teams and ML teams often operate separately, making rapid iteration difficult. At Cursor, the product and ML teams sit next to each other, which lets them ship policy updates as often as every two hours. This enables much faster feedback loops and more targeted improvements.
Ashwin reveals that while external observers see AI progress as dramatic leaps, internally at OpenAI it felt very smooth: just a series of experiments, some of which work and get stacked together, combined with continuous scaling. (25:56) This highlights how media narratives about sudden breakthroughs don't capture the reality of incremental scientific progress, and suggests that AI development is more predictable than it appears from the outside.
The transition from robotics to language models makes sense because robotics builds "very gritty people who look at a lot of data." (01:40) Robotics researchers are forced to be grounded in reality since they work with the physical world, making them excellent at understanding and debugging complex systems. This practical, data-focused mindset translates well to the empirical nature of training large language models.
Ashwin reflects that the RL research community from 2015-2022 may have "overfit to benchmarks pretty heavily" by introducing new algorithmic components that gave researchers implicit knobs to tune for better benchmark performance. (09:54) This led to methods that looked promising in academic settings but didn't translate to real-world applications, contributing to what some called the "RL winter." The lesson is to be wary of methods that perform well on benchmarks but lack the simplicity needed to generalize.