
Timestamps are approximate and may be slightly off. We encourage you to listen for the full context.
Former OpenAI researcher Ashwin discusses his transition from robotics to language models, his work on the o1 reasoning team, and his move to Cursor. The episode explores the development of reinforcement learning for language models, the scaling paradigm, and the future of AI-assisted programming. (01:00)
Former OpenAI researcher who worked on the o1 reasoning team and recently joined Cursor. He completed a PhD at Berkeley under Sergey Levine, focusing on reinforcement learning for robotics, and interned at OpenAI in 2017 on robotics projects. At OpenAI, he specialized in hyperparameter scaling research and contributed to the development of reasoning models.
Ashwin explains that RL applied to language models is "kind of a weird, funny tool" that can completely dominate within its training distribution but struggles to generalize beyond it. (12:28) This insight suggests that to make RL effective for economically useful tasks, we need to bring the world of useful work into the training distribution rather than expecting the models to generalize to completely new domains. The practical implication is that AI products should capture the full context of what users want to accomplish.
A key insight from Ashwin's transition to Cursor is the importance of co-designing the product and the AI model together. (32:38) At large organizations like OpenAI, product teams and ML teams often operate separately, making rapid iteration difficult. At Cursor, the product and ML teams sit next to each other, which lets them ship policy updates as often as every two hours. This enables much faster feedback loops and more targeted improvements.
Ashwin reveals that while external observers see AI progress as dramatic leaps, internally at OpenAI it felt very smooth: just a series of experiments, some of which work and get stacked together, combined with continuous scaling. (25:56) This highlights how media narratives about sudden breakthroughs don't capture the reality of incremental scientific progress, and suggests that AI development is more predictable than it appears from the outside.
The transition from robotics to language models makes sense because robotics builds "very gritty people who look at a lot of data." (01:40) Robotics researchers are forced to be grounded in reality since they work with the physical world, making them excellent at understanding and debugging complex systems. This practical, data-focused mindset translates well to the empirical nature of training large language models.
Ashwin reflects that the RL research community from 2015-2022 may have "overfit to benchmarks pretty heavily" by introducing new algorithmic components that gave researchers implicit knobs to tune for better benchmark performance. (09:54) This led to methods that looked promising in academic settings but didn't translate to real-world applications, contributing to what some called the "RL winter." The lesson is to be wary of methods that perform well on benchmarks but lack the simplicity needed to generalize.