Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the episode for full context.
In this special episode, podcast host Matt Turck sits down with AI researchers Nathan Lambert and Luca Soldaini from the Allen Institute for AI (AI2) to announce the release of the OLMo 3 model family, one of the most transparent open-source AI releases to date. Unlike typical "open weights" releases, AI2 is publishing everything: the models, training data (Dolma 3), intermediate checkpoints, recipes, and detailed methodology. (01:30) The conversation provides an unusually transparent look into modern frontier AI development, covering the complete pipeline from pre-training through reinforcement learning.
• Main themes: The episode explores the technical architecture of reasoning models, the rise of Chinese open-source dominance led by models like Qwen and DeepSeek, America's emerging response through initiatives like ATOM, and the complex engineering reality behind modern AI training pipelines.
Nathan Lambert is a researcher at the Allen Institute for AI focusing on reinforcement learning from human feedback (RLHF) and post-training techniques. He previously worked at Hugging Face on open-source AI initiatives and holds a PhD from UC Berkeley in reinforcement learning. Lambert is also the author of the popular "Interconnects" newsletter and has been instrumental in developing techniques like Reinforcement Learning with Verifiable Rewards (RLVR).
Luca Soldaini is a research scientist at AI2 specializing in large language model pre-training and data curation. Originally from Italy, he holds a PhD in information retrieval and previously worked at Amazon on Alexa's search capabilities. At AI2, he leads the development of the Dolma dataset series and has been instrumental in the OLMo model family since the project began as a grassroots initiative in 2022.
Matt Turck is Managing Director at FirstMark Capital and host of the MAD (Machine Learning, AI & Data) podcast. He writes extensively about the AI and data ecosystem and has been tracking the evolution of the AI landscape for over a decade.
AI2's OLMo 3 release demonstrates what true open source looks like beyond typical "open weights" releases. (11:10) While most companies release only final model weights, AI2 publishes intermediate checkpoints, training data, evaluation frameworks, and complete recipes. This level of transparency enables researchers to understand, modify, and build upon every aspect of the training process, addressing critical research questions around model behavior and enabling reproducible science.
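For readers who want to work with those artifacts directly, here is a minimal sketch of loading both the final weights and an intermediate training checkpoint from the Hugging Face Hub. The repository name and revision tag below are assumptions for illustration; earlier OLMo releases tagged intermediate checkpoints by step and token count, but check AI2's model cards for the identifiers actually published with OLMo 3.

```python
# Minimal sketch: loading final weights and an intermediate checkpoint.
# "allenai/OLMo-3-7B" and the revision tag are illustrative assumptions;
# consult the OLMo 3 model cards for the real identifiers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "allenai/OLMo-3-7B"  # hypothetical repo id for illustration

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)  # final released weights

# Intermediate checkpoints are exposed as Hub revisions, so rewinding to
# an earlier point in training is just a matter of naming the tag.
early = AutoModelForCausalLM.from_pretrained(
    repo,
    revision="step1000-tokens4B",  # assumed tag format, per earlier OLMo releases
)
```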
The conversation reveals how Chinese labs like Qwen, DeepSeek, and Kimi have captured significant market share in open source AI. (16:37) Martin Casado's research shows that 80% of companies building with open models are using Chinese models like Qwen. This shift occurred partly because Meta's leadership changes have clouded Llama's future, and partly because of differing business models: Chinese companies strategically use open releases to gain mindshare in Western markets, where enterprises may be reluctant to pay for API services.
Pre-training demands extreme methodological rigor due to its computational expense and long duration. (47:03) Labs typically limit final training runs to two months maximum, requiring extensive preparation to avoid catastrophic failures. The process involves carefully curating the best possible data from massive pools (AI2 started with 300 trillion tokens and refined them down to 6 trillion), making architecture decisions that guard against loss spikes, and maintaining scientific discipline throughout the months-long process.
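As a rough illustration of that curation step, a pipeline like the one below stages deduplication and quality filtering to whittle a raw pool down to a training mix. The heuristics and thresholds are toy assumptions, not AI2's actual Dolma 3 recipe.

```python
# Toy sketch of staged data curation: dedup, then quality filtering --
# the kind of pipeline that shrinks a ~300T-token pool toward a ~6T
# training mix. All heuristics below are illustrative assumptions.

def quality_score(doc: str) -> float:
    """Stand-in for a learned quality classifier."""
    return min(1.0, len(doc.split()) / 500)  # toy proxy: longer docs score higher

def curate(raw_docs, threshold=0.5):
    seen = set()
    for doc in raw_docs:
        key = hash(doc.strip().lower())  # stand-in for MinHash-style dedup
        if key in seen:
            continue          # drop near-duplicates
        seen.add(key)
        if quality_score(doc) < threshold:
            continue          # drop low-quality documents
        yield doc             # survives into the pre-training mix

docs = ["a short page", "a long, substantive article " * 100, "a short page"]
kept = list(curate(docs))
print(f"kept {len(kept)} of {len(docs)} documents")
```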
Where pre-training rewards scientific rigor, post-training involves significant technical artistry and complex infrastructure challenges. (70:51) Reinforcement learning on long-context reasoning models requires sophisticated systems that orchestrate separate generation and training GPUs while contending with attention's quadratic scaling over long sequences. Teams often discover that simple techniques like supervised fine-tuning on outputs from high-quality teacher models can yield dramatic improvements, while complex RL infrastructure may provide smaller gains but is essential for future capabilities.
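The generation/training split at the heart of that infrastructure can be sketched in a few lines. Everything below is a schematic assumption rather than any particular framework's API: one worker pool samples rollouts with the current policy, another applies gradient updates, and weights are synced between them each step.

```python
# Schematic sketch of an RL post-training loop with separate generation
# and training pools. Class and method names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Rollout:
    prompt: str
    response: str
    reward: float  # e.g., from a verifiable-reward checker

class GenerationWorkers:
    """Stands in for inference GPUs running a fast serving engine."""
    def __init__(self, weights): self.weights = weights
    def sample(self, prompts):
        # In reality: long-context sampling with the current policy.
        return [Rollout(p, f"<reasoning for {p}>", reward=1.0) for p in prompts]
    def load_weights(self, weights): self.weights = weights

class Trainer:
    """Stands in for training GPUs computing policy-gradient updates."""
    def __init__(self, weights): self.weights = weights
    def update(self, rollouts):
        self.weights += 1  # placeholder for an optimizer step on the batch
        return self.weights

gen, trainer = GenerationWorkers(0), Trainer(0)
for step in range(3):
    rollouts = gen.sample(["prove x", "solve y"])  # inference side
    new_weights = trainer.update(rollouts)         # training side
    gen.load_weights(new_weights)                  # periodic weight sync
```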
Training competitive smaller models increasingly relies on distillation from larger, more capable teacher models rather than training purely from scratch. (57:37) AI2 used reasoning traces from DeepSeek R1 and Qwen's QwQ models to create 2.5 million reasoning examples for training their 7B and 32B models. This approach lets smaller models reach performance levels that would be difficult to achieve through traditional scaling alone, effectively democratizing access to reasoning capabilities.
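Mechanically, this kind of distillation is just supervised fine-tuning on teacher-generated traces. The sketch below shows the shape of that loop; the tiny student model and single hard-coded trace are illustrative stand-ins, not AI2's recipe or data.

```python
# Minimal sketch of distillation via supervised fine-tuning: train a
# student on reasoning traces sampled from a stronger teacher. The
# student checkpoint and the trace below are illustrative stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-0.5B"  # tiny student, chosen for illustration
tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)

# In practice these would be millions of traces sampled from a teacher
# such as DeepSeek-R1; one hard-coded example stands in for them here.
traces = ["Question: 2+2? Let's reason step by step... Answer: 4"]

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
student.train()
for text in traces:
    batch = tok(text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss  # causal-LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```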