
Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.
This podcast episode features a deep technical discussion about the state of artificial intelligence in 2025 with Sebastian Raschka, author of "Build a Large Language Model From Scratch," and Nathan Lambert, post-training lead at the Allen Institute for AI and author of the RLHF book. The conversation covers the rapid evolution of AI models, particularly after the "DeepSeek moment" in January 2025, examining everything from architecture innovations to the competitive landscape between Chinese and American AI companies. (16:29)
Sebastian is a machine learning researcher and educator, best known for his influential technical books including "Build a Large Language Model From Scratch" and "Build a Reasoning Model From Scratch." He focuses on making complex AI concepts accessible through hands-on implementation, believing that building systems from scratch is the most effective way to truly understand how they work.
Nathan is the post-training lead at the Allen Institute for AI (Ai2) and author of "The RLHF Book," the definitive resource on reinforcement learning from human feedback. He's deeply involved in both research and policy work, including his ATOM Project advocating for American open-weight AI models to compete with Chinese offerings.
The biggest breakthrough of 2025 has been reinforcement learning with verifiable rewards (RLVR), which lets models learn through trial and error on problems with objectively correct answers, such as math and coding. (79:38) Unlike traditional RLHF, which tends to plateau quickly, RLVR shows consistent scaling behavior: you can keep investing compute and keep getting better performance. This has enabled inference-time scaling, where models "think" for extended periods and dramatically improve on complex tasks.
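To make the verifiable-reward idea concrete, here is a minimal Python sketch: a grader that returns 1.0 only when the model's final answer matches a known-correct answer, so no human labeler is needed. The helper names and the crude numeric answer extraction are illustrative assumptions, not how any particular lab implements its graders.

```python
# Minimal sketch of a verifiable reward for RLVR-style training on math problems.
# Function names and the extraction heuristic are hypothetical simplifications.

import re

def extract_final_answer(completion: str) -> str:
    """Pull the last number-like token out of a model completion."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else ""

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the extracted answer matches the known-correct answer, else 0.0."""
    return 1.0 if extract_final_answer(completion) == ground_truth.strip() else 0.0

# Sampled completions get scored automatically; the rewards then drive a
# policy-gradient-style update (e.g., PPO/GRPO) in the outer training loop.
print(verifiable_reward("The area is 12, so the answer is 42.", "42"))  # 1.0
print(verifiable_reward("I think it's 17.", "42"))                      # 0.0
```

Because the grader is cheap and objective, compute spent sampling and scoring more rollouts translates directly into training signal, which is what gives RLVR its favorable scaling behavior compared to preference-based RLHF.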
Chinese companies like DeepSeek, Kimi, and MiniMax have released increasingly powerful open-weight models that match or exceed closed American models in many domains. (21:20) Because Western companies are reluctant to rely on Chinese-hosted APIs for security reasons, these labs lean on permissive open licensing to gain global influence instead. This has prompted Nathan's ATOM Project, which advocates for increased US investment in open-weight models to maintain technological leadership.
Current AI coding tools like Claude Code and Cursor represent a fundamental shift in how software is created. The experience moves from micromanaging code details to thinking in design spaces and guiding systems at a macro level. (37:06) Professional developers are already shipping significant percentages of AI-generated code, with senior developers often using AI more extensively than juniors because they better understand how to leverage these tools effectively.
While transformer architectures remain dominant, meaningful innovations continue in attention mechanisms, mixture of experts (MoE) models, and efficiency improvements. (53:57) New approaches like text diffusion models and architectural tweaks for long context are being explored. The fundamental architecture hasn't changed dramatically since GPT-2, but optimizations in training, serving, and specialization continue to unlock new capabilities.
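As a rough illustration of the mixture-of-experts idea mentioned here, the sketch below shows top-k expert routing in PyTorch: a router scores every expert for each token, and only the top-k experts actually run. Layer sizes, expert counts, and class names are illustrative assumptions, not the configuration of any specific model.

```python
# Minimal sketch of top-k mixture-of-experts (MoE) routing.
# Dimensions and expert counts are toy values chosen for readability.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                      # x: (tokens, d_model)
        gate_logits = self.router(x)                           # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)    # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out  # only top_k of n_experts run per token, keeping compute per token low

x = torch.randn(10, 64)
print(TinyMoE()(x).shape)  # torch.Size([10, 64])
```

The appeal is that total parameter count grows with the number of experts while per-token compute stays roughly constant, which is why sparse MoE layers show up in so many recent large open-weight models.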
The AI industry has adopted an intense "996" work culture (9am-9pm, 6 days a week) driven by fierce competition and belief in imminent breakthroughs. (155:36) While this drives rapid progress, it comes with significant human costs, including burnout and poor work-life balance. The guests suggest this culture may be unsustainable long-term, though it's currently fueled by the perception that we're living through a transformative moment in history.