
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this episode of No Priors, the hosts interview Eric Zelikman, previously of Stanford and xAI, who has made significant contributions to reasoning and to scaling up reinforcement learning. Eric discusses his groundbreaking research, including STaR (Self-Taught Reasoner) and Quiet-STaR, which have become foundational to modern AI reasoning paradigms. (04:00) The conversation then turns to his new company, Humans&, which focuses on building AI models that better understand and collaborate with humans rather than simply replacing them. (22:00)
• Main Theme: The evolution from IQ-focused AI capabilities to EQ-focused human-AI collaboration, emphasizing the importance of building models that understand human goals and enable long-term partnerships rather than autonomous replacement.

Eric Zelikman is a renowned AI researcher who previously worked at Stanford University and xAI, where he made significant contributions to reasoning and reinforcement learning research. He is the creator of STaR (Self-Taught Reasoner) and Quiet-STaR, both of which have become widely influential in modern AI reasoning work. At xAI, he worked on pre-training data for Grok 2, the reasoning recipe for Grok 3, and tool use and agentic infrastructure for Grok 4. He is currently the founder of Humans&, a new company focused on building AI models that better understand and collaborate with humans.
Eric emphasizes that the current trend toward fully autonomous AI systems may actually limit innovation potential. (16:15) Rather than removing humans from the loop entirely, the most effective AI systems should be designed to incorporate human feedback and collaboration as they scale. This approach not only maintains human agency but can actually achieve higher capability ceilings because it pushes AI capabilities into new, out-of-distribution areas where human insight is valuable. Organizations should actively decide to keep humans in the loop rather than defaulting to full automation.
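As a loose illustration of that design choice, here is a minimal sketch of keeping a person in the loop: low-confidence steps are routed to human review instead of being executed automatically. The function names and the confidence threshold are hypothetical placeholders, not anything described in the episode.

```python
# Minimal human-in-the-loop sketch: the agent proposes actions, but low-confidence
# steps are routed to a person instead of being executed automatically.
# propose_action and execute_action are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Proposal:
    action: str
    confidence: float  # the model's own estimate, 0.0-1.0

def propose_action(task: str) -> Proposal:
    # Placeholder for a model call that drafts the next step for a task.
    return Proposal(action=f"draft plan for: {task}", confidence=0.62)

def execute_action(action: str) -> str:
    # Placeholder for whatever actually carries out the step.
    return f"executed: {action}"

def run_step(task: str, confidence_threshold: float = 0.8) -> str:
    proposal = propose_action(task)
    if proposal.confidence < confidence_threshold:
        # Keep the human in the loop: ask for edits instead of acting autonomously.
        revised = input(f"Proposed: {proposal.action!r}. Edit or press Enter to approve: ")
        action = revised or proposal.action
    else:
        action = proposal.action
    return execute_action(action)

if __name__ == "__main__":
    print(run_step("summarize this quarter's support tickets"))
```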
One of the most practical insights Eric shares is that current AI models are extremely sensitive to the amount and quality of context provided. (10:51) The more specific context you give a model about your situation, constraints, and goals, the better its performance becomes, often dramatically so. This is particularly important for business applications, where providing comprehensive background information can mean the difference between a useful response and generic advice.
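As a rough illustration, the sketch below contrasts a generic prompt with one that spells out situation, constraints, and goals. The call_model function is a hypothetical stand-in for whatever chat-completion client you use; it is not an API mentioned in the episode.

```python
# A sketch of "generic" vs. "context-rich" prompting. The only point is that the
# second prompt spells out situation, constraints, and goals explicitly.

def call_model(messages: list[dict]) -> str:
    # Placeholder: swap in your provider's chat API here.
    return "(model response)"

generic = [{"role": "user", "content": "Write a pricing email to a customer."}]

context_rich = [
    {"role": "system", "content": "You write concise, plain-spoken B2B emails."},
    {"role": "user", "content": (
        "Situation: mid-market SaaS customer, 3 years with us, renewal in 30 days.\n"
        "Constraints: we are raising prices 8%; legal requires 30 days' written notice;"
        " keep it under 150 words.\n"
        "Goal: retain the account and offer a call with their account manager.\n"
        "Task: draft the renewal email."
    )},
]

print(call_model(generic))       # tends to produce boilerplate advice
print(call_model(context_rich))  # the extra context usually yields a far more usable draft
```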
Eric points out a fundamental limitation in current AI systems: they don't understand the long-term implications of their actions and responses. (26:57) Models treat every conversation turn as an independent game, leading to issues like sycophancy and lack of proactive behavior. This single-turn optimization prevents models from building genuine understanding of users over time, similar to having a friend who forgets everything about you between conversations. Companies should prioritize developing systems with genuine memory and long-term relationship capabilities.
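One way to picture the alternative is a small persistent memory that is reloaded at the start of each conversation, so the model is not treating every turn as an independent game. The sketch below is only an illustration of the idea; the file format and update rule are assumptions, not something Eric describes.

```python
# Minimal sketch of carrying user memory across conversations. The storage format
# and update rule here are hypothetical illustrations, not a real product design.

import json
from pathlib import Path

MEMORY_PATH = Path("user_memory.json")

def load_memory() -> dict:
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {"facts": []}

def save_memory(memory: dict) -> None:
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def remember(fact: str) -> None:
    memory = load_memory()
    if fact not in memory["facts"]:
        memory["facts"].append(fact)
    save_memory(memory)

def start_conversation(user_message: str) -> list[dict]:
    memory = load_memory()
    # Surface what is already known about the user before the new turn begins.
    remembered = "\n".join(f"- {fact}" for fact in memory["facts"]) or "- (nothing yet)"
    return [
        {"role": "system", "content": f"Known about this user from past sessions:\n{remembered}"},
        {"role": "user", "content": user_message},
    ]

remember("Prefers short answers with concrete examples")
print(start_conversation("Can you review my launch plan?"))
```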
The field's obsession with single-task benchmarks is holding back AI's potential for deep integration into people's lives. (24:27) Eric argues that very few benchmarks actually consider how models affect people's lives over time or how they perform in multi-turn interactions with real users. This training paradigm produces models that are impressive on paper but fail to understand human goals and context in practical applications. Organizations should look beyond benchmark performance to evaluate how AI systems actually impact user outcomes.
When working with current AI models, Eric recommends focusing on tasks where answers can be easily verified or checked. (11:42) Models perform significantly better on problems with clear numerical answers or simple choices compared to open-ended tasks. If you can structure your AI applications around verifiable outcomes - whether in code, analysis, or decision-making - you'll see much more reliable performance from current systems.
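A rough sketch of that pattern: ask for the answer in a checkable form, verify it programmatically, and retry or escalate on failure. Here generate_answer and the range check are hypothetical placeholders for whatever model call and domain-specific verifier you actually use.

```python
# Sketch of structuring an AI task around a verifiable outcome: generate, verify,
# retry, and escalate to a human rather than return an unchecked guess.

def generate_answer(question: str, attempt: int) -> str:
    # Placeholder for a model call that returns a bare numeric answer.
    return "42"

def verify(answer: str) -> bool:
    # The checker is the key design choice: here, "is it a number in a plausible range?"
    try:
        value = float(answer)
    except ValueError:
        return False
    return 0 <= value <= 1000

def answer_with_verification(question: str, max_attempts: int = 3) -> str | None:
    for attempt in range(max_attempts):
        candidate = generate_answer(question, attempt)
        if verify(candidate):
            return candidate
    return None  # escalate to a human instead of returning an unchecked guess

print(answer_with_verification("How many units were shipped in Q3?"))
```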