PodMine
No Priors: Artificial Intelligence | Technology | Startups • October 9, 2025

Humans&: Bridging IQ and EQ in Machine Learning with Eric Zelikman

Eric Zelikman discusses bridging IQ and EQ in machine learning, highlighting the importance of developing AI models that understand human goals, collaborate effectively, and empower people rather than simply replacing them.
AI & Machine Learning
Tech Policy & Ethics
Developer Culture
Eric Zelikman
OpenAI
Stanford University
Cursor
xAI

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.


Podcast Summary

In this episode of No Priors, the hosts interview Eric Zelikman, previously of Stanford and xAI, who has made significant contributions to reasoning and to scaling up reinforcement learning. Eric discusses his groundbreaking research, including STaR (Self-Taught Reasoner) and Quiet-STaR, which has become foundational to modern AI reasoning paradigms. (04:00) The conversation then turns to his new company, Humans&, which focuses on building AI models that better understand and collaborate with humans rather than simply replacing them. (22:00)

• Main Theme: The evolution from IQ-focused AI capabilities to EQ-focused human-AI collaboration, emphasizing the importance of building models that understand human goals and enable long-term partnerships rather than autonomous replacement.

Speakers

Eric Zelikman

Eric Zelikman is a renowned AI researcher who previously worked at Stanford University and xAI, where he made significant contributions to reasoning and reinforcement learning research. He is the creator of STaR (Self-Taught Reasoner) and Quiet-STaR, both of which have become widely adopted in the AI reasoning paradigm. At xAI, he worked on pre-training data for Grok 2 and the reasoning recipe for Grok 3, as well as tool use and agentic infrastructure for Grok 4. He is currently the founder of Humans&, a new company focused on building AI models that better understand and collaborate with humans.

Key Takeaways

Build AI Systems That Scale With Human Input

Eric emphasizes that the current trend toward fully autonomous AI systems may actually limit innovation potential. (16:15) Rather than removing humans from the loop entirely, the most effective AI systems should be designed to incorporate human feedback and collaboration as they scale. This approach not only maintains human agency but can actually achieve higher capability ceilings because it pushes AI capabilities into new, out-of-distribution areas where human insight is valuable. Organizations should actively decide to keep humans in the loop rather than defaulting to full automation.

Context Is King for Current AI Models

One of the most practical insights Eric shares is that current AI models are extremely sensitive to the amount and quality of context provided. (10:51) The more specific context you give a model about your situation, constraints, and goals, the better its performance becomes, often dramatically so. This is particularly important for business applications, where providing comprehensive background information can mean the difference between a useful response and generic advice.
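This advice can be made concrete with a small sketch (not from the episode; the helper and example inputs are hypothetical): assembling background, constraints, and goals into one prompt so the model sees the full situation rather than a bare question.

```python
# Illustrative sketch: a generic question vs. a context-rich prompt.
# The helper and example values are hypothetical, for illustration only.
def build_prompt(question, background=None, constraints=None, goal=None):
    """Combine situation, constraints, and goals into a single prompt."""
    parts = []
    if background:
        parts.append(f"Background: {background}")
    if constraints:
        parts.append("Constraints: " + "; ".join(constraints))
    if goal:
        parts.append(f"Goal: {goal}")
    parts.append(f"Question: {question}")
    return "\n".join(parts)

# Bare question: the model must guess your situation.
generic = build_prompt("How should we price the new plan?")

# Same question with the context Eric describes: situation, constraints, goals.
specific = build_prompt(
    "How should we price the new plan?",
    background="B2B SaaS, 40-person team, current plan at $49/seat",
    constraints=["no free tier", "annual billing only"],
    goal="grow net revenue retention without churning small accounts",
)
```

The structure matters less than the habit: every concrete detail added narrows the space of generic answers the model can fall back on.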

Memory and Long-Term Understanding Are Underinvested

Eric points out a fundamental limitation in current AI systems: they don't understand the long-term implications of their actions and responses. (26:57) Models treat every conversation turn as an independent game, leading to issues like sycophancy and lack of proactive behavior. This single-turn optimization prevents models from building genuine understanding of users over time, similar to having a friend who forgets everything about you between conversations. Companies should prioritize developing systems with genuine memory and long-term relationship capabilities.

Task-Centric Training Limits Real-World Impact

The field's obsession with single-task benchmarks is holding back AI's potential for deep integration into people's lives. (24:27) Eric argues that very few benchmarks actually consider how models affect people's lives over time or how they perform in multi-turn interactions with real users. This training paradigm produces models that are impressive on paper but fail to understand human goals and context in practical applications. Organizations should look beyond benchmark performance to evaluate how AI systems actually impact user outcomes.

Focus on Verifiable Tasks for Better AI Performance

When working with current AI models, Eric recommends focusing on tasks where answers can be easily verified or checked. (11:42) Models perform significantly better on problems with clear numerical answers or simple choices than on open-ended tasks. If you can structure your AI applications around verifiable outcomes, whether in code, analysis, or decision-making, you'll see much more reliable performance from current systems.
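One way to apply this is to wrap the model call in an automatic check, as in the minimal sketch below. `ask_model` is a hypothetical stand-in for any LLM call (stubbed here so the sketch is self-contained); the point is the pattern of only accepting answers that pass verification.

```python
# Sketch: structure an AI task around a verifiable outcome.
# `ask_model` is a hypothetical placeholder for a real model call.
def ask_model(prompt):
    return "4"  # stubbed response for illustration

def verified_answer(prompt, check, retries=3):
    """Accept a model answer only if an automatic check passes."""
    for _ in range(retries):
        answer = ask_model(prompt)
        if check(answer):
            return answer
    return None  # no verified answer within the retry budget

result = verified_answer(
    "What is 2 + 2? Reply with only the number.",
    check=lambda a: a.strip().isdigit() and int(a) == 4,
)
```

The same shape works for any task with a checkable output: code that must pass tests, an extraction that must parse, a figure that must reconcile with a source.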

Statistics & Facts

  1. Eric mentions that some AI models can now handle tasks requiring hours of autonomous reasoning, with recent IMO (International Mathematical Olympiad) results showing models working for extended periods without human intervention. (17:12)
  2. The METR benchmark measures progress by tracking how long AI models can work autonomously, with recent advances going from two-hour tasks to two-and-a-half-hour tasks without human intervention. (17:15)
  3. Eric references that when testing n-digit arithmetic problems during STaR development, the number of digits models could handle kept increasing with more training iterations, showing no obvious plateau in scaling. (04:54)
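The n-digit arithmetic probe in point 3 is easy to reproduce in spirit. The sketch below (an illustration, not the STaR code) generates addition problems with a given digit count per operand and scores a solver exactly; sweeping `n_digits` upward is what reveals whether accuracy plateaus.

```python
import random

# Illustrative sketch of an n-digit arithmetic probe (not the STaR code):
# generate addition problems and measure a solver's exact-match accuracy.
def make_problem(n_digits, rng):
    """Return a prompt string and its true answer for n-digit addition."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    return f"{a} + {b} = ?", a + b

def accuracy(solve, n_digits, trials=100, seed=0):
    """Fraction of problems `solve` answers exactly, at a fixed digit count."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        prompt, answer = make_problem(n_digits, rng)
        if solve(prompt) == answer:
            correct += 1
    return correct / trials

# A trivially exact "solver" scores 1.0 at any digit count; a model would
# be plugged in here instead, and swept over increasing n_digits.
exact = lambda prompt: eval(prompt.split("=")[0])
```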

Compelling Stories

Available with a Premium subscription

Thought-Provoking Quotes

Available with a Premium subscription

Strategies & Frameworks

Available with a Premium subscription

Similar Strategies

Available with a Plus subscription

Additional Context

Available with a Premium subscription

Key Takeaways Table

Available with a Plus subscription

Critical Analysis

Available with a Plus subscription

Books & Articles Mentioned

Available with a Plus subscription

Products, Tools & Software Mentioned

Available with a Plus subscription
