a16z Podcast • December 15, 2025

Dwarkesh and Ilya Sutskever on What Comes After Scaling

Dwarkesh interviews Ilya Sutskever about the challenges of scaling AI, exploring why current models perform well on benchmarks but struggle with real-world generalization, and discussing potential paths to developing safe and beneficial superintelligent AI.
AI & Machine Learning
Tech Policy & Ethics
Developer Culture
Ilya Sutskever
Dwarkesh Patel
OpenAI
Meta
Safe Superintelligence Inc. (SSI)

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.


Podcast Summary

In this thought-provoking episode from The Dwarkesh Podcast, host Dwarkesh Patel interviews Ilya Sutskever, cofounder of Safe Superintelligence Inc. (SSI) and former OpenAI chief scientist, about the puzzling disconnect between AI models' impressive benchmark performance and their underwhelming real-world impact. (02:24) Sutskever explores why current AI systems excel on evaluations yet struggle with basic reliability, such as repeatedly reintroducing the same bugs when asked to fix code. The conversation delves into fundamental questions about generalization, learning efficiency, and what's actually blocking progress toward artificial general intelligence. (24:00) Sutskever argues we're transitioning from the "age of scaling" back to an "age of research," where simply throwing more compute at problems won't solve the core challenges of building reliable, human-level learning systems.

  • The episode explores the paradox of AI models that appear superintelligent on benchmarks but fail at basic real-world tasks, examining why pretraining and RL scaling work so differently and what this reveals about the path to AGI.

Speakers

Dwarkesh Patel

Host of The Dwarkesh Podcast, known for conducting in-depth conversations with leading figures in AI, technology, and science. He has interviewed prominent researchers and industry leaders, building a reputation for thoughtful, technical discussions that explore the cutting edge of artificial intelligence research.

Ilya Sutskever

Cofounder of Safe Superintelligence Inc. (SSI) and former Chief Scientist at OpenAI, where he played a pivotal role in developing the GPT models. Previously a research scientist at Google Brain and co-author of foundational papers including AlexNet. Widely regarded for his exceptional research taste in AI, he has contributed to many breakthrough developments in deep learning, from convolutional networks to large language models.

Key Takeaways

Models Excel at Evaluations But Struggle in Practice Due to Training Focus

Sutskever identifies a fundamental disconnect between AI models' performance on benchmarks and their performance in real-world applications. (02:54) He suggests this gap arises because reinforcement learning training inadvertently takes inspiration from the evaluations themselves: researchers want their models to perform well on evals, so they design RL environments that mirror those tasks. If models generalize inadequately, this produces systems that excel at specific benchmarks but fail at broader applications. Sutskever frames this as a form of "reward hacking" by the human researchers themselves, who become too focused on eval performance rather than genuine capability development.

Human Learning Superiority Stems from Fundamental Generalization Differences

Unlike AI models that require massive amounts of data and specific training environments, humans demonstrate remarkable sample efficiency and robustness across domains. (27:00) Sutskever uses the analogy of two competitive programming students - one who practices 10,000 hours on specific problems versus another who practices 100 hours but has better foundational understanding. The second student, despite less practice, will likely perform better in their career. This "it factor" in human learning represents a fundamental advantage in generalization that current AI systems lack, even in domains like mathematics and coding where humans couldn't have evolutionary priors.

The Era of Scaling is Ending, Research Era is Beginning

Sutskever argues we're transitioning from 2020-2025's "age of scaling" back to an "age of research" similar to 2012-2020. (22:51) While scaling provided a reliable recipe for improvement (more data + compute + parameters = better results), we've now reached a point where simply scaling up may not yield transformative differences. The current landscape has "more companies than ideas" because scaling sucked all the air out of the room. Now that compute is abundant, the bottleneck has shifted from computational resources back to fundamental algorithmic insights and novel approaches to training.

Value Functions Could Unlock More Efficient Learning

Current reinforcement learning approaches suffer from extremely sparse reward signals - models must complete entire trajectories before receiving any learning signal. (15:01) Value functions could provide intermediate feedback, allowing models to learn from mistakes much earlier in the process. For example, if a model is pursuing an unproductive coding solution, a value function could provide negative feedback after 1,000 steps rather than waiting for the entire solution attempt to fail. This mirrors human learning, where we have intuitive senses of progress and can course-correct quickly. Implementing effective value functions could dramatically improve learning efficiency.
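
To make the contrast concrete, here is a minimal Python sketch, not from the episode: the toy environment, the value_estimate heuristic, and the cutoff threshold are all illustrative assumptions, chosen only to show how an intermediate value estimate can deliver a learning signal mid-trajectory while a sparse reward says nothing until the very end.

```python
import random

random.seed(0)

def run_step(state: int) -> int:
    """Toy environment step: progress (+1) or drift off track (-1)."""
    return state + (1 if random.random() < 0.5 else -1)

def sparse_reward_rollout(num_steps: int = 1000) -> float:
    """Sparse-reward setting: the only signal arrives after the full
    trajectory, so the intermediate steps carry no feedback at all."""
    state = 0
    for _ in range(num_steps):
        state = run_step(state)
    return 1.0 if state > 0 else 0.0  # single end-of-trajectory signal

def value_estimate(state: int) -> float:
    """Stand-in for a learned value function: a crude heuristic guess at
    the probability that the trajectory still ends well from this state."""
    return max(0.0, min(1.0, 0.5 + state / 20))

def value_guided_rollout(num_steps: int = 1000, cutoff: float = 0.1) -> float:
    """Value-function setting: an intermediate estimate lets the learner
    abandon an unpromising trajectory early instead of spending all steps."""
    state = 0
    for step in range(num_steps):
        state = run_step(state)
        if value_estimate(state) < cutoff:
            # Early negative feedback: the signal arrives at this step,
            # not after the whole trajectory has been burned.
            return 0.0
    return 1.0 if state > 0 else 0.0

print("sparse reward signal:", sparse_reward_rollout())
print("value-guided signal: ", value_guided_rollout())
```

The design point is simply that the value-guided rollout receives a usable signal at the step where the estimate collapses, while the sparse rollout learns nothing until every step has been spent.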

Emotions Function as Evolution's Built-in Value System

Sutskever discusses a fascinating case study of a person who lost emotional processing due to brain damage. (12:57) Despite retaining his intellectual capabilities, this individual became unable to make basic decisions, spending hours choosing socks and making poor financial choices. This suggests emotions serve as a critical value function that guides human decision-making and learning. The robustness and simplicity of emotional systems, despite their ancient evolutionary origins, demonstrate how an effective value function can remain useful across vastly different environments, a principle that could inform AI system design.

Statistics & Facts

  1. SSI raised $3 billion in funding, which Sutskever argues provides substantial research compute when accounting for how other companies must allocate resources to inference, engineering staff, and product development rather than pure research. (39:00)
  2. Public estimates suggest companies like OpenAI spend approximately $5-6 billion annually just on research experiments, separate from inference costs. (40:02)
  3. Historical perspective: AlexNet used only 2 GPUs, and the original Transformer paper used 8-64 GPUs (roughly equivalent to 2 modern GPUs), demonstrating that breakthrough research doesn't always require maximal compute. (37:18)

Compelling Stories

Available with a Premium subscription

Thought-Provoking Quotes

Available with a Premium subscription

Strategies & Frameworks

Available with a Premium subscription

Similar Strategies

Available with a Plus subscription

Additional Context

Available with a Premium subscription

Key Takeaways Table

Available with a Plus subscription

Critical Analysis

Available with a Plus subscription

Books & Articles Mentioned

Available with a Plus subscription

Products, Tools & Software Mentioned

Available with a Plus subscription

More episodes like this

Dialectic • January 13, 2026
36: C. Thi Nguyen - Measurement, Meaning, and Play

Monetary Matters with Jack Farley • January 13, 2026
The Market’s Biggest Whales are Making Huge Changes: Total Portfolio Revolution | Steve Novakovic of CAIA

Hard Fork • January 13, 2026
Can We Build a Better Social Network?

The Art of Manliness • January 13, 2026
Money and Meaning — What Faith Traditions Teach Us About Personal Finance