PodMine
The MAD Podcast with Matt Turck • January 22, 2026

The End of GPU Scaling? Compute & The Agent Era — Tim Dettmers (Ai2) & Dan Fu (Together AI)

A deep dive into the future of AI, exploring computational constraints, the potential of AGI, the transformative power of agents, and predictions for technological progress in 2026, with Tim Dettmers and Dan Fu offering contrasting yet complementary perspectives on AI's trajectory.
AI & Machine Learning
Tech Policy & Ethics
Developer Culture
Hardware & Gadgets
Programming Interviews & Prep
Matt Turck
Tim Dettmers
Dan Fu

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.

Podcast Summary

In this episode of the MAD Podcast, host Matt Turck explores two opposing perspectives on AGI with Tim Dettmers from the Allen Institute for AI and Dan Fu from Together AI. (01:08) Tim argues in his provocative essay "Why AGI Will Not Happen" that we're hitting fundamental physical constraints and diminishing returns in computation, particularly citing memory movement bottlenecks and hardware limitations. (08:20) Meanwhile, Dan counters with his essay "Yes, AGI Will Happen," contending that current models are severely underutilizing available hardware and are lagging indicators of computational progress. (16:16) The conversation then shifts to practical applications, with both experts agreeing that AI agents have already reached a critical threshold for transforming software engineering and knowledge work. (32:12) They discuss the "software singularity" where coding agents can now tackle even the most complex programming challenges, and emphasize that professionals who don't adapt to using agents effectively will be left behind.

  • The debate centers on whether physical constraints will cap AI progress, or whether massive computational headroom remains untapped in current systems

Speakers

Tim Dettmers

Tim is an Assistant Professor at Carnegie Mellon University in the Machine Learning and Computer Science departments and a Research Scientist at the Allen Institute for AI. He's renowned for his pioneering work in efficient deep learning and quantization, including the development of QLoRA, a breakthrough method for efficient fine-tuning that uses up to 16 times less memory than traditional approaches. He previously worked for three years in Germany's automation industry before transitioning to AI research.

Dan Fu

Dan is an Assistant Professor at UC San Diego and VP of Kernels at Together AI, where he focuses on making AI models run faster through specialized GPU programming. During his PhD, he co-developed FlashAttention, a crucial optimization for transformer models, and researched alternative architectures such as state-space models. At Together AI, he leads efforts to maximize hardware utilization and recently collaborated with Cursor to accelerate their models for the launch of Composer 2.0 on NVIDIA's Blackwell GPUs.

Key Takeaways

Physical Constraints Create Real Limits on AI Progress

Tim argues that computational progress faces fundamental physical barriers, particularly the von Neumann bottleneck that limits how efficiently data can be moved between memory and processors. (13:01) He explains that useful computation requires two key components: gathering data from different locations and transforming it into new information. The geometric constraints of moving information from large, slow memory (DRAM) to fast processing units create unavoidable latency. In his view, modern optimizations like stacked HBM memory and quantization to 4-bit precision have reached their practical limits, with manufacturing yields becoming prohibitively low and no breakthrough technologies on the horizon to overcome these bottlenecks.
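
Tim's memory-movement argument is commonly formalized as the roofline model: a kernel's attainable throughput is the lesser of peak compute and arithmetic intensity (FLOPs performed per byte moved) times memory bandwidth. A minimal sketch; the peak figures below are illustrative assumptions, not numbers from the episode:

```python
def attainable_flops(arith_intensity, peak_flops, peak_bw_bytes):
    """Roofline model: attainable FLOP/s for a kernel.

    arith_intensity -- FLOPs performed per byte moved from memory
    peak_flops      -- chip's peak compute throughput (FLOP/s), assumed
    peak_bw_bytes   -- peak memory bandwidth (bytes/s), assumed
    """
    return min(peak_flops, arith_intensity * peak_bw_bytes)

# Illustrative numbers only: ~1e15 FLOP/s peak compute, ~3e12 B/s HBM bandwidth.
# A memory-bound op like a matrix-vector product (~2 FLOPs/byte) is capped far
# below peak no matter how fast the compute units are -- Tim's bottleneck.
gemv = attainable_flops(2, 1e15, 3e12)      # memory-bound: 6e12 FLOP/s
gemm = attainable_flops(1000, 1e15, 3e12)   # compute-bound: hits the 1e15 roof
```

The asymmetry between the two cases is the whole argument: faster chips raise the flat compute roof, but memory-bound workloads stay pinned to the bandwidth slope.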

Current Models Drastically Underutilize Available Hardware

Dan presents compelling evidence that today's best models operate at only 20% hardware utilization (MFU, or Model FLOP Utilization), compared to the 50-60% achieved in earlier training runs this decade. (18:05) He points to DeepSeek's model, trained on 2,000 export-restricted H800 GPUs for about a month in 2024, as an example of this massive underutilization. Since then, companies like Poolside and Reflection have built clusters with tens of thousands of next-generation B200 chips that are 2-3x faster, creating potential for up to 100x more available compute when combined with optimization improvements.
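
MFU is simply achieved training throughput divided by the hardware's theoretical peak. Using the standard approximation of ~6 FLOPs per parameter per token for training, a rough sketch (all numbers are hypothetical, not from the episode):

```python
def mfu(n_params, tokens_per_sec, n_gpus, peak_flops_per_gpu):
    # Training costs roughly 6 FLOPs per parameter per token
    # (2 forward + 4 backward), so achieved throughput is ~6 * N * T.
    achieved = 6 * n_params * tokens_per_sec
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak

# Hypothetical example: a 10B-parameter model processing 1M tokens/s
# on 100 GPUs rated at 1e15 FLOP/s each -> 60% MFU.
print(mfu(10e9, 1e6, 100, 1e15))
```

Dan's point follows directly: if a run sits at 20% on this ratio, tripling MFU alone, before any new silicon, is equivalent to tripling the cluster.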

Models Are Lagging Indicators of Hardware Progress

The models we interact with today were trained on hardware clusters built 1.5-2 years ago, creating a significant gap between current computational capabilities and deployed AI systems. (22:01) This lag occurs because large pre-training runs require substantial time for cluster setup, training execution, and post-training refinement including RLHF. As Dan explains, even OpenAI's GPT-4.5 Turbo was only partially trained on newer hardware, with most pre-training occurring on older H100 clusters while newer GB200 chips were used primarily for fine-tuning phases.

Agents Have Crossed the Threshold for Complex Programming Tasks

Dan describes his pivotal moment in June 2025, when AI agents became capable of writing GPU kernels, traditionally considered the "final boss" of programming challenges. (33:21) These highly specialized parallel programs, typically written in CUDA C++, used to require expert-level skills and weeks of development time. With agent assistance, Dan's team accomplished in single days what previously took months, achieving 5-10x productivity improvements. This breakthrough suggests agents have reached a level where they can accelerate even the most technically demanding programming work when guided by domain experts.

Domain Expertise Becomes More Critical, Not Less, in an Agent-Driven World

Both experts emphasize that using agents effectively requires treating them like junior team members who need clear context, task decomposition, and expert oversight. (44:02) Dan compares agent management to onboarding new interns - you wouldn't ask them to double company revenue, but with proper guidance and tools, they can be highly productive. Tim adds that 90% of code and text should be written by agents, but the critical 10% of human review and editing makes the difference between mediocre and excellent output. The key insight: agents amplify existing expertise rather than replace the need for deep domain knowledge.

Statistics & Facts

  1. DeepSeek's model achieved only 20% hardware utilization (MFU) when trained on 2,000 H800 GPUs, compared to 50-60% utilization rates achieved in earlier 2020s training runs. (18:05)
  2. Tim's team developed QLoRA, which uses up to 16 times less memory than traditional fine-tuning approaches while maintaining performance. (02:05)
  3. Current inference workloads utilize less than 5% of available GPU hardware capacity, creating massive room for efficiency improvements. (55:20)
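
The "16 times less memory" figure for QLoRA in item 2 is roughly what a back-of-envelope accounting gives. A sketch under simplified assumptions (fp16 full fine-tuning with fp32 Adam moments versus a frozen 4-bit base model plus small fp16 LoRA adapters; the adapter fraction is a hypothetical value, and activation memory is ignored on both sides):

```python
def full_finetune_gb(n_params):
    # fp16 weights (2 B) + fp16 gradients (2 B) + fp32 Adam moments (8 B)
    return n_params * (2 + 2 + 8) / 1e9

def qlora_gb(n_params, adapter_frac=0.005):
    # Frozen base model quantized to 4 bits (0.5 B/param); only the small
    # LoRA adapters carry weights, gradients, and optimizer state.
    base = n_params * 0.5
    adapters = n_params * adapter_frac * (2 + 2 + 8)
    return (base + adapters) / 1e9

full = full_finetune_gb(7e9)   # 84 GB for a 7B model
lite = qlora_gb(7e9)           # ~3.9 GB
print(full / lite)             # well past the quoted "up to 16x"
```

The dominant term is the optimizer state: freezing the base model removes 10 of the 12 bytes per parameter, and quantizing it to 4 bits shrinks the rest.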
