Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this episode of the MAD Podcast, host Matt Turck explores two opposing perspectives on AGI with Tim Dettmers from the Allen Institute for AI and Dan Fu from Together AI. (01:08) Tim argues in his provocative essay "Why AGI Will Not Happen" that we're hitting fundamental physical constraints and diminishing returns in computation, particularly citing memory movement bottlenecks and hardware limitations. (08:20) Meanwhile, Dan counters with his essay "Yes, AGI Will Happen," contending that current models are severely underutilizing available hardware and are lagging indicators of computational progress. (16:16) The conversation then shifts to practical applications, with both experts agreeing that AI agents have already reached a critical threshold for transforming software engineering and knowledge work. (32:12) They discuss the "software singularity" where coding agents can now tackle even the most complex programming challenges, and emphasize that professionals who don't adapt to using agents effectively will be left behind.
Tim is an Assistant Professor at Carnegie Mellon University in the Machine Learning and Computer Science departments and a Research Scientist at the Allen Institute for AI. He's renowned for his pioneering work in efficient deep learning and quantization, including the development of QLoRA, a breakthrough method for efficient fine-tuning that cuts memory use roughly 16-fold compared with standard 16-bit fine-tuning. Before moving into AI research, he worked for three years in the German automation industry.
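The rough 16x figure can be reproduced with back-of-the-envelope arithmetic. The sketch below uses illustrative numbers loosely based on the QLoRA paper's 65B-parameter setup; the exact byte counts (optimizer states, adapter fraction) are assumptions for illustration, not the paper's precise accounting.

```python
# Back-of-the-envelope memory estimate: full 16-bit fine-tuning vs a
# QLoRA-style setup (4-bit frozen base weights + small trained adapters).
# All constants are illustrative assumptions, not exact published figures.

PARAMS = 65e9  # 65B-parameter model

def full_finetune_gb(params, bytes_per_param=2):
    # 16-bit weights + 16-bit gradients + Adam optimizer states
    # (two fp32 moments per parameter)
    weights = params * bytes_per_param
    grads = params * bytes_per_param
    optimizer = params * 4 * 2
    return (weights + grads + optimizer) / 1e9

def qlora_gb(params, trainable_fraction=0.01):
    # 4-bit frozen base weights (0.5 bytes each); only ~1% of parameters
    # are trainable LoRA adapters, which need grads and optimizer states
    base = params * 0.5
    adapters = params * trainable_fraction * 2
    adapter_grads_opt = params * trainable_fraction * (2 + 8)
    return (base + adapters + adapter_grads_opt) / 1e9

print(f"full fine-tune: ~{full_finetune_gb(PARAMS):.0f} GB")
print(f"QLoRA-style:    ~{qlora_gb(PARAMS):.0f} GB")
```

Under these assumptions, full fine-tuning needs on the order of 780 GB while the quantized-plus-adapters setup fits in roughly 40 GB, which is where memory reductions in the 16x range come from.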
Dan is an Assistant Professor at UC San Diego and VP of Kernels at Together AI, where he focuses on making AI models run faster through specialized GPU programming. During his PhD, he co-developed FlashAttention, a crucial optimization for transformer models, and researched alternative architectures such as state-space models. At Together AI, he leads efforts to maximize hardware utilization and recently collaborated with Cursor to accelerate their Composer model for the Cursor 2.0 launch on NVIDIA's Blackwell GPUs.
Tim argues that computational progress faces fundamental physical barriers, particularly the von Neumann bottleneck that limits how efficiently data can be moved between memory and processors. (13:01) He explains that useful computation requires two key components: gathering data from different locations and transforming it into new information. The geometric constraints of moving information from large, slow memory (DRAM) to fast processing units create unavoidable latency issues. Modern optimizations like stacked HBM memory and quantization to 4-bit precision have reached their practical limits, with manufacturing yields becoming prohibitively difficult and no new breakthrough technologies on the horizon to overcome these bottlenecks.
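Tim's memory-movement argument can be made concrete with a roofline-style estimate: whether an operation is limited by compute or by memory bandwidth depends on its arithmetic intensity (FLOPs performed per byte moved from memory). The hardware numbers below are rough H100-like figures chosen for illustration, not a claim about any specific chip configuration.

```python
# Roofline-style sketch: is an operation compute-bound or memory-bound?
# Peak figures are rough, H100-like numbers used purely for illustration.

PEAK_FLOPS = 1.0e15   # ~1 PFLOP/s of dense 16-bit math
PEAK_BW = 3.35e12     # ~3.35 TB/s of HBM bandwidth

def bound_by(flops, bytes_moved):
    intensity = flops / bytes_moved     # FLOPs per byte moved
    ridge = PEAK_FLOPS / PEAK_BW        # ~300 FLOPs/byte break-even point
    return "compute" if intensity >= ridge else "memory"

# Matrix-vector product (e.g. batch-1 LLM decoding): ~2mn FLOPs while
# streaming ~2mn bytes of 16-bit weights -> ~1 FLOP/byte, memory-bound
m = n = 8192
print(bound_by(2 * m * n, 2 * m * n))

# Large matrix-matrix product (training-style GEMM): 2mnk FLOPs against
# far less traffic per FLOP -> compute-bound
k = 8192
bytes_gemm = 2 * (m * k + k * n + m * n)
print(bound_by(2 * m * n * k, bytes_gemm))
```

This is the shape of the bottleneck Tim describes: low-intensity workloads like single-stream inference sit on the memory side of the ridge, so faster arithmetic alone does not speed them up.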
Dan presents compelling evidence that today's best models operate at only 20% hardware utilization (MFU, Model FLOPs Utilization), compared to the 50-60% achieved in training runs of the early 2020s. (18:05) He points to DeepSeek's model, trained on 2,000 export-restricted ("nerfed") H800 GPUs for about a month in 2024, as an example of this massive underutilization. Since then, companies like Poolside and Reflection have built clusters with tens of thousands of next-generation B200 chips that are 2-3x faster, creating potential for up to 100x more available compute when combined with optimization improvements.
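MFU is straightforward to compute: the FLOPs a training run actually needed, divided by the FLOPs the cluster could theoretically deliver in that time. The sketch below uses the standard ~6ND approximation for dense-model training FLOPs; the run parameters are hypothetical, chosen only to show how a sub-20% figure arises.

```python
# Model FLOPs Utilization (MFU): achieved training FLOP/s as a fraction
# of hardware peak. All run parameters below are hypothetical examples.

def mfu(params, tokens, train_seconds, n_gpus, peak_flops_per_gpu):
    # ~6*N*D approximation for training FLOPs of an N-parameter dense
    # model over D tokens (forward + backward pass)
    flops_needed = 6 * params * tokens
    flops_available = n_gpus * peak_flops_per_gpu * train_seconds
    return flops_needed / flops_available

# Hypothetical run: 70B dense model, 15T tokens, 10,000 GPUs at
# ~1 PFLOP/s (16-bit) each, over 40 days
u = mfu(70e9, 15e12, 40 * 24 * 3600, 10_000, 1e15)
print(f"MFU: {u:.0%}")  # ~18% for these assumed numbers
```

The point of Dan's argument follows directly from the denominator: at 20% MFU, closing the utilization gap alone would multiply effective compute severalfold before any new hardware arrives.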
The models we interact with today were trained on hardware clusters built 1.5-2 years ago, creating a significant gap between current computational capabilities and deployed AI systems. (22:01) This lag occurs because large pre-training runs require substantial time for cluster setup, training execution, and post-training refinement including RLHF. As Dan explains, even OpenAI's GPT-4.5 was only partially trained on newer hardware, with most pre-training occurring on older H100 clusters while newer GB200 chips were used primarily for fine-tuning phases.
Dan describes his pivotal moment in June 2025, when AI agents became capable of writing GPU kernels - traditionally considered the "final boss" of programming challenges. (33:21) These highly specialized, parallel programs, typically written in CUDA C++, have historically required expert-level skills and weeks of development time. With agent assistance, Dan's team accomplished in single days what previously took months, achieving 5-10x productivity improvements. This breakthrough suggests agents have reached a level where they can accelerate even the most technically demanding programming work when guided by domain experts.
Both experts emphasize that using agents effectively requires treating them like junior team members who need clear context, task decomposition, and expert oversight. (44:02) Dan compares agent management to onboarding new interns - you wouldn't ask them to double company revenue, but with proper guidance and tools, they can be highly productive. Tim adds that 90% of code and text should be written by agents, but the critical 10% of human review and editing makes the difference between mediocre and excellent output. The key insight: agents amplify existing expertise rather than replace the need for deep domain knowledge.