The MAD Podcast with Matt Turck • December 18, 2025

DeepMind Gemini 3 Lead: What Comes After "Infinite Data"

In this episode, Sebastian Borgeaud, a pre-training lead for Gemini 3 at Google DeepMind, discusses the landmark model's development, exploring the shift from "infinite data" to a data-limited regime, the importance of research taste, and the evolving landscape of AI pre-training and model capabilities.
AI & Machine Learning
Tech Policy & Ethics
Developer Culture
Demis Hassabis
Matt Turck
Sebastian Borgeaud
Oriol Vinyals
Google

Summary Sections

  • Podcast Summary
  • Speakers
  • Key Takeaways
  • Statistics & Facts
  • Compelling Stories (Premium)
  • Thought-Provoking Quotes (Premium)
  • Strategies & Frameworks (Premium)
  • Similar Strategies (Plus)
  • Additional Context (Premium)
  • Key Takeaways Table (Plus)
  • Critical Analysis (Plus)
  • Books & Articles Mentioned (Plus)
  • Products, Tools & Software Mentioned (Plus)

Timestamps are as accurate as possible but may be slightly off. We encourage you to listen to the episode for full context.

Podcast Summary

In this landmark episode, Sebastian Borgeaud, a pre-training lead for Gemini 3 at Google DeepMind and co-author of the seminal RETRO paper, gives his first-ever podcast interview. (00:58) He reveals that Gemini 3's remarkable performance comes from a deceptively simple formula: better pre-training and better post-training, achieved through the coordinated efforts of a team of 150-200 people working across data, models, infrastructure, and evaluations. (02:29) The conversation explores how the AI industry is shifting from an "infinite data" paradigm to a "data-limited regime," fundamentally changing research approaches and priorities. (04:44) Sebastian discusses the evolution from building individual models to constructing complete systems, the technical details behind Gemini 3's mixture-of-experts architecture (a toy routing sketch follows below), and why frontier research increasingly requires full-stack thinking that spans algorithms, engineering, and infrastructure.

  • The episode examines the shift in AI development from pure model training to comprehensive system building, the ongoing viability of scaling laws in pre-training, and the future of continual learning and retrieval-augmented approaches.
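
The mixture-of-experts (MoE) idea mentioned above is easiest to picture in code. Below is a toy top-k routing layer in plain NumPy, a sketch under broad assumptions rather than Gemini's actual (unpublished) design: a learned router scores every expert, only the k highest-scoring experts run for a given token, and their outputs are mixed with the router's softmax weights.

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=2):
    """Toy top-k mixture-of-experts routing (illustrative only).

    x:       (d,) activation for one token
    experts: list of (d, d) expert weight matrices
    gate_w:  (d, n_experts) router weight matrix
    """
    logits = x @ gate_w                      # router score per expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                             # softmax over selected experts
    # Only k experts execute per token, so per-token compute stays roughly
    # flat even as total parameter count grows with n_experts.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

# Usage: 8 experts, route each token through the best 2.
d, n_experts = 16, 8
rng = np.random.default_rng(0)
out = moe_layer(rng.normal(size=d),
                [rng.normal(size=(d, d)) for _ in range(n_experts)],
                rng.normal(size=(d, n_experts)))
```

The design point is the one the episode gestures at: sparsity decouples total parameter count from per-token serving cost, which matters when quality and inference efficiency are optimized together.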

Speakers

Sebastian Borgeaud

Sebastian Borgeaud is a pre-training lead for Gemini 3 at Google DeepMind and co-author of the influential RETRO paper. Born in the Netherlands and educated across Europe, he earned his undergraduate and master's degrees at the University of Cambridge's Computer Laboratory before joining DeepMind in 2018 as a research engineer. He has been instrumental in developing major language models including Gopher, Chinchilla, and RETRO, and now coordinates the work of 150-200 people across data, models, infrastructure, and evaluations for Gemini's pre-training efforts.

Matt Turck

Matt Turck is the Managing Director at FirstMark Capital and host of the MAD podcast. He focuses on investments in data infrastructure, AI, and enterprise technology, bringing deep industry expertise and insights to conversations with leading technologists and researchers.

Key Takeaways

AI Progress Comes From System-Level Thinking, Not Just Model Improvements

Sebastian emphasizes that modern frontier AI development is no longer about training a single neural network architecture. (02:49) Instead, teams are building comprehensive systems that integrate models, data pipelines, infrastructure, and evaluation frameworks. This shift requires "research taste": the ability to balance performance improvements against system complexity and team productivity. (20:44) Sebastian explains that research ideas must "play well with everyone else's research" and integrate smoothly, since the cost of slowing down the broader team often outweighs an individual performance gain. This systems approach is what enabled Gemini 3's remarkable leap in capabilities through the coordinated work of hundreds of researchers and engineers.

The Industry Is Transitioning From Infinite Data to Data-Limited Research Paradigms

A fundamental shift is occurring in AI research as the field moves from assuming unlimited data availability to operating within finite data constraints. (34:05) This paradigm change is driving renewed interest in techniques from pre-LLM computer vision research, where data scarcity was the norm. Sebastian notes this doesn't necessarily mean using less data, but rather optimizing within known data boundaries. This shift is catalyzing innovation in data curation, synthetic data generation, and architectural improvements that maximize learning efficiency from available datasets, fundamentally changing how researchers prioritize and approach problems.
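
To make "optimizing within known data boundaries" concrete, here is a deliberately minimal curation step: exact deduplication by content hash. Production pipelines go much further (fuzzy dedup, quality classifiers, mixture reweighting), and nothing here reflects DeepMind's actual tooling; it is a sketch of the kind of operation that grows in importance when every remaining token has to count.

```python
import hashlib

def exact_dedup(documents):
    """Drop byte-identical duplicates from a corpus.

    The simplest data-curation pass: in a data-limited regime, repeated
    documents waste a training budget that can no longer be replenished
    by simply crawling more text.
    """
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["the cat sat", "a dog ran", "the cat sat"]
assert exact_dedup(corpus) == ["the cat sat", "a dog ran"]
```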

Scaling Laws Remain Viable But Are No Longer the Sole Driver of Progress

Contrary to widespread industry speculation about the "death of scaling laws," Sebastian confirms that scale continues to provide predictable improvements in model performance. (30:54) However, the research community has shifted away from viewing scale as the primary or only lever for advancement. Modern progress comes from the compounding effects of scaling, architectural innovations, and data improvements working together. (32:12) This balanced approach allows teams to optimize for multiple objectives simultaneously, including serving costs and inference efficiency, rather than pursuing scale at any cost.
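
The "predictable improvements" claim has a concrete form: the Chinchilla paper (Hoffmann et al., 2022, which Sebastian co-authored) models pre-training loss as L(N, D) = E + A/N^α + B/D^β for N parameters and D training tokens. The sketch below fits that curve to a handful of invented small-run losses and extrapolates; the data points are made up for illustration, and only the functional form comes from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, alpha, B, beta):
    """L(N, D) = E + A/N**alpha + B/D**beta (Hoffmann et al., 2022)."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

# Invented (params, tokens, loss) triples standing in for small-scale runs.
N = np.array([7e7, 1.6e8, 4e8, 1e9, 2.8e9])
D = np.array([1.4e9, 4e9, 6e9, 2.5e10, 5.6e10])
L = np.array([3.69, 3.19, 2.92, 2.55, 2.34])

popt, _ = curve_fit(chinchilla_loss, (N, D), L,
                    p0=[1.7, 400, 0.3, 400, 0.3], maxfev=50000)

# Extrapolate the fitted curve to a scale no experiment has touched:
# this is the sense in which scale yields "predictable" improvements.
print(chinchilla_loss((1e12, 2e13), *popt))
```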

Evaluation Systems Are the Hidden Bottleneck in AI Development

Sebastian identifies evaluation (evals) as one of the most underestimated and challenging aspects of AI research. (41:00) Pre-training evaluation faces two critical gaps: evals must predict performance at scale (since regular experiments use smaller models), and they must predict post-training performance (since models undergo additional training before deployment). (41:58) External benchmarks quickly become contaminated as they appear in training data, forcing teams to develop internal held-out evaluation systems. This evaluation challenge is particularly acute in pre-training because of the long iteration cycles and high costs of large-scale experiments.
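
The contamination problem Sebastian describes is often attacked with n-gram overlap checks: if an eval example's long n-grams already appear in the training corpus, its score no longer measures generalization. The sketch below is one simplified version of that heuristic (13-grams, a convention popularized by the GPT-3 paper); real decontamination pipelines are more involved, and this is not a description of DeepMind's internal evals.

```python
def ngrams(text, n=13):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(eval_example, training_docs, n=13, threshold=0.5):
    """Flag an eval example whose long n-grams substantially overlap
    the training corpus (simplified decontamination heuristic)."""
    ev = ngrams(eval_example, n)
    if not ev:                      # example shorter than n tokens
        return False
    train = set()
    for doc in training_docs:
        train |= ngrams(doc, n)
    return len(ev & train) / len(ev) >= threshold

# Toy usage with 3-grams so the example stays short:
docs = ["the quick brown fox jumps over the lazy dog"]
print(is_contaminated("quick brown fox jumps over the lazy", docs, n=3))  # True
```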

Technical Depth Combined With Systems Understanding Creates Competitive Advantage

Sebastian advocates for a new type of researcher-engineer who can understand the entire technology stack from research concepts down to hardware implementation. (50:05) He describes this full-stack understanding as a "superpower" that enables researchers to identify opportunities across system layers and reason through the implications of research ideas all the way to the TPU level. (50:18) This systems awareness becomes increasingly critical as AI models become more complex and resource-intensive, requiring researchers who can balance algorithmic innovation with practical deployment constraints.
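
One low-stakes example of reasoning down to the hardware: asking the compiler what an op costs before scaling it up. The snippet below uses JAX's lower/compile path and its (unstable, version-dependent) cost_analysis hook to compare the compiler's FLOP count for a matmul against the analytic 2·m·k·n estimate; it runs on CPU or TPU alike and illustrates the habit of mind, not any Gemini tooling.

```python
import jax
import jax.numpy as jnp

def layer(x, w):
    return x @ w                    # one matmul

x = jnp.ones((1024, 4096))
w = jnp.ones((4096, 4096))

# Ask the compiler what this op costs on the current backend.
# cost_analysis() is an unstable JAX API; its keys vary by version.
compiled = jax.jit(layer).lower(x, w).compile()
print(compiled.cost_analysis())     # expect a 'flops' entry

# Analytic count for an (m, k) x (k, n) matmul: 2 * m * k * n
print(2 * 1024 * 4096 * 4096)       # ~3.4e10 FLOPs, should agree
```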

Statistics & Facts

  1. The Gemini 3 pre-training team consists of approximately 150-200 people working across data, models, infrastructure, and evaluations. (12:57) This scale demonstrates the massive human coordination required for frontier model development.
  2. Gopher was DeepMind's first published LLM paper, featuring a 280-billion-parameter dense transformer trained by a team of 10-12 people. (18:23) This shows how dramatically team sizes have grown as models have become more complex.
  3. Sebastian moved around Europe extensively during his education, living in the Netherlands until age 7, Switzerland until 15, and Italy until 19, before attending Cambridge. (14:05) This international background contributed to his multilingual abilities and global research perspective.

Compelling Stories

Available with a Premium subscription

Thought-Provoking Quotes

Available with a Premium subscription

Strategies & Frameworks

Available with a Premium subscription

Similar Strategies

Available with a Plus subscription

Additional Context

Available with a Premium subscription

Key Takeaways Table

Available with a Plus subscription

Critical Analysis

Available with a Plus subscription

Books & Articles Mentioned

Available with a Plus subscription

Products, Tools & Software Mentioned

Available with a Plus subscription
