Latent Space: The AI Engineer Podcast•December 6, 2025

World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

Pim de Witte turned down a $500M OpenAI offer and instead founded General Intuition, a world models startup that uses Medal's 3.8B action-labeled game clips to build AI agents that can navigate, learn, and transfer skills across games and real-world scenarios.
AI & Machine Learning
Tech Policy & Ethics
Developer Culture
World Models
Demis Hassabis
Vinod Khosla
Fei Fei Li
Pim de Witte

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.

Podcast Summary

This interview features Pim de Witte, CEO and co-founder of General Intuition (GI), an AI startup spun out of the gaming platform Medal. GI has secured a $134 million seed round from Khosla Ventures, described as Vinod Khosla's largest single seed bet since OpenAI, to develop world models using Medal's dataset of 3.8 billion action-labeled game clips. The conversation covers how Medal's retroactive clipping technology produced a large, high-value training dataset for spatial-temporal AI agents. (00:00)

  • Core themes: World models as the next frontier beyond LLMs, turning game highlights into training data for general intelligence, and building fully vision-based agents that perceive and act like humans

Speakers

Pim de Witte

CEO and co-founder of General Intuition and founder of Medal, a gaming platform with 12 million users and 3.8 billion action-labeled video clips. Previously built the largest RuneScape private server and worked at Doctors Without Borders on satellite-based map generation for disaster response. Self-taught engineer who recently completed intensive AI coursework on the fundamentals of deep learning and world models.

Key Takeaways

Privacy-First Data Collection Creates Unexpected AI Gold Mine

Medal's decision to capture actions rather than raw keystrokes was initially driven by privacy concerns, but it produced one of the most valuable AI training datasets available. (17:58) Instead of logging specific keys such as W, A, S, and D, Medal converts inputs to semantic actions (jump, walk left, aim up), which preserves user privacy while providing clean training signals. This required thousands of human labelers to map every possible action across different games over 18 months, creating ground-truth action labels for 3.8 billion clips. The resulting dataset captures the perception-action loop (perceive, act, state update, repeat) that is fundamental to training intelligent agents, without compromising user privacy.
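The key-to-action conversion described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the game names, the binding table, and the `ActionEvent` type are hypothetical and are not Medal's actual schema or pipeline. The point is only that the stored record carries a semantic label and a timestamp, never the raw key.

```python
from dataclasses import dataclass

# Hypothetical per-game binding tables: raw key -> semantic action.
# Game names, keys, and labels are illustrative assumptions.
BINDINGS = {
    "example_shooter": {"w": "move_forward", "a": "move_left", "space": "jump"},
    "example_platformer": {"a": "move_left", "d": "move_right", "w": "jump"},
}


@dataclass(frozen=True)
class ActionEvent:
    timestamp_ms: int
    action: str  # semantic label only; the raw key is not stored


def to_semantic(game: str, raw_events: list[tuple[int, str]]) -> list[ActionEvent]:
    """Convert (timestamp_ms, raw_key) pairs to semantic actions.

    Keys with no binding in the given game are dropped, so unrelated
    typing never reaches the stored record.
    """
    table = BINDINGS[game]
    return [ActionEvent(t, table[k]) for t, k in raw_events if k in table]


# The same key can mean different things in different games.
events = to_semantic("example_platformer", [(0, "w"), (16, "d"), (33, "x")])
shooter_events = to_semantic("example_shooter", [(0, "w")])
```

In this sketch the unmapped key `x` is discarded, and `w` resolves to `jump` in one game and `move_forward` in the other, which is why a per-game mapping is needed at all.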

Retroactive Clipping Captures Peak Human Performance

Medal's core innovation is retroactive video recording: the system continuously records gameplay in memory, and players press a button to save the last 30 seconds only after something interesting happens. (21:21) The approach resembles Tesla's FSD bug reporting system and creates a natural selection bias toward exceptional moments. Unlike traditional recording, where players must remember to start and stop, retroactive clipping captures authentic play without changing player behavior. Because players clip their best moments, the dataset's baseline sits near peak human performance rather than average gameplay.
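Retroactive recording of this kind is commonly built on a fixed-size ring buffer. The sketch below is one way to express the idea, not Medal's implementation: the class name `RetroactiveRecorder` and its parameters are hypothetical, and frames are stand-in integers rather than encoded video.

```python
from collections import deque


class RetroactiveRecorder:
    """Keep only the most recent `seconds` of frames in memory."""

    def __init__(self, fps: int = 30, seconds: int = 30):
        # A deque with maxlen silently evicts the oldest frame when full.
        self.buffer = deque(maxlen=fps * seconds)

    def push(self, frame) -> None:
        """Called for every captured frame, whether or not it is ever saved."""
        self.buffer.append(frame)

    def clip(self) -> list:
        """Called when the player presses the clip button: snapshot the buffer."""
        return list(self.buffer)


# Tiny capacity (2 fps * 3 s = 6 frames) so the eviction is easy to see.
rec = RetroactiveRecorder(fps=2, seconds=3)
for frame_id in range(10):
    rec.push(frame_id)
saved = rec.clip()  # the six most recent frames: 4 through 9
```

Memory use stays constant no matter how long the session runs, and nothing is written out until the player decides a moment was worth keeping.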

World Models Require Actions, Memory, and Partial Observability

True world models go beyond video generation: they must understand physics, maintain spatial memory, and handle partial observability such as smoke or camera shake. (08:43) GI's world models demonstrate several of these capabilities: maintaining position through smoke clouds, handling rapid mouse-driven camera movements, and even inheriting real-world physics such as camera shake during explosions (which does not occur in the actual game). The models use 4-second memory windows and can unstick themselves from spatial errors, showing spatial-temporal reasoning rather than simple pattern matching.
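To make "memory under partial observability" concrete, here is a deliberately simple toy, not a world model and not GI's architecture. It tracks a one-dimensional position, keeps a bounded history (the 4-second span comes from the summary; the frame rate `FPS` and the function `track_position` are assumptions), and extrapolates from that history whenever an observation is missing, as it would be behind smoke.

```python
from collections import deque
from typing import Optional

FPS = 10            # assumed frame rate for this toy
WINDOW_SECONDS = 4  # memory span mentioned in the summary


def track_position(observations: list[Optional[float]]) -> list[float]:
    """Estimate a 1-D position per frame.

    A missing observation (None) stands in for an occluded frame. In that
    case the estimate is extrapolated from the two most recent remembered
    estimates, assuming constant velocity.
    """
    memory: deque = deque(maxlen=FPS * WINDOW_SECONDS)
    estimates = []
    for obs in observations:
        if obs is not None:
            estimate = obs  # perceive directly
        elif len(memory) >= 2:
            estimate = memory[-1] + (memory[-1] - memory[-2])  # extrapolate
        elif memory:
            estimate = memory[-1]  # only one remembered point: hold position
        else:
            estimate = 0.0  # nothing seen yet: arbitrary starting guess
        memory.append(estimate)  # state update
        estimates.append(estimate)
    return estimates


# Two occluded frames in the middle of steady motion.
track = track_position([0.0, 1.0, None, None, 4.0])
```

A learned model replaces the hand-written extrapolation rule with predictions conditioned on the remembered frames and actions, but the structure of the loop (perceive when possible, fall back on memory otherwise, update state) is the same.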

Games Provide Superior Training Data for Spatial Intelligence

Video games offer advantages over YouTube videos for training spatial intelligence because they eliminate multiple layers of information loss. (13:09) With real-world videos, you must solve pose estimation, then inverse dynamics, then account for optical dynamics of eye movement—three levels of information loss. In games, players directly control the camera with their hands, simulating optical dynamics perfectly. Games also provide diverse environments (tens of thousands on PC vs. hundreds in VR) and represent every type of spatial reasoning task from navigation to tool use across different simulated worlds.

Transfer Learning Works from Games to Real World

GI successfully demonstrated transfer from arcade-style games to realistic games to real-world video using the same perception-action architecture. (05:27) Their models can label any video on the internet by predicting what actions a human would take if controlling that scenario with keyboard and mouse. This transfer capability suggests that spatial intelligence learned in simulated environments can generalize to physical reality, making games a viable foundation for training general intelligence agents that could eventually control robots or navigate the real world.

Statistics & Facts

  1. Medal has accumulated 3.8 billion action-labeled video clips from 12 million users, a creator base larger than Twitch's 7 million monthly active streamers. (16:36)
  2. General Intuition raised a $134 million seed round from Khosla Ventures, which is Vinod Khosla's largest single seed bet since OpenAI. (01:41)
  3. Medal has more people playing with steering wheels in truck simulators at any given time than Waymo has cars on the road, demonstrating the scale and seriousness of gaming simulation data. (54:21)
