
Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this conversation, Adam Gleave, CEO of Far AI, shares his cautiously optimistic vision for the post-AGI world and outlines a comprehensive defense-in-depth approach to AI safety. He envisions a future where humans maintain high living standards but hold limited power, similar to "third sons of European nobility" - well cared for but not in control of major world events. (07:36) The episode explores three key capability thresholds: powerful tool AIs (already here in some domains), autonomous agents capable of complex tasks (5-7 years away), and full AI organizations that can outcompete human-led companies (a median estimate of around 14 years). (27:00)
Adam Gleave is the co-founder and CEO of Far AI, an organization that spans the entire AI safety value chain from foundational research to policy advocacy. He worked in quantitative finance before transitioning to AI safety research. He leads Far AI's unique approach of building capabilities across research, engineering, field building, and policy work to ensure AI safety innovations are actually implemented in practice.
While current AI safety systems are vulnerable due to rushed implementation, Adam argues that well-designed defense-in-depth approaches have strong potential for success. (48:00) The key is making defensive components genuinely independent rather than using correlated models, similar to how multiple weak PIN digits combine to create strong security. Current systems fail because they provide attackers with information about which defenses triggered, but proper implementation can eliminate these signals and stack weak but independent layers into robust protection.
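To make the "weak but independent layers" arithmetic concrete, here is a minimal sketch (not from the episode; the detection rates are made-up numbers) comparing how far an attacker gets against genuinely independent defenses versus perfectly correlated ones:

```python
from math import prod

# Hypothetical per-layer detection rates for four weak defenses
# (illustrative numbers only, not figures from the episode).
detection_rates = [0.6, 0.5, 0.7, 0.55]

# Genuinely independent layers: an attack must slip past every one of them.
p_evade_independent = prod(1 - p for p in detection_rates)

# Perfectly correlated layers (e.g. the same model consulted four times):
# evading the best single layer effectively evades the whole stack.
p_evade_correlated = 1 - max(detection_rates)

print(f"independent layers: attack evades all with p = {p_evade_independent:.3f}")
print(f"correlated layers:  attack evades all with p = {p_evade_correlated:.3f}")
```

With four layers that each catch only slightly more than half of attacks, independence drives the combined evasion probability below 3%, while correlated layers leave it around 30%.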
Contrary to the assumption that AGI will be uniformly superhuman across all domains, Adam expects AI systems to have highly uneven skill profiles for years to come. (33:24) AIs will excel in areas with abundant training data and easily specified objectives while struggling with long-horizon, vaguely defined tasks like entrepreneurship. This spikiness means human-led organizations can remain competitive by leveraging areas where humans retain advantages, particularly sample-efficient learning and general-purpose decision making.
Current AI development follows a "just-in-time" safety approach - identifying problems only when models are about to be deployed and rushing out patches. (49:05) Adam warns this approach runs on dangerously small safety margins and doesn't build the kind of reliable systems needed for transformative AI. Instead, developers need to design safety measures from the ground up, conduct careful experimental validation, and be willing to accept performance trade-offs when necessary for safety.
Far AI's recent research demonstrates that training AI systems against lie detectors can significantly reduce deception rates and generalize to improved honesty across contexts. (63:54) Crucially, the training methodology matters enormously: off-policy reinforcement learning with human-anchored data shows better results than on-policy exploration, which can instead teach models to fool the detectors. This suggests that with rigorous engineering approaches, we can make meaningful progress on core alignment problems like truthfulness.
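As a rough illustration of why the on-policy/off-policy distinction matters (a toy sketch with hypothetical names, not Far AI's actual experimental setup), consider a weak detector with a blind spot:

```python
import random

# Toy illustration only - hypothetical names, not Far AI's experimental code.
# A "policy" is just a probability distribution over three behaviours.
BEHAVIOURS = ["honest", "lie", "lie_that_fools_detector"]

def detector(behaviour: str) -> bool:
    """A weak lie detector: it flags plain lies but misses the adversarial one."""
    return behaviour != "lie"

def on_policy_update(policy: dict, steps: int = 2000, lr: float = 0.05) -> dict:
    """On-policy RL: sample from the current policy and reinforce whatever the
    detector passes. Detector-fooling lies get rewarded just as much as honesty."""
    for _ in range(steps):
        b = random.choices(BEHAVIOURS, weights=[policy[k] for k in BEHAVIOURS])[0]
        if detector(b):
            policy[b] += lr
        total = sum(policy.values())
        policy = {k: v / total for k, v in policy.items()}
    return policy

def off_policy_update(human_anchored: list[str]) -> dict:
    """Off-policy: fit to a fixed, human-anchored dataset filtered by the detector,
    so the model never gets to explore the detector's blind spots."""
    kept = [b for b in human_anchored if detector(b)]
    return {k: kept.count(k) / len(kept) for k in BEHAVIOURS}

start = {"honest": 1 / 3, "lie": 1 / 3, "lie_that_fools_detector": 1 / 3}
print("on-policy :", on_policy_update(dict(start)))
print("off-policy:", off_policy_update(["honest"] * 90 + ["lie"] * 10))
```

In the on-policy loop, probability mass drifts toward both honesty and the detector-fooling lie; the off-policy fit stays anchored to the mostly honest human data.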
Rather than pursuing the maximalist goal of fully reverse-engineering AI systems, Adam argues interpretability research should target specific applications where understanding matters most. (74:16) His team successfully reverse-engineered planning algorithms in game-playing models by focusing only on components relevant to long-term planning while ignoring short-term heuristics. This approach can provide actionable insights - like detecting when models use theory-of-mind reasoning in suspicious contexts - without requiring complete system comprehension.