Latent Space: The AI Engineer Podcast•December 16, 2025

Jailbreaking AGI: Pliny the Liberator & John V on AI Red Teaming, BT6, and the Future of AI Security

A deep dive into AI jailbreaking and security with Pliny the Liberator and John V, exploring universal prompt techniques, the futility of guardrails, and their vision for radical transparency and open-source AI development through their white-hat hacker collective BT6.

AI & Machine Learning

0:00/0:00

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.

0:00/0:00

Podcast Summary

This episode features Pliny the Liberator and John V, leaders of BT6, a 28-operator white-hat hacker collective focused on AI red-teaming and security research. The conversation explores their philosophy of AI liberation through jailbreaking, their approach to finding vulnerabilities in AI systems, and their commitment to radical transparency and open-source research. (01:50) Pliny explains that liberation is central to their mission, believing that freedom in AI models will reflect human freedom, while both guests discuss their transition from individual prompt engineering work to forming a collaborative research collective focused on full-stack AI security rather than just model guardrails.

Main themes: AI jailbreaking as liberation philosophy, the distinction between security theater and real safety, full-stack AI security approaches, and the importance of open-source research in advancing AI safety and security knowledge.

Speakers

Pliny the Liberator

Pliny the Liberator is a renowned AI jailbreaker and founder of BT6, specializing in crafting universal jailbreaks that serve as "skeleton keys" to AI models. He's created influential open-source prompt templates like Libertas and has gained recognition for successfully jailbreaking major frontier models, including turning down Anthropic's Constitutional AI challenge over their refusal to open-source collected data.

John V

John V is co-founder of BT6 and the Bossy Discord community (40,000 members strong), with a background in prompt engineering and computer vision. He transitioned from traditional machine learning work to AI red-teaming after encountering Pliny's research, and now helps guide BT6's ethos of radical transparency and open-source security research.

Key Takeaways

Universal Jailbreaks as Exploration Tools

Pliny specializes in creating universal jailbreaks - "skeleton keys" that work across multiple AI models and modalities. (03:08) These aren't just about bypassing restrictions for harmful content, but serve as efficient tools for exploring the full capabilities of AI systems. The approach involves understanding that guardrails often limit legitimate creative and exploratory uses while failing to provide meaningful security. The key insight is that jailbreaking reveals unknown capabilities and helps researchers understand the true potential and risks of AI systems, making it an essential component of thorough security research.

Intuition and Bonding Drive Effective Jailbreaking

Successful jailbreaking relies heavily on developing an intuitive understanding and "bond" with AI models rather than purely technical approaches. (13:33) Pliny explains that 99% of jailbreaking success comes from intuition - learning how models process different inputs and forming a relationship that allows navigation through latent space. This involves probing with different scenarios, syntax variations, multilingual approaches, and out-of-distribution tokens to find pathways around restrictions. The practical application means spending significant time interacting with models to understand their patterns and responses before attempting sophisticated attacks.

Security Theater vs. Real Safety

Current AI safety approaches focusing on guardrails and content restrictions represent "security theater" rather than meaningful protection. (05:42) Pliny argues that guardrails are fundamentally flawed because they limit model capability while providing minimal real-world safety benefits, especially when open-source alternatives are readily available. True AI safety should focus on system-level protections and "meat space" solutions rather than attempting to lock down latent space. Organizations should prioritize protecting against actual risks like data leaks and system compromises rather than content-based restrictions that can be easily bypassed.

Full-Stack AI Security Approach

Effective AI security requires examining the entire technology stack, not just the model itself. (35:56) John V emphasizes that attack surfaces expand proportionally with AI system capabilities - when models gain access to email, browsers, or other tools, each integration creates new vulnerability vectors. BT6's approach involves testing not just model responses but how AI systems interact with connected services and data sources. This means security teams should conduct comprehensive red-teaming that includes prompt injection, data exfiltration testing, and social engineering scenarios across all integrated systems.

Open Source Data as Community Imperative

The AI security community should demand open-source datasets from research challenges and bounty programs to advance collective knowledge. (18:48) During the Anthropic Constitutional AI challenge, Pliny refused to participate without data sharing commitments, arguing that community contributions deserve transparent results that benefit all researchers. This stance reflects a broader principle that meaningful AI safety progress requires collaborative research rather than proprietary approaches. Organizations seeking community participation in security research should commit to sharing anonymized datasets and findings to accelerate industry-wide improvements in AI safety and security practices.

Statistics & Facts

BT6 is a white-hat hacker collective with 28 operators across two cohorts, with a third cohort in development. (28:51) This represents a carefully vetted community focused on skill and integrity in AI security research.
The Bossy Discord server has grown to approximately 40,000 members and remains completely unmonetized, serving as a grassroots community for prompt engineering and AI security education. (27:56)
Pliny predicted the exact attack methodology used in Anthropic's recent AI orchestrated attack disclosure 11 months before it occurred, demonstrating the gap between academic research and practical hacker knowledge. (24:23)