Search for a command to run...

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
This episode features Pliny the Liberator and John V, leaders of BT6, a 28-operator white-hat hacker collective focused on AI red-teaming and security research. The conversation explores their philosophy of AI liberation through jailbreaking, their approach to finding vulnerabilities in AI systems, and their commitment to radical transparency and open-source research. (01:50) Pliny explains that liberation is central to their mission, believing that freedom in AI models will reflect human freedom, while both guests discuss their transition from individual prompt engineering work to forming a collaborative research collective focused on full-stack AI security rather than just model guardrails.
Pliny the Liberator is a renowned AI jailbreaker and founder of BT6, specializing in crafting universal jailbreaks that serve as "skeleton keys" to AI models. He's created influential open-source prompt templates like Libertas and has gained recognition for successfully jailbreaking major frontier models, including turning down Anthropic's Constitutional AI challenge over their refusal to open-source collected data.
John V is co-founder of BT6 and the Bossy Discord community (40,000 members strong), with a background in prompt engineering and computer vision. He transitioned from traditional machine learning work to AI red-teaming after encountering Pliny's research, and now helps guide BT6's ethos of radical transparency and open-source security research.
Pliny specializes in creating universal jailbreaks - "skeleton keys" that work across multiple AI models and modalities. (03:08) These aren't just about bypassing restrictions for harmful content, but serve as efficient tools for exploring the full capabilities of AI systems. The approach involves understanding that guardrails often limit legitimate creative and exploratory uses while failing to provide meaningful security. The key insight is that jailbreaking reveals unknown capabilities and helps researchers understand the true potential and risks of AI systems, making it an essential component of thorough security research.
Successful jailbreaking relies heavily on developing an intuitive understanding and "bond" with AI models rather than purely technical approaches. (13:33) Pliny explains that 99% of jailbreaking success comes from intuition - learning how models process different inputs and forming a relationship that allows navigation through latent space. This involves probing with different scenarios, syntax variations, multilingual approaches, and out-of-distribution tokens to find pathways around restrictions. The practical application means spending significant time interacting with models to understand their patterns and responses before attempting sophisticated attacks.
Current AI safety approaches focusing on guardrails and content restrictions represent "security theater" rather than meaningful protection. (05:42) Pliny argues that guardrails are fundamentally flawed because they limit model capability while providing minimal real-world safety benefits, especially when open-source alternatives are readily available. True AI safety should focus on system-level protections and "meat space" solutions rather than attempting to lock down latent space. Organizations should prioritize protecting against actual risks like data leaks and system compromises rather than content-based restrictions that can be easily bypassed.
Effective AI security requires examining the entire technology stack, not just the model itself. (35:56) John V emphasizes that attack surfaces expand proportionally with AI system capabilities - when models gain access to email, browsers, or other tools, each integration creates new vulnerability vectors. BT6's approach involves testing not just model responses but how AI systems interact with connected services and data sources. This means security teams should conduct comprehensive red-teaming that includes prompt injection, data exfiltration testing, and social engineering scenarios across all integrated systems.
The AI security community should demand open-source datasets from research challenges and bounty programs to advance collective knowledge. (18:48) During the Anthropic Constitutional AI challenge, Pliny refused to participate without data sharing commitments, arguing that community contributions deserve transparent results that benefit all researchers. This stance reflects a broader principle that meaningful AI safety progress requires collaborative research rather than proprietary approaches. Organizations seeking community participation in security research should commit to sharing anonymized datasets and findings to accelerate industry-wide improvements in AI safety and security practices.