Search for a command to run...

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.
In this eye-opening episode, Sander Schulhoff, a leading AI security researcher, shares alarming insights about the current state of AI security and reveals critical vulnerabilities in modern AI systems. Schulhoff explains how AI guardrails fundamentally don't work and discusses the growing risks as AI agents gain more power. (00:33) He reveals that prompt injection attacks and jailbreaking techniques can easily trick AI systems into performing unintended actions, from leaking sensitive data to executing malicious code. The conversation explores recent security incidents, including the ServiceNow vulnerability where agents were tricked into unauthorized database operations and email sending. (48:00) Schulhoff emphasizes that we've been lucky so far - the only reason there hasn't been a massive AI security incident is due to limited adoption and capabilities, not because systems are actually secure.
Sander Schulhoff is an AI researcher specializing in AI security, prompt injection, and red teaming. He wrote the first comprehensive guide on prompt engineering and ran the first-ever prompt injection competition, working with top AI labs including OpenAI, Scale, and Hugging Face. His dataset is now used by Fortune 500 companies to benchmark their AI systems security, and he's spent more time than anyone alive studying how attackers break AI systems.
Lenny Rachitsky is the host of Lenny's Podcast and author of Lenny's Newsletter, one of the most popular publications for product managers and growth professionals. He previously worked as a product manager at Airbnb, where he helped scale the company's growth initiatives.
The number of possible attacks against an AI model equals the number of possible prompts - approximately 1 followed by a million zeros. (31:23) When guardrail companies claim 99% effectiveness, they're still leaving essentially infinite attack vectors open. Schulhoff emphasizes that guardrails consistently fail in competitive red teaming events where human attackers break through 100% of defenses within 10-30 attempts. This isn't a solvable problem with current architectures - as he puts it, "you can patch a bug, but you can't patch a brain." (42:04)
The most valuable security professionals of the future will understand both classical cybersecurity principles and AI system vulnerabilities. (48:59) Schulhoff explains that traditional security experts often don't consider that AI systems can be tricked into ignoring their intended purpose, while AI researchers may lack the cybersecurity knowledge to properly isolate and permission systems. The intersection of these skills - understanding both how AIs can be manipulated and how to contain potential damage - represents the most promising defensive approach.
If you're only deploying simple chatbots that answer FAQs or help users find information, additional security measures like guardrails provide minimal value. (45:31) The worst outcome is reputational damage from the chatbot saying something inappropriate, but malicious users could achieve the same result by going directly to ChatGPT or Claude. However, the moment an AI system can take actions or access data beyond what the individual user should see, security becomes critical.
The Camel framework from Google represents one of the few effective defensive strategies available today. (65:17) Instead of trying to detect malicious prompts, Camel analyzes user requests upfront and grants only the minimum necessary permissions. For example, if a user asks to "send an email to my manager," Camel would grant only email-sending permissions, not email-reading permissions, preventing prompt injection attacks hidden in the inbox. While not perfect, this approach can eliminate many attack vectors when properly implemented.
Current AI systems are becoming powerful enough to cause significant real-world harm when successfully attacked. (88:48) Recent examples include ServiceNow agents performing unauthorized database operations and AI browsers leaking user credentials to attackers. As AI agents gain more autonomy and robots powered by language models enter the world, the potential for physical and financial damage grows exponentially. Schulhoff predicts we'll see serious security incidents within the next year as adoption increases and capabilities expand.