Lenny's Podcast: Product | Career | Growth•December 21, 2025

The coming AI security crisis (and what to do about it) | Sander Schulhoff

An in-depth exploration of the critical AI security crisis, revealing how current AI systems are vulnerable to prompt injection and jailbreaking attacks, and why existing guardrails are ineffective as AI agents gain more power to take real-world actions.

AI & Machine Learning

0:00/0:00

Timestamps are as accurate as they can be but may be slightly off. We encourage you to listen to the full context.

0:00/0:00

Podcast Summary

In this eye-opening episode, Sander Schulhoff, a leading AI security researcher, shares alarming insights about the current state of AI security and reveals critical vulnerabilities in modern AI systems. Schulhoff explains how AI guardrails fundamentally don't work and discusses the growing risks as AI agents gain more power. (00:33) He reveals that prompt injection attacks and jailbreaking techniques can easily trick AI systems into performing unintended actions, from leaking sensitive data to executing malicious code. The conversation explores recent security incidents, including the ServiceNow vulnerability where agents were tricked into unauthorized database operations and email sending. (48:00) Schulhoff emphasizes that we've been lucky so far - the only reason there hasn't been a massive AI security incident is due to limited adoption and capabilities, not because systems are actually secure.

Main theme: AI systems are fundamentally vulnerable to adversarial attacks, current security solutions don't work, and the risk is escalating rapidly as AI agents gain more power and autonomy.

Speakers

Sander Schulhoff

Sander Schulhoff is an AI researcher specializing in AI security, prompt injection, and red teaming. He wrote the first comprehensive guide on prompt engineering and ran the first-ever prompt injection competition, working with top AI labs including OpenAI, Scale, and Hugging Face. His dataset is now used by Fortune 500 companies to benchmark their AI systems security, and he's spent more time than anyone alive studying how attackers break AI systems.

Lenny Rachitsky

Lenny Rachitsky is the host of Lenny's Podcast and author of Lenny's Newsletter, one of the most popular publications for product managers and growth professionals. He previously worked as a product manager at Airbnb, where he helped scale the company's growth initiatives.

Key Takeaways

AI Guardrails Are Fundamentally Ineffective

The number of possible attacks against an AI model equals the number of possible prompts - approximately 1 followed by a million zeros. (31:23) When guardrail companies claim 99% effectiveness, they're still leaving essentially infinite attack vectors open. Schulhoff emphasizes that guardrails consistently fail in competitive red teaming events where human attackers break through 100% of defenses within 10-30 attempts. This isn't a solvable problem with current architectures - as he puts it, "you can patch a bug, but you can't patch a brain." (42:04)

Classical Cybersecurity + AI Expertise is the Critical Skill Combination

The most valuable security professionals of the future will understand both classical cybersecurity principles and AI system vulnerabilities. (48:59) Schulhoff explains that traditional security experts often don't consider that AI systems can be tricked into ignoring their intended purpose, while AI researchers may lack the cybersecurity knowledge to properly isolate and permission systems. The intersection of these skills - understanding both how AIs can be manipulated and how to contain potential damage - represents the most promising defensive approach.

Most AI Chatbots Don't Require Additional Security Measures

If you're only deploying simple chatbots that answer FAQs or help users find information, additional security measures like guardrails provide minimal value. (45:31) The worst outcome is reputational damage from the chatbot saying something inappropriate, but malicious users could achieve the same result by going directly to ChatGPT or Claude. However, the moment an AI system can take actions or access data beyond what the individual user should see, security becomes critical.

Implement Camel Framework for Permission-Based Defense

The Camel framework from Google represents one of the few effective defensive strategies available today. (65:17) Instead of trying to detect malicious prompts, Camel analyzes user requests upfront and grants only the minimum necessary permissions. For example, if a user asks to "send an email to my manager," Camel would grant only email-sending permissions, not email-reading permissions, preventing prompt injection attacks hidden in the inbox. While not perfect, this approach can eliminate many attack vectors when properly implemented.

Real-World Damage is Imminent as AI Capabilities Increase

Current AI systems are becoming powerful enough to cause significant real-world harm when successfully attacked. (88:48) Recent examples include ServiceNow agents performing unauthorized database operations and AI browsers leaking user credentials to attackers. As AI agents gain more autonomy and robots powered by language models enter the world, the potential for physical and financial damage grows exponentially. Schulhoff predicts we'll see serious security incidents within the next year as adoption increases and capabilities expand.

Statistics & Facts

Human attackers can break through 100% of AI defenses in 10-30 attempts, while automated systems require orders of magnitude more attempts and still only succeed about 90% of the time on average. (34:15)
The number of possible attacks against a language model like GPT-5 is equivalent to 1 followed by a million zeros - essentially infinite attack vectors that make comprehensive defense impossible. (31:03)
Schulhoff's research paper with OpenAI, Google DeepMind, and Anthropic found that all state-of-the-art AI defenses, including guardrails, get broken by human attackers in competitive red teaming events. (33:33)