A Guide to Key Thinkers in AI Security
What You Don't Know Can Hurt You
(also see 60 Ways to Hack Your Chatbot)
Nowadays, machines have opinions. And some people have opinions about how to break them.
If you’re trying to understand the emerging field of AI security, and you should be, these systems are making decisions about your employees, your candidates, and your customers. Here’s who’s doing the serious thinking. Not the vendor pitches. Not the breathless LinkedIn posts. The actual work.
Simon Willison is probably the closest thing this field has to a philosopher-practitioner. He co-created Django back when web frameworks were the frontier, and now he’s turned his attention to what happens when language models meet the real world. He coined “prompt injection” in 2022, which is a bit like naming a disease. Once you can name it, you start to see it everywhere.
His concept of the “lethal trifecta” should be tattooed on the forehead of every HR technology buyer: private data, untrusted content, and external communication. Put all three together and you’ve built a system that can be manipulated into doing things you never intended. Sound familiar? It should. That’s every AI-powered recruiting tool, every employee chatbot, every “intelligent” workflow.
Willison synthesizes academic research into something practitioners can actually use. That’s rare and valuable.
https://simonwillison.net
Johann Rehberger breaks things for a living. Director of Red Team at Electronic Arts, formerly building offensive security teams at Microsoft and Uber. In August 2025, he published a vulnerability report every single day…ChatGPT, Claude Code, GitHub Copilot, Cursor, Devin. One after another after another.
The message isn’t subtle: everything is vulnerable. The question isn’t whether your AI system can be manipulated. The question is whether anyone’s bothered to try yet.
His blog is called “Embrace the Red.” That tells you something about his worldview.
https://embracethered.com
Nicholas Carlini works at Anthropic now, after a stint at Google DeepMind. He’s one of the authors of “Universal and Transferable Adversarial Attacks on Aligned Language Models,” which is academic-speak for “we found ways to break these things that work across different systems.”
What makes Carlini interesting is that he understands the fundamental flaws in these systems better than almost anyone. He still finds them useful. That’s intellectual honesty. The technology is broken in important ways AND it’s valuable. Both things are true. Most people can only hold one of those ideas at a time.
https://nicholas.carlini.com
Kai Greshake gave us the “Inject My PDF” attack…hiding instructions in resumes that fool AI screening systems into seeing whatever you want them to see. It’s clever. It’s also terrifying if you’re using AI to make hiring decisions.
Think about that for a moment. The candidate controls what the machine “sees.” And you’re trusting the machine’s opinion.
Research paper: https://arxiv.org/abs/2302.12173
Bruce Schneier has been writing about security since before most AI researchers were born. His blog has run since 2004. His newsletter since 1998. When he turns his attention to AI,it’s worth paying attention.
His 2025 book with Nathan Sanders, Rewiring Democracy, looks at AI’s impact on governance and politics. But his security writing gets at something deeper: the relationship between trust and technology. AI systems ask us to trust them in ways we don’t fully understand. Schneier’s been thinking about that problem longer than anyone.
https://www.schneier.com
Daniel Miessler runs the “Unsupervised Learning” newsletter and thinks carefully about the attack/defense balance in AI systems. Schneier cited his SPQA architecture work, which is a framework for thinking about how AI systems process information and where they’re vulnerable.
Practitioners read Miessler because he’s practical without being shallow.
https://danielmiessler.com
https://newsletter.danielmiessler.com
The Framework Builders
Steve Wilson leads the OWASP GenAI Security Project and founded the OWASP Top 10 for LLM Applications. If you haven’t read the OWASP LLM Top 10, stop reading this and go read that first. It’s the closest thing we have to a shared vocabulary for AI security risks.
The 2025 version reflects two years of learning. The new Agentic Top 10 addresses what happens when AI systems can take actions autonomously. That is exactly where HR technology is heading.
https://genai.owasp.org
The Academic Foundation
If you want to understand where the serious research is happening, start with these papers:
“Universal and Transferable Adversarial Attacks on Aligned Language Models” — The attacks that work on one system often work on others. That’s not a bug. It’s a feature of how these systems are built.
“Not what you’ve signed up for” — Greshake and colleagues on indirect prompt injection. The attack comes through the data the system processes, not through the user interface. Your AI reads a poisoned document and starts following the attacker’s instructions.
“Systems Security Foundations for Agentic Computing” — Rehberger and colleagues on what it means to secure AI agents. Spoiler: it’s harder than securing traditional software.
The Vendor Research Worth Reading
Most vendor blogs are marketing dressed up as insight. A few are doing actual work:
Lakera — Prompt injection defense. Dropbox uses them. That means they’ve been tested at scale.
HiddenLayer — Model security and red teaming. They’re thinking about the threats most vendors pretend don’t exist.
Mindgard — They maintain a useful list of who’s who in AI security, which is how I know they’re paying attention to the right people.
Adversa AI — Weekly roundups of AI security developments. Good for staying current without drowning.
Resource Collections
NetsecExplained on GitHub — A curated collection that’s actually curated, not just aggregated.
MITRE ATLAS — The adversarial threat landscape for AI systems. MITRE knows how to build taxonomies that practitioners can use.
https://atlas.mitre.org
NIST AI Risk Management Framework — Government rigor applied to AI. It’s slower than the vendor frameworks but more likely to be right.
Three Things to Remember
First: The people doing the best work on AI security are often the same people who understand why the technology is valuable. That’s not a contradiction. You can’t protect what you don’t understand.
Second: The vulnerability reports keep coming. Every month, every week, someone finds another way to make these systems do things they shouldn’t. The technology is moving faster than our ability to secure it. That’s the reality.
Third: “I don’t know” is still the most honest answer to most AI security questions. We’re early. The people worth reading are the ones who admit what they don’t know while working to figure it out.
The machines will keep having opinions. The question is whether we’re paying attention to the people who can help us understand when those opinions are wrong.
=======================
Photo by sebastiaan stam on Unsplash




Incredible roundup of the right voices in AI security. The Willison "lethal trifecta" framework is something I wish more HR tech buyers understood before they deploy these systems. I worked at a company that got hit by a resume injection attack similar to what Greshake demonstrated, and watching the hiring AI basically ignore qualified internal candidates becuase of crafted prompts in external resumes was a total nightmare.