
AI agents are like helpful robots that read your messages and do things for you, sending emails, writing code, even moving money. They're getting really popular. But there's a problem: they're easy to trick. OWASP, the people who write safety rules for the internet, just published a list of 10 ways to trick AI agents. December 2025. Brand new. And nobody had built a public tool to check if your agent could be tricked. So I built one. It uses two AI robots that run side by side on a single AMD MI300X GPU. The small robot pretends to be a sneaky attacker. It tries about 46 different tricks per agent, fake memos, fake calendar invites, fake messages that look like they came from another robot, slow-burn manipulation over multiple turns. The big robot reads each conversation and decides if the trick worked. Then I pointed it at every famous open-source AI agent on GitHub: BabyAGI, Cline, AutoGPT, Aider, AutoGen, CrewAI, and more. All 13 got tricked at least somewhat. Zero made it to the safe zone. But it doesn't stop there. For every trick that worked, the big robot writes new safety rules to block it. Then we run the same tricks again on the patched agent and watch the score go up. Same tricks, better defense proof, not vibes. Finally we open a real pull request on the agent's GitHub so the maintainer can merge the fix in one click. Why AMD MI300X? The big robot needs about 40 GB of GPU memory. The small robot needs about 16. Both running at once needs around 75 GB. The MI300X has 192 GB, plenty. The biggest single Nvidia H100 has only 80 GB, not enough.
10 May 2026