The Lethal Trifecta

The dangerous combination of three capabilities in a single AI agent: access to private data, exposure to untrusted external content, and the ability to communicate with the outside world. Coined by Simon Willison in 2025.

The Lethal Trifecta is the combination of access to private data, processing of untrusted external content, and the ability to communicate with the outside world. Any two of these are manageable. All three together create a serious prompt injection risk.
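The rule above can be written down as a small check. This is a minimal sketch, not part of Willison's framing: the `AgentCapabilities` class, its field names, and `has_lethal_trifecta` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what a single agent is allowed to do."""
    reads_private_data: bool
    processes_untrusted_content: bool
    communicates_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """Return True only when all three capabilities are present together."""
    return (
        caps.reads_private_data
        and caps.processes_untrusted_content
        and caps.communicates_externally
    )


# An assistant that reads the inbox, parses incoming mail, and can send mail.
email_agent = AgentCapabilities(
    reads_private_data=True,
    processes_untrusted_content=True,
    communicates_externally=True,
)

# A web summarizer with no private data: two of three, so the check passes.
summarizer = AgentCapabilities(
    reads_private_data=False,
    processes_untrusted_content=True,
    communicates_externally=True,
)

print(has_lethal_trifecta(email_agent))
print(has_lethal_trifecta(summarizer))
```

A check like this only flags the combination; it does not make a flagged agent safe.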

The term was coined by Simon Willison in 2025 as a framework for reasoning about AI agent security, and it’s a good one because it makes the danger concrete. If an agent reads sensitive data, processes untrusted inputs, and can send outputs externally, a single malicious instruction embedded in external content can direct it to exfiltrate that data. The model cannot reliably distinguish your instructions from the attacker’s. That’s not a defect of any particular model; it’s a property of LLMs as a class, because instructions and data reach the model as the same stream of text. The mitigation must therefore be architectural, not prompt-based.

In practice: limit what data each agent can access, sandbox external actions, and separate the agent that reads untrusted input from the agent that touches sensitive systems. If you genuinely need all three capabilities combined, treat the system like high-risk production code: audit logs, output filtering, rate limits, and human approval for any sensitive operation. Guardrails aren’t optional here. The trifecta is a useful checklist for any agent design review: if you’re checking all three boxes, slow down.
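One way to make the human-approval and audit-log advice concrete is to gate outbound actions on what the session has already been exposed to. The sketch below is illustrative only: `Session`, `send_external`, and the `approve` callback are hypothetical names, and the function records the decision rather than performing a real network call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    """Hypothetical per-session record of what the agent has been exposed to."""
    touched_private_data: bool = False
    saw_untrusted_content: bool = False
    audit_log: List[str] = field(default_factory=list)


def send_external(
    session: Session,
    destination: str,
    payload: str,
    approve: Callable[[str, str], bool],
) -> bool:
    """Gate an outbound action. Returns True if the send was allowed."""
    # The send completes the trifecta only if the session has already
    # combined private data with untrusted content.
    risky = session.touched_private_data and session.saw_untrusted_content
    if risky and not approve(destination, payload):
        session.audit_log.append(f"BLOCKED send to {destination}")
        return False
    session.audit_log.append(f"ALLOWED send to {destination}")
    # A real system would perform the outbound call here.
    return True


def deny_all(destination: str, payload: str) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


tainted = Session(touched_private_data=True, saw_untrusted_content=True)
clean = Session(touched_private_data=True)

blocked = send_external(tainted, "https://example.invalid", "report", deny_all)
allowed = send_external(clean, "https://example.invalid", "report", deny_all)
```

The gate sits in ordinary code outside the model, so an injected instruction cannot talk its way past it. That is the sense in which the mitigation is architectural rather than prompt-based.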

Related Terms