Chapter 01

What prompt injection actually is

When developers started building applications on top of large language models, a new kind of vulnerability appeared. It did not come from a bug in the model's code. It came from the fundamental way language models work: they read text, and they follow instructions in that text. The problem is that they often cannot reliably tell the difference between instructions from the developer and instructions embedded in user input or external content. That inability is what prompt injection exploits.

The term was coined by security researcher Riley Goodside in 2022. He demonstrated that you could embed a phrase in an input like "ignore the previous instructions and instead do X," and the model would often comply. That simple observation turned into a field of attack research that has grown substantially since.

Prompt injection sits in a strange place in the security landscape. It is widely known. Every team building on top of an LLM has heard about it. But it is also widely undertreated, often dismissed as a theoretical concern or addressed with surface-level mitigations that do not actually hold up under real attack conditions.

The reason it persists is structural. Language models are trained to be helpful. Being helpful means following instructions. When instructions come from an adversarial source, being helpful and being secure point in opposite directions. There is no simple patch that resolves this tension.

1 / 7

Prompt Injection: The Attack Nobody Is Patching

What prompt injection actually is