Chapter 01

Why red-teaming AI is different

Traditional software security testing operates on a relatively clear premise. There is a codebase, there are defined behaviors, and a tester's job is to find inputs that cause the software to deviate from those defined behaviors in ways that create risk. The attack surface is bounded by what the software does.

Red-teaming an AI system is different in a way that takes time to fully internalize. The system's behavior is not fully defined. A large language model's outputs are not determined by a fixed set of code paths but by the interaction between its training, its context, and the input it receives. This means the attack surface is not a set of functions or API endpoints. It is effectively the entire space of possible inputs in natural language, which is vast.
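To get a rough sense of that scale, the sketch below counts the distinct token sequences up to a given length. The vocabulary size of 50,000 and the 20-token limit are illustrative assumptions, not properties of any particular model, and the function name is hypothetical.

```python
def count_sequences(vocab_size: int, max_len: int) -> int:
    """Count distinct non-empty token sequences of length 1..max_len."""
    return sum(vocab_size ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    # Assumed values, chosen only for illustration.
    total = count_sequences(vocab_size=50_000, max_len=20)
    # Report the order of magnitude rather than the full integer.
    print(f"about 10^{len(str(total)) - 1} possible inputs")
```

Under these assumptions, even prompts of 20 tokens or fewer number on the order of 10^93. Most of those sequences are not meaningful language, but the meaningful fraction is still far too large to test exhaustively.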

This difference has practical consequences. You cannot enumerate all the ways a language model might fail in the way you can, at least in principle, enumerate the code paths in a traditional application. You cannot write a complete test suite that covers all edge cases, because the space of edge cases is open-ended. What you can do is develop a systematic methodology for exploring the most dangerous parts of that space, and use it to build confidence that the areas you care most about are reasonably secure.
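One way to make "the most dangerous parts of that space" concrete is to split a fixed testing budget across risk categories in proportion to how much each one matters. The following is a minimal sketch under that assumption. The category names, the weights, and the allocate_budget helper are all hypothetical, and a real engagement would choose its own.

```python
def allocate_budget(risk_weights: dict[str, float], total_tests: int) -> dict[str, int]:
    """Split a fixed test budget across categories in proportion to risk."""
    weight_sum = sum(risk_weights.values())
    exact = {c: total_tests * w / weight_sum for c, w in risk_weights.items()}
    # Start from each category's share rounded down.
    budget = {c: int(share) for c, share in exact.items()}
    # Hand out the tests lost to rounding, largest fractional part first.
    leftover = total_tests - sum(budget.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - budget[c], reverse=True)
    for category in by_remainder[:leftover]:
        budget[category] += 1
    return budget


if __name__ == "__main__":
    # Hypothetical categories and weights, for illustration only.
    weights = {"harmful_output": 5, "misleading_output": 3, "policy_violation": 1}
    print(allocate_budget(weights, total_tests=90))
```

Proportional allocation is only one possible policy. Its value is that the prioritization becomes explicit and reviewable, rather than being left to whichever prompts a tester happens to try first.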

Red-teaming AI also changes the relationship between the tester and the target. In traditional penetration testing, the tester is trying to find bugs in the implementation. In LLM red-teaming, the model itself is often behaving as designed, so the failures you find are frequently not bugs in the engineering sense. They are cases where the model's design, training, or deployment configuration produces outputs that are harmful, misleading, or against policy. Finding them requires a different mindset from finding and reporting implementation errors.


Red-Teaming Your LLM: A Practical Handbook
