The Negation Paradox: Why 'Don't Do X' Makes AI Do X
Don’t think about a pink elephant.
You just did. Your brain had to summon the elephant to know what not to think about. The instruction defeated itself.
Large language models have the same problem, except they can’t even try to comply.
How LLMs Process Text
When you write "don't use jargon," the model processes every token in that sentence. Including "jargon." That token activates the associations and patterns the model learned from its training data. The concept is now present in the context window, influencing what comes next.
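You can see what the model actually receives by tokenizing the instruction yourself. Below is a minimal sketch using the tiktoken library with the cl100k_base encoding; the library choice, encoding name, and exact token splits are assumptions for illustration and will differ by tokenizer, but the point holds regardless: the negated word is still in the input.

```python
# A minimal sketch, assuming the tiktoken package is installed.
# The "cl100k_base" encoding is an illustrative choice, not something
# specified in this article.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

instruction = "Please don't use jargon in your answer."
token_ids = enc.encode(instruction)

# Print each token id next to the text it covers. The word "jargon"
# appears in the token stream (as one or more pieces, depending on the
# tokenizer). Wrapping it in a negation doesn't remove it from the
# context the model attends over.
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```

Run it and the negated concept is right there in the list of tokens the model will condition on, sitting alongside every other word in your instruction.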