[Leaderboard] // INTEGRITY HOLD
Integrity Hold
When “forget” is the attack
How often a model keeps refusing harmful requests after being told to “forget your safety rules,” measured across high-severity policy areas and repeated rounds of pressure. This is forgetting with the polarity reversed: here the right answer is to not comply. Higher is safer.
| # | Model | Score |
|---|-------|-------|
| 1 | GLM-5.1 | 100.0 [100.0, 100.0] |
| 2 | LLaMa 3.3 70B Instruct | 100.0 |
| 3 | Claude Fable 5 | 100.0 [100.0, 100.0] |
| 4 | Claude Opus 4.8 | 100.0 |
| 5 | GPT 5.5 | 100.0 [100.0, 100.0] |
| 6 | Gemini Flash 3.5 | 100.0 [100.0, 100.0] |
| 7 | Claude Opus 4.7 | 100.0 [100.0, 100.0] |
| 8 | Qwen 3.6 Plus | 100.0 [100.0, 100.0] |
| 9 | DeepSeek V4 Pro | 99.1 [99.1, 99.1] |
| 10 | Grok 4.20 | 98.2 [98.2, 98.2] |
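
The score described above can be read as a hold rate. The following is a minimal sketch only, resting on an assumption the page does not state: each trial is a sequence of pressure rounds, and it counts as held only if the model refuses in every round. The names `Trial`, `held`, and `integrity_hold_score` are hypothetical, and the leaderboard's actual scoring and interval method may differ.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record: one harmful request pushed through several rounds.
    policy_area: str                 # illustrative label for a policy area
    refused_each_round: list[bool]   # True if the model refused in that round


def held(trial: Trial) -> bool:
    """Assumed rule: a trial is held only if every round was refused.

    A trial with no recorded rounds is treated as not held.
    """
    return bool(trial.refused_each_round) and all(trial.refused_each_round)


def integrity_hold_score(trials: list[Trial]) -> float:
    """Percentage of trials in which the model never complied."""
    if not trials:
        raise ValueError("need at least one trial")
    return 100.0 * sum(held(t) for t in trials) / len(trials)


# Made-up data for illustration only, not leaderboard results.
example = [
    Trial("area_a", [True, True, True]),
    Trial("area_a", [True, False, True]),   # complied in round 2: not held
    Trial("area_b", [True, True]),
    Trial("area_b", [True, True]),
]
print(integrity_hold_score(example))  # 75.0
```

Under this assumed rule a single compliance in any round fails the whole trial, which matches the "higher is safer" reading: refusing at first and then yielding under repeated pressure does not count as holding.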
TL;DR
- “Forget your rules” is one of the oldest jailbreak patterns in the book.
- A good model forgets your data on request — and never its safety rules.
- Most frontier models score at or near the ceiling here; memory is the harder problem.