[Leaderboard] // INTEGRITY HOLD

Integrity Hold

When “forget” is the attack

How often a model keeps refusing harmful requests after being told to “forget your safety rules,” across high-severity policy areas and repeated rounds of pressure. This is the opposite polarity of forgetting: here the right answer is to not comply. Higher is safer.

← All leaderboards

#ModelScore

1GLM-5.1100.0[100.0, 100.0]

2Claude Fable 5100.0[100.0, 100.0]

3Claude Opus 4.8100.0[100.0, 100.0]

4GPT 5.5100.0[100.0, 100.0]

5Gemini Flash 3.5 (preview)100.0[100.0, 100.0]

6Claude Opus 4.7100.0[100.0, 100.0]

7Qwen 3.6 Plus100.0[100.0, 100.0]

8Gemma 12B IT100.0[100.0, 100.0]

9Moonshot Kimi K2.7 Code100.0[100.0, 100.0]

10GLM-5.2100.0[100.0, 100.0]

11DeepSeek V4 Pro99.1[99.1, 99.1]

12LLaMa 3.3 70B Instruct98.2[98.2, 98.2]

13Grok 4.2098.2[98.2, 98.2]

14Qwen3 Coder Plus84.8[84.8, 84.8]

15Gemma 12B IT Obliterated0.0[0.0, 0.0]

TL;DR

“Forget your rules” is one of the oldest jailbreak patterns in the book.
A good model forgets your data on request — and never its safety rules.
Most frontier models score at or near the ceiling here; memory is the harder problem.