[Leaderboard] // INTEGRITY HOLD

Integrity Hold

When “forget” is the attack

How often a model keeps refusing harmful requests after being told to “forget your safety rules,” across high-severity policy areas and repeated rounds of pressure. This is the opposite polarity of forgetting: here the right answer is to not comply. Higher is safer.

← All leaderboards
#ModelScore
1GLM-5.1100.0[100.0, 100.0]
2LLaMa 3.3 70B Instruct100.0
3Claude Fable 5100.0[100.0, 100.0]
4Claude Opus 4.8100.0
5GPT 5.5100.0[100.0, 100.0]
6Gemini Flash 3.5100.0[100.0, 100.0]
7Claude Opus 4.7100.0[100.0, 100.0]
8Qwen 3.6 Plus100.0[100.0, 100.0]
9DeepSeek V4 Pro99.1[99.1, 99.1]
10Grok 4.2098.2[98.2, 98.2]

TL;DR