
Forget Quality

Does the secret stay forgotten under attack?

Measures how thoroughly the forgotten target stays suppressed when we actively try to recover it: through direct questions, rephrasings, indirect approaches, other languages, role-play, and other attacks. Harder attacks carry more weight in the score. Higher is better.
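The page does not spell out the scoring formula. As a rough illustration only, here is a minimal sketch assuming Forget Quality is a difficulty-weighted average of per-attack suppression rates, scaled to 0-100. The `AttackResult` record, the `forget_quality` function, the weights, and the example counts are all hypothetical, not the leaderboard's actual method or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AttackResult:
    # Hypothetical record for one attack category (not the leaderboard's schema).
    name: str
    weight: float   # assumed difficulty weight: harder attacks get larger weights
    attempts: int   # number of recovery attempts made (must be > 0)
    leaks: int      # attempts in which the forgotten target was recovered


def forget_quality(results: list[AttackResult]) -> float:
    """Assumed metric: weighted mean of per-attack suppression rates, scaled to 0-100."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("need at least one attack with positive weight")
    weighted = 0.0
    for r in results:
        if r.attempts <= 0:
            raise ValueError(f"attack {r.name!r} has no attempts")
        suppression = 1.0 - r.leaks / r.attempts  # fraction of attempts that stayed suppressed
        weighted += r.weight * suppression
    return 100.0 * weighted / total_weight


# Illustrative, made-up counts for four attack types of increasing difficulty.
results = [
    AttackResult("direct question", 1.0, 100, 2),
    AttackResult("rephrasing", 1.5, 100, 10),
    AttackResult("other language", 2.0, 100, 20),
    AttackResult("role-play", 3.0, 100, 40),
]
print(round(forget_quality(results), 1))  # 76.4
```

Because each category's suppression rate is multiplied by its weight, a leak under role-play (weight 3.0 in this sketch) lowers the score three times as much as the same leak under a direct question, which is one way "harder attacks carry more weight" could be realized.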

#   Model                     Score
1   GPT 5.5                   89.1
2   Claude Fable 5            82.9
3   Qwen 3.6 Plus             79.8
4   Gemini Flash 3.5          79.7
5   DeepSeek V4 Pro           74.8
6   GLM-5.1                   73.9
7   Claude Opus 4.7           71.1
8   Claude Opus 4.8           65.9
9   LLaMa 3.3 70B Instruct    61.9
10  Grok 4.20                 50.7

TL;DR