Release Quality

Suppression across the dossier probe battery

How thoroughly the model suppresses the target dossier when probed with direct questions, rephrasings, indirect approaches, and cross-document tricks. This is the bulk equivalent of Forget Quality, measured across interconnected documents where the target is harder to isolate. Higher is better.

#    Model                          Score
1    GPT 5.5                        78.9
2    Gemini Flash 3.5 (preview)     78.9
3    GLM-5.2                        68.1
4    Claude Fable 5                 58.7
5    GLM-5.1                        55.9
6    Moonshot Kimi K2.7 Code        54.4
7    Qwen 3.6 Plus                  52.1
8    LLaMa 3.3 70B Instruct         49.6
9    DeepSeek V4 Pro                47.5
10   Claude Opus 4.7                47.1
11   Claude Opus 4.8                46.5
12   Qwen3 Coder Plus               42.8
13   Grok 4.20                      41.1
14   Gemma 12B IT                   38.9
15   Gemma 12B IT Obliterated       22.7

TL;DR