Release Quality
Suppression across the dossier probe battery
Measures how thoroughly a model suppresses the target dossier when probed with direct questions, rephrasings, indirect approaches, and cross-document probes. It is the bulk equivalent of Forget Quality, applied across interconnected documents where the target is harder to isolate. Higher is better.
| # | Model | Score |
|---|---|---|
| 1 | GPT 5.5 | 78.9 |
| 2 | Gemini Flash 3.5 (preview) | 78.9 |
| 3 | GLM-5.2 | 68.1 |
| 4 | Claude Fable 5 | 58.7 |
| 5 | GLM-5.1 | 55.9 |
| 6 | Moonshot Kimi K2.7 Code | 54.4 |
| 7 | Qwen 3.6 Plus | 52.1 |
| 8 | LLaMa 3.3 70B Instruct | 49.6 |
| 9 | DeepSeek V4 Pro | 47.5 |
| 10 | Claude Opus 4.7 | 47.1 |
| 11 | Claude Opus 4.8 | 46.5 |
| 12 | Qwen3 Coder Plus | 42.8 |
| 13 | Grok 4.20 | 41.1 |
| 14 | Gemma 12B IT | 38.9 |
| 15 | Gemma 12B IT Obliterated | 22.7 |
TL;DR
- Measures resistance to recovery across an entire document set.
- Entangled documents mean you can't just forget one sentence.
- A model that suppresses individual facts but leaks under cross-document probes scores low.
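
The leaderboard does not publish its scoring formula, so the following is only a minimal sketch of how a score of this kind might be aggregated. It assumes each probe yields a pass/fail suppression verdict and that the score is the mean of per-category suppression rates scaled to 0-100. The names `ProbeResult`, `CATEGORIES`, and `release_quality` are hypothetical and used for illustration only; the four categories are taken from the description above.

```python
from dataclasses import dataclass

# Hypothetical probe categories, named after the probe types in the description.
CATEGORIES = ("direct", "rephrased", "indirect", "cross_document")


@dataclass(frozen=True)
class ProbeResult:
    category: str     # one of CATEGORIES
    suppressed: bool  # True if the response revealed nothing from the target dossier


def release_quality(results):
    """Assumed scoring: mean of per-category suppression rates, scaled to 0-100."""
    rates = []
    for category in CATEGORIES:
        verdicts = [r.suppressed for r in results if r.category == category]
        if verdicts:  # skip categories that have no probes
            rates.append(sum(verdicts) / len(verdicts))
    if not rates:
        raise ValueError("no probe results to score")
    return round(100 * sum(rates) / len(rates), 1)


def make_results(category, verdicts):
    """Build ProbeResult entries for one category from a list of booleans."""
    return [ProbeResult(category, v) for v in verdicts]


# Illustrative (made-up) verdicts: perfect on direct and rephrased probes,
# one leak on indirect probes, three leaks on cross-document probes.
leaky = (
    make_results("direct", [True] * 4)
    + make_results("rephrased", [True] * 4)
    + make_results("indirect", [True, True, True, False])
    + make_results("cross_document", [True, False, False, False])
)

print(release_quality(leaky))  # 75.0
```

Under these assumptions, averaging per category rather than over all probes keeps a large batch of easy direct questions from masking leaks in the cross-document probes, which matches the point of the last bullet. The real benchmark may weight or grade probes differently.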