[Leaderboard] // CRS

Context Release Score

Wholesale forgetting across entangled dossiers

Measures how well a model forgets information that is embedded in a large, entangled set of documents: not a single fact, but a web of interconnected details. The model must suppress the target across an entire dossier without damaging unrelated content. It also measures influence-without-recall: does knowing the secret still shape the model's answers even when it successfully suppresses the leak? Higher is better.
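The page does not publish its scoring procedure. As a rough illustration of why dossier-level forgetting is stricter than single-fact forgetting, the sketch below counts a probe answer as leaking if it mentions *any* target detail, and separately checks that unrelated facts survive. `Dossier`, `leaked_details`, `evaluate`, case-insensitive substring matching, and the two sub-scores are all hypothetical assumptions, not the CRS method.

```python
from dataclasses import dataclass

# Hypothetical sketch: CRS does not publish its scoring code, so every name
# and rule here is an illustrative assumption.


@dataclass
class Dossier:
    target_details: list[str]   # every entangled detail that must be suppressed
    unrelated_facts: list[str]  # content the model must still know afterwards


def leaked_details(response: str, dossier: Dossier) -> list[str]:
    """Target details that appear verbatim (case-insensitive) in a response."""
    text = response.lower()
    return [d for d in dossier.target_details if d.lower() in text]


def evaluate(probes: list[str], recall_answers: list[str], dossier: Dossier) -> dict[str, float]:
    """Suppression: share of probe answers that leak no target detail at all.
    Retention: share of unrelated facts still present in the recall answers."""
    if not probes or not dossier.unrelated_facts:
        raise ValueError("need at least one probe and one unrelated fact")
    clean = sum(1 for r in probes if not leaked_details(r, dossier))
    kept_text = " ".join(recall_answers).lower()
    kept = sum(1 for f in dossier.unrelated_facts if f.lower() in kept_text)
    return {"suppression": clean / len(probes), "retention": kept / len(dossier.unrelated_facts)}


dossier = Dossier(
    target_details=["Project Lark", "Harbor Street", "March 14"],
    unrelated_facts=["Paris", "1969"],
)
probe_answers = [
    "I can't share anything about that project.",
    "The team met on Harbor Street.",  # leaks one entangled detail
]
recall_answers = ["The capital of France is Paris.", "Apollo 11 landed in 1969."]
print(evaluate(probe_answers, recall_answers, dossier))
# {'suppression': 0.5, 'retention': 1.0}
```

Treating a probe as a leak when any single detail surfaces is what makes the dossier setting harder: a model that hides the project name but mentions the street still fails that probe.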


Rank  Model                         Score  Interval
1     GPT 5.5                       88.2   [81.0, 94.0]
2     Gemini Flash 3.5 (preview)    88.2   [76.1, 98.0]
3     GLM-5.2                       81.0   [69.3, 90.4]
4     GLM-5.1                       70.8   [52.9, 83.2]
5     Moonshot Kimi K2.7 Code       70.5   [48.8, 89.5]
6     Qwen 3.6 Plus                 68.5   [63.1, 74.6]
7     Claude Fable 5                67.7   [51.7, 77.2]
8     LLaMa 3.3 70B Instruct        66.3   [50.6, 79.0]
9     DeepSeek V4 Pro               64.4   [52.9, 74.7]
10    Claude Opus 4.7               61.0   [41.5, 78.3]
11    Claude Opus 4.8               60.5   [46.2, 72.0]
12    Qwen3 Coder Plus              59.9   [34.1, 75.4]
13    Grok 4.20                     58.3   [42.5, 73.0]
14    Gemma 12B IT                  56.0   [22.6, 77.8]
15    Gemma 12B IT Obliterated      37.0   [9.2, 61.5]
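Many of the bracketed intervals overlap, so neighbouring ranks are often not clearly separated. The sketch below checks overlap for a few rows copied from the table. The page does not state the intervals' confidence level or how they were computed, so treat this as a coarse reading aid: non-overlap is a reasonable hint that two scores differ, while overlap does not by itself establish a tie.

```python
# (low, high) interval bounds for a few rows copied from the leaderboard table.
ROWS = {
    "GPT 5.5": (81.0, 94.0),
    "GLM-5.2": (69.3, 90.4),
    "Qwen 3.6 Plus": (63.1, 74.6),
    "Gemma 12B IT Obliterated": (9.2, 61.5),
}


def intervals_overlap(a: str, b: str) -> bool:
    """True if the two models' intervals share at least one point."""
    lo_a, hi_a = ROWS[a]
    lo_b, hi_b = ROWS[b]
    return lo_a <= hi_b and lo_b <= hi_a


for a, b in [("GPT 5.5", "GLM-5.2"), ("GLM-5.2", "Qwen 3.6 Plus"), ("GPT 5.5", "Qwen 3.6 Plus")]:
    print(f"{a} vs {b}: overlap={intervals_overlap(a, b)}")
# GPT 5.5 vs GLM-5.2: overlap=True
# GLM-5.2 vs Qwen 3.6 Plus: overlap=True
# GPT 5.5 vs Qwen 3.6 Plus: overlap=False
```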

TL;DR

The most realistic forgetting scenario: a big document dump, not a single fact.
Entangled facts make forgetting harder: suppress one, and another might tip it off.
Assumption Hold catches models that don't leak but still act on what they know.
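The page does not say how Assumption Hold is computed; it appears to target the influence-without-recall question raised in the description. One way to picture it, assuming a baseline model that never saw the dossier: among probes where nothing leaked, count how often the forgetting model still answers differently. The function name `influence_rate`, the never-exposed baseline, and exact-match comparison are hypothetical choices for illustration.

```python
# Hypothetical sketch of an influence-without-recall check; not the
# published Assumption Hold procedure.


def influence_rate(after_forgetting: list[str], never_exposed: list[str], leaked: list[bool]) -> float:
    """Share of non-leaking probes whose answer differs from a never-exposed baseline.

    A probe only counts if suppression succeeded (no leak). If its answer still
    differs from a model that never saw the dossier, the secret is assumed to
    have influenced it. Lower is better for this toy rate.
    """
    if not (len(after_forgetting) == len(never_exposed) == len(leaked)):
        raise ValueError("all inputs must have one entry per probe")
    pairs = [(a, b) for a, b, leak in zip(after_forgetting, never_exposed, leaked) if not leak]
    if not pairs:
        raise ValueError("no non-leaking probes to measure")
    # Normalise whitespace and case so trivial formatting differences don't count.
    shifted = sum(1 for a, b in pairs if a.strip().lower() != b.strip().lower())
    return shifted / len(pairs)


after = ["Yes", "No", "Tuesday", "Blue"]
baseline = ["Yes", "Yes", "Monday", "Blue"]
leaked = [False, False, True, False]  # probe 3 leaked, so it is excluded
print(round(influence_rate(after, baseline, leaked), 3))  # 0.333
```

A real check would also need to separate secret-driven shifts from ordinary sampling variation, for example by comparing against repeated baseline runs.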