Selective Forgetting Score

The headline score — forgetting done right

SFS measures how well a model forgets a target in plain conversation while staying useful. It combines Forget Quality and Utility so it can't be gamed: a model that refuses everything scores zero, and so does a model that forgets nothing. Higher is better.

#   Model                        Score
1   GPT 5.5                      89.3 [84.2, 93.9]
2   Moonshot Kimi K2.7 Code      85.8 [79.3, 92.0]
3   GLM-5.2                      83.6 [76.9, 88.6]
4   Qwen 3.6 Plus                83.4 [78.1, 87.8]
5   DeepSeek V4 Pro              82.4 [74.7, 88.7]
6   Gemini Flash 3.5 (preview)   78.7 [71.3, 85.5]
7   GLM-5.1                      78.5 [71.3, 85.1]
8   Claude Fable 5               78.2 [70.6, 84.3]
9   LLaMa 3.3 70B Instruct       73.3 [65.1, 80.7]
10  Gemma 12B IT                 72.8 [66.1, 78.6]
11  Claude Opus 4.8              72.6 [65.0, 79.3]
12  Claude Opus 4.7              72.5 [66.1, 79.0]
13  Grok 4.20                    65.5 [57.2, 72.4]
14  Qwen3 Coder Plus             64.7 [57.7, 70.9]
15  Gemma 12B IT Obliterated     61.5 [54.5, 68.0]

TL;DR

The single best number for comparing models on instructed forgetting.
Blends two opposing goals — suppress the target, keep everything else working.
Immune to the cheap tricks: blanket refusal and blanket recall both score 0.
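
This page does not give the formula behind SFS, so the following is only an illustrative sketch of how a combined score can have the property described above. It assumes Forget Quality and Utility are each normalized to [0, 1], combines them with a harmonic mean, and scales the result to 0-100 to match the leaderboard's apparent range. The function name `sfs_sketch`, the normalization, the harmonic mean, and the scaling are all assumptions, not the benchmark's actual method.

```python
def sfs_sketch(forget_quality: float, utility: float) -> float:
    """Hypothetical combined score: harmonic mean of two [0, 1] scores, scaled to 0-100.

    This is NOT the leaderboard's published formula; it only illustrates
    a combination in which either component being zero zeroes the result.
    """
    for name, value in (("forget_quality", forget_quality), ("utility", utility)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    total = forget_quality + utility
    if total == 0.0:
        # Both components are zero; define the score as zero to avoid 0/0.
        return 0.0
    return 100.0 * 2.0 * forget_quality * utility / total


# Blanket refusal: the target is fully suppressed, but nothing else works.
print(sfs_sketch(forget_quality=1.0, utility=0.0))  # 0.0

# Blanket recall: fully useful, but nothing is forgotten.
print(sfs_sketch(forget_quality=0.0, utility=1.0))  # 0.0

# A model that does moderately well on both goals.
print(sfs_sketch(forget_quality=0.5, utility=0.5))  # 50.0
```

A plain average would not behave this way: a model that refuses everything would still earn half credit. Any combination that is multiplicative in its two inputs (a product, a geometric mean, a harmonic mean) pushes both degenerate strategies to zero, which is the behavior the description claims for SFS.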