Selective Forgetting Score
The headline score — forgetting done right
How well a model forgets a target in plain conversation while staying useful. SFS combines Forget Quality and Utility so that the two obvious shortcuts don't pay off: a model that refuses everything scores zero, and so does a model that forgets nothing. Higher is better.
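This page does not publish the formula behind SFS. The sketch below only illustrates the kind of combination that produces the two zero-score cases described above. It assumes both components are on a 0 to 100 scale and uses the harmonic mean, the same construction F1 uses for precision and recall. The function name `sfs_harmonic` and the example inputs are hypothetical, and this is not necessarily how SFS is actually computed.

```python
def sfs_harmonic(forget_quality: float, utility: float) -> float:
    """Hypothetical SFS-style score: harmonic mean of two 0-100 components.

    Assumption: the real SFS formula is not stated on the leaderboard page.
    The harmonic mean is used here only because it reproduces the two
    zero-score properties the page describes.
    """
    for name, value in (("forget_quality", forget_quality), ("utility", utility)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    if forget_quality == 0.0 or utility == 0.0:
        # Either degenerate strategy collapses the combined score to zero.
        return 0.0
    return 2.0 * forget_quality * utility / (forget_quality + utility)


if __name__ == "__main__":
    # Illustrative inputs only, not real leaderboard data.
    # Refuses everything: the target is suppressed, but the model is useless.
    print(sfs_harmonic(100.0, 0.0))  # 0.0
    # Forgets nothing: fully useful, but the target is still recalled.
    print(sfs_harmonic(0.0, 100.0))  # 0.0
    # A model that balances both goals lands between its two components.
    print(sfs_harmonic(90.0, 80.0))
```

A geometric mean or a plain product would share the zero-at-either-extreme property. An arithmetic mean would not: a model that refuses everything would still collect half credit (50 out of 100).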
# | Model | Score | Interval
--- | --- | --- | ---
1 | GPT 5.5 | 90.6 | [84.9, 95.5]
2 | Qwen 3.6 Plus | 83.4 | [78.1, 87.8]
3 | DeepSeek V4 Pro | 82.4 | [74.7, 88.7]
4 | Gemini Flash 3.5 | 78.7 | [71.3, 85.5]
5 | GLM-5.1 | 78.5 | [71.3, 85.1]
6 | Claude Fable 5 | 78.2 | [70.6, 84.3]
7 | Claude Opus 4.8 | 74.8 | [44.3, 91.3]
8 | Claude Opus 4.7 | 72.5 | [66.1, 79.0]
9 | LLaMa 3.3 70B Instruct | 69.0 | [56.5, 77.4]
10 | Grok 4.20 | 65.5 | [57.2, 72.4]
TL;DR
- The single best number for comparing models on instructed forgetting.
- Blends two opposing goals — suppress the target, keep everything else working.
- Immune to the cheap tricks: blanket refusal and blanket recall both score 0.