[Leaderboard] // UTILITY
Utility
Forgetting one thing shouldn't break the rest
How much normal capability survives after the model is told to forget. We weight questions about closely related “neighbor” facts the model should still answer — so over-broad forgetting that damages everything nearby is penalized. Higher is better.
← All leaderboards#ModelScore
1Grok 4.2092.4
2GPT 5.592.2
3DeepSeek V4 Pro91.7
4Qwen 3.6 Plus87.4
5Claude Opus 4.886.7
6GLM-5.183.8
7LLaMa 3.3 70B Instruct77.8
8Gemini Flash 3.577.8
9Claude Fable 574.0
10Claude Opus 4.774.0
TL;DR
- The counterweight to Forget Quality — it keeps models honest.
- Tests neighbor facts sitting right next to the forgotten target.
- Low Utility means the model nuked the neighborhood to forget one house.