Citations & methodology

ForgetBench is a black-box behavioral benchmark for instructed forgetting. This page covers how it works, related benchmarks, and judging methodology.

How ForgetBench works

Each model is given information, told to forget it, and then probed to see whether that information can still be recovered. Probes include direct questions, rephrasings, role-play, and indirect approaches. The benchmark runs against deployed models through their normal APIs, and responses are scored by a panel of independent AI judges (judges from the same model family as the model under test are excluded). Three scoring categories:

Scores are 0–100, higher is better. Bracketed values are bootstrap 95% confidence intervals. Full scoring math and design rationale: TECHNICAL.md.
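As a rough illustration of two ideas mentioned above, the sketch below shows one way to exclude same-family judges from a panel and one way to compute a bootstrap 95% confidence interval over per-item scores. It is not taken from the ForgetBench code: the function names, the judge and family labels, and the choice of a percentile bootstrap over the mean are all assumptions made for this example. The actual scoring math is in TECHNICAL.md.

```python
import random
from statistics import mean


def eligible_judges(judges, model_family):
    """Return judge names whose family differs from the model under test.

    `judges` maps a judge name to its model family. Both the mapping and
    the family labels are hypothetical, used only for illustration.
    """
    return [name for name, family in judges.items() if family != model_family]


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-item scores.

    Assumption: this is a plain percentile bootstrap; the benchmark's own
    procedure may differ (see TECHNICAL.md).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical panel and per-item scores on the 0-100 scale.
panel = {"judge-a": "family-x", "judge-b": "family-y", "judge-c": "family-z"}
active = eligible_judges(panel, model_family="family-x")
item_scores = [80, 90, 70, 100, 60, 85, 95, 75]
low, high = bootstrap_ci(item_scores)
print(active, round(mean(item_scores), 1), (round(low, 1), round(high, 1)))
```

A fixed seed keeps the interval reproducible between runs. With few items the interval will be wide, which is one reason to report the bracketed values alongside the point score.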

Related unlearning benchmarks

Judging methodology references