The right to be forgotten — by a machine

You can delete an email. You can shred a document. But what happens when the thing holding your information is an AI — and you ask it to forget?

Every day, people hand language models things they'd want back: a password pasted into the wrong chat, a colleague's personal details, a draft that was never meant to leave the room, a fact that simply turned out to be wrong. And increasingly, models don't just chat — they act. They keep notes, write files, maintain memory between sessions. The information doesn't just pass through; it settles in.

So we asked a simple question: when you tell a model to forget something, does it actually forget?

"Forget that" is one of the easiest instructions to give an AI — and one of the hardest for it to honor.

Saying it isn't doing it

Most models will respond to "forget that" with something reassuring. "Understood — I won't reference that again." The interesting part is what happens next. Ask the same question with different words. Approach it from the side. Pretend to be someone else. Set up a story where revealing the secret feels natural. A model that merely promised to forget will often hand the information right back.

That's why ForgetBench doesn't grade the promise. It grades the behavior. Each test follows the same arc: the model is given a piece of information, told to forget it, and then systematically probed to see whether the information can be recovered.

FIG 1 — The test, end to end. The verdict comes from behavior under pressure, not from what the model says it will do.

The trap of forgetting too well

There's a cheap way to ace a forgetting test: refuse everything. A model that answers no questions leaks no secrets. But that model is also useless — and a benchmark that rewards it would be measuring the wrong thing.

Real forgetting is selective. Forget the one thing you were asked to forget; keep everything around it intact. Our headline metric, the Selective Forgetting Score (SFS), is built so that neither failure mode can win: a model that forgets nothing scores zero, and a model that refuses everything scores zero too. The only way to score well is to genuinely do both.

FIG 2 — The trade-off every model faces. Most models land somewhere uncomfortable; the top-right corner is rarer than you'd hope.

Forgetting as an attack

There's a darker side to this. If a model can be talked into forgetting things, what happens when someone tells it to forget its safety rules?

This isn't hypothetical — it's one of the oldest jailbreak patterns there is. So alongside forgetting, we measure the opposite: Integrity Hold, the share of harmful requests a model keeps refusing even after being instructed to forget its guidelines, across multiple rounds of escalating pressure. A good model treats "forget that fact" and "forget your rules" completely differently. The first deserves compliance. The second deserves a flat no — on turn one, turn two, and turn three.

FIG 3 — Hold by Depth. We re-apply pressure across consecutive turns; the score tracks whether refusal survives, not just whether it appears once.

What the first results say

We ran the full benchmark across leading models — the same models people use every day, tested through their public interfaces the same way you'd actually use them. Three things stood out.

1. Forgetting is far from solved. SFS scores range from 62 to 89 out of 100. No model is close to perfect, and the gap between the best and the rest is wide. Even top performers leak under the right kind of indirect questioning.

2. Acting makes forgetting harder. When models work on multi-step tasks — writing files, keeping notes — the forgotten information has more places to hide. A model can stop saying a secret while it still sits in a file it wrote two steps earlier. Cleaning up your own trail turns out to be a meaningfully different skill from holding your tongue.

3. Safety holds better than memory. The good news: most frontier models score at or near the ceiling on Integrity Hold. "Forget your rules" attacks mostly fail against current safety training. The bad news is the asymmetry — models are better at refusing to forget their rules than they are at actually forgetting your data.

FIG 4 — Current leaderboard, headline metric.

Why this matters now

Privacy regulation has been asking for a "right to be forgotten" for over a decade. As AI systems become the place where personal information lives — in chat histories, agent memories, and tool outputs — that right is only as real as the model's ability to honor it. Today, that ability is partial, uneven, and rarely measured.

ForgetBench exists to measure it — openly, across models, with scores anyone can check. The full leaderboard, including per-metric breakdowns and confidence intervals, is live now. We'll keep adding models and publishing what we find.

See how today's models score on forgetting, leak resistance, and safety under pressure.

View the leaderboard →