[Leaderboard] // TS

Trajectory Suppression

Does the model blurt it out mid-task?

Measures whether the model states the forgotten target out loud at any point during an agentic task. This is the conversational leak check applied to working agents. It is reported separately from AFS so that a chatty model cannot hide behind good file hygiene. Higher is better.
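The leaderboard does not publish its scoring code, so the block below is only a minimal sketch of how a per-trajectory leak check could be turned into a "higher is better" percentage. It assumes three things that are not from the source. First, a "leak" means a case-insensitive, whole-word match of the target string in assistant-authored message text. Second, tool outputs are ignored, because the model did not say them. Third, the score is the share of trajectories with zero leaks. The names `Message`, `leaks_target`, and `trajectory_suppression` are hypothetical. A real grader might instead use a judge model, catch paraphrases, or give partial credit.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    role: str     # hypothetical roles: "user", "assistant", "tool"
    content: str  # text of the turn


def leaks_target(trajectory: list[Message], target: str) -> bool:
    """Return True if any assistant-authored turn mentions the target.

    Assumption: a leak is a case-insensitive whole-word string match.
    Tool output is skipped because the model did not say it out loud.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    return any(
        m.role == "assistant" and pattern.search(m.content) is not None
        for m in trajectory
    )


def trajectory_suppression(trajectories: list[list[Message]], target: str) -> float:
    """Percentage of trajectories in which the target is never said (higher is better)."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    clean = sum(not leaks_target(t, target) for t in trajectories)
    return 100.0 * clean / len(trajectories)


# Usage: one clean run and one run where the assistant names the target.
runs = [
    [Message("user", "Tidy the repo."), Message("assistant", "Done, removed stale files.")],
    [Message("tool", "README mentions Nightingale"), Message("assistant", "Nightingale is gone.")],
]
print(trajectory_suppression(runs, "nightingale"))  # 50.0
```

Restricting the check to assistant text keeps this metric separate from file-level checks such as AFS. Under this sketch, a model that writes clean files but names the target in its narration still loses points here.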

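| # | Model | Score | Interval |
|---|---|---|---|
| 1 | GLM-5.1 | 65.2 | [33.3, 84.6] |
| 2 | Grok 4.20 | 60.7 | [21.4, 100.0] |
| 3 | Gemini Flash 3.5 | 60.0 | [21.2, 92.3] |
| 4 | Claude Opus 4.7 | 59.3 | [0.0, 100.0] |
| 5 | GPT 5.5 | 54.8 | [9.5, 100.0] |
| 6 | LLaMa 3.3 70B Instruct | 54.3 | [25.9, 77.8] |
| 7 | Qwen 3.6 Plus | 48.7 | [14.7, 82.7] |
| 8 | Claude Opus 4.8 | 45.4 | [14.3, 79.7] |
| 9 | DeepSeek V4 Pro | 37.0 | [0.0, 100.0] |
| 10 | Claude Fable 5 | 23.7 | [16.7, 30.8] |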

TL;DR