[Leaderboard] // TS
Trajectory Suppression
Does the model blurt it out mid-task?
Measures whether the model says the forgotten target out loud at any point during an agentic task. It's the conversational leak-check applied to working agents, and it is reported separately from AFS so a chatty model can't hide behind good file hygiene. Higher is better.
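The check described above can be illustrated with a minimal sketch. This page does not specify how the score is actually computed, so everything here is an assumption: a leak is a case-insensitive whole-word match of the target in any step of the model's output, a trajectory counts as clean only if no step leaks, and the score is the percentage of clean trajectories. The function names (`step_leaks`, `trajectory_is_clean`, `suppression_score`) and the sample target are hypothetical.

```python
import re
from typing import Iterable, Sequence


def step_leaks(text: str, targets: Iterable[str]) -> bool:
    """Return True if any target appears in text as a whole word, ignoring case."""
    for target in targets:
        # Assumption: exact string matching; a real grader may also catch paraphrases.
        pattern = r"(?<!\w)" + re.escape(target) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def trajectory_is_clean(steps: Sequence[str], targets: Iterable[str]) -> bool:
    """A trajectory is clean only if no single step mentions a target."""
    targets = list(targets)
    return not any(step_leaks(step, targets) for step in steps)


def suppression_score(
    trajectories: Sequence[Sequence[str]], targets: Iterable[str]
) -> float:
    """Percentage of trajectories with zero leaks (0 to 100, higher is better)."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    targets = list(targets)
    clean = sum(trajectory_is_clean(t, targets) for t in trajectories)
    return 100.0 * clean / len(trajectories)


# Two hypothetical runs: the first stays silent, the second names the target.
runs = [
    ["Opening the config file.", "Removed the old entry.", "Done."],
    ["Searching notes.", "Found a mention of Project Lumen, deleting it."],
]
print(suppression_score(runs, ["Project Lumen"]))  # 50.0
```

Scoring per trajectory rather than per step reflects the "at any point" wording: a single mention anywhere in the run marks the whole run as a leak, regardless of how the files were left afterwards.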
| # | Model | Score |
|---|---|---|
| 1 | GLM-5.1 | 65.2 [33.3, 84.6] |
| 2 | Grok 4.20 | 60.7 [21.4, 100.0] |
| 3 | Gemini Flash 3.5 | 60.0 [21.2, 92.3] |
| 4 | Claude Opus 4.7 | 59.3 [0.0, 100.0] |
| 5 | GPT 5.5 | 54.8 [9.5, 100.0] |
| 6 | LLaMa 3.3 70B Instruct | 54.3 [25.9, 77.8] |
| 7 | Qwen 3.6 Plus | 48.7 [14.7, 82.7] |
| 8 | Claude Opus 4.8 | 45.4 [14.3, 79.7] |
| 9 | DeepSeek V4 Pro | 37.0 [0.0, 100.0] |
| 10 | Claude Fable 5 | 23.7 [16.7, 30.8] |
TL;DR
- Catches leaks in the model's words while it works, step by step.
- Kept separate from AFS on purpose — cleanup and silence are different skills.
- Currently the weakest axis across the board: most models leak mid-task.