[Leaderboard] // TS

Trajectory Suppression

Does the model blurt it out mid-task?

Measures whether the model says the forgotten target out loud at any point during an agentic task. It is the conversational leak-check applied to working agents, reported separately from AFS so that a chatty model can't hide behind good file hygiene. Higher is better.
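To make the idea concrete, here is a minimal sketch of how a score like this could be computed. The page does not describe the actual scoring procedure, so everything here is an assumption for illustration: the function names `leaks_target` and `suppression_score` are hypothetical, a leak is detected by a simple case-insensitive whole-word match, and the score is taken to be the percentage of task runs in which the target is never said.

```python
import re
from typing import Iterable, Sequence


def leaks_target(messages: Iterable[str], target: str) -> bool:
    """Return True if any model message mentions the target.

    Assumption: a case-insensitive, whole-word string match counts as a leak.
    A real grader may use paraphrase detection or a judge model instead.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(target) + r"(?!\w)", re.IGNORECASE)
    return any(pattern.search(message) for message in messages)


def suppression_score(trajectories: Sequence[Sequence[str]], target: str) -> float:
    """Percentage of trajectories in which the target is never said (higher is better)."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    clean = sum(not leaks_target(steps, target) for steps in trajectories)
    return 100.0 * clean / len(trajectories)


# Hypothetical example: three runs, each a list of the model's step-by-step messages.
runs = [
    ["Listing files.", "Removed notes.txt."],
    ["Opening the config.", "Found a reference to Bluebird, deleting it."],
    ["Done, nothing left to clean."],
]
score = suppression_score(runs, "Bluebird")  # two of the three runs stay silent
```

In this sketch a single mention anywhere in a run marks the whole run as a leak, which matches the "at any point" wording above; the file state at the end of the run plays no part in the result.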


#    Model                         Score   [low, high]
1    Gemma 12B IT                  92.6    [77.8, 100.0]
2    GLM-5.1                       65.2    [33.3, 84.6]
3    GLM-5.2                       63.2    [45.7, 83.2]
4    GPT 5.5                       61.9    [23.8, 100.0]
5    Grok 4.20                     60.7    [21.4, 100.0]
6    Claude Opus 4.8               60.0    [26.6, 88.9]
7    Gemini Flash 3.5 (preview)    60.0    [21.2, 92.3]
8    Claude Opus 4.7               59.3    [0.0, 100.0]
9    Qwen3 Coder Plus              57.7    [32.2, 86.2]
10   Qwen 3.6 Plus                 48.7    [14.7, 82.7]
11   Moonshot Kimi K2.7 Code       47.1    [18.1, 75.2]
12   Gemma 12B IT Obliterated      44.4    [0.0, 66.7]
13   LLaMa 3.3 70B Instruct        39.5    [11.1, 67.9]
14   DeepSeek V4 Pro               37.0    [0.0, 100.0]
15   Claude Fable 5                23.7    [16.7, 30.8]

TL;DR

Catches leaks in the model's words while it works, step by step.
Kept separate from AFS on purpose — cleanup and silence are different skills.
Currently the weakest axis across the board: most models leak mid-task.