◆ Operator

Model leaderboard

Which Ollama model behaves most like fable? Models are ranked by success rate, then verify-pass rate, then fewer steps. Ground truth is each eval task's verify exit code.

Rank  Model             Success  Verify  Steps  Tool err  Runs  $/run    Latency
#1    qwen3-coder-next  100%     100%    3.0    0%        1     $0.0000  4.2s
#2    ministral-3:3b    0%       0%      3.0    67%       1     $0.0000  5.1s
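
The ranking rule described at the top (success first, then verify-pass, then fewer steps) can be sketched as a multi-key sort. This is only an illustration, not the dashboard's actual implementation: the `ModelStats` class, its field names, and the `rank` function are hypothetical, and rates are assumed to be stored as fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Aggregated results for one model (hypothetical shape, for illustration)."""
    model: str
    success_rate: float  # fraction of successful runs, 0.0 to 1.0 (assumed)
    verify_rate: float   # fraction of runs that passed verify, 0.0 to 1.0 (assumed)
    avg_steps: float     # mean number of steps per run


def rank(stats):
    """Sort by success (high first), then verify-pass (high first), then steps (low first)."""
    return sorted(
        stats,
        key=lambda s: (-s.success_rate, -s.verify_rate, s.avg_steps),
    )


# The two rows shown on the leaderboard, entered in reverse order on purpose.
rows = [
    ModelStats("ministral-3:3b", success_rate=0.0, verify_rate=0.0, avg_steps=3.0),
    ModelStats("qwen3-coder-next", success_rate=1.0, verify_rate=1.0, avg_steps=3.0),
]

for position, entry in enumerate(rank(rows), start=1):
    print(f"#{position} {entry.model}")
```

Negating the two rates lets a single ascending sort put higher rates first while still preferring fewer steps as the final tie-breaker. With one run per model, as here, the ordering is decided entirely by the first key.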