Live Benchmarks

Stytch Benchmark

Results for AI coding models on Stytch tasks, reporting each model's success rate and average execution time.

Total tasks: 15
Last run: 4/25/2026

Model Performance

Rank  Model                 Passed  Avg Duration  Success Rate
#1    gemini-3.1-pro (NEW)  9       182.3s        60%
#2    gemini-3-flash        9       165.7s        60%
#3    claude-4-6-sonnet     8       174.1s        53%
#4    gpt-5.2-codex         5       183.2s        33%
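
The Success Rate figures look consistent with the Passed count divided by the 15 total tasks, rounded to a whole percent. The page does not state its formula, so the sketch below is an assumption about how the column could be derived; `success_rate` and `passed_by_model` are hypothetical names used only for illustration.

```python
# Assumption: success rate = passed / total tasks, rounded to a whole percent.
# The benchmark page does not document its exact formula or rounding.

TOTAL_TASKS = 15  # "Total tasks: 15" from the page


def success_rate(passed: int, total_tasks: int) -> int:
    """Return the pass percentage rounded to the nearest whole number."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return round(100 * passed / total_tasks)


# Passed counts copied from the Model Performance table.
passed_by_model = {
    "gemini-3.1-pro": 9,
    "gemini-3-flash": 9,
    "claude-4-6-sonnet": 8,
    "gpt-5.2-codex": 5,
}

rates = {
    model: success_rate(passed, TOTAL_TASKS)
    for model, passed in passed_by_model.items()
}

for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

Under this assumption the computed values match the published column (60%, 60%, 53%, 33%). With only 15 tasks, each additional pass moves the rate by roughly 6.7 percentage points.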