Live Benchmarks

Stytch Benchmark

Performance results of AI coding models on Stytch tasks, measured by success rate and average execution time per task.

Total tasks: 15

Last run: 4/25/2026

Model Performance

Rank	Model	Passed	Avg Duration	Success Rate
#1	gemini-3.1-pro (new)	9	182.3s	60%
#2	gemini-3-flash	9	165.7s	60%
#3	claude-4-6-sonnet	8	174.1s	53%
#4	gpt-5.2-codex	5	183.2s	33%
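
The Success Rate column appears to be the number of passed tasks divided by the 15 total tasks, shown as a whole-number percentage. The following is a minimal sketch of that arithmetic, not the benchmark's actual scoring code. The function name `success_rate` and the rounding to the nearest whole percent are assumptions chosen to match the figures in the table.

```python
TOTAL_TASKS = 15  # "Total tasks" as reported on this page

# (model, tasks passed), copied from the Model Performance table
results = [
    ("gemini-3.1-pro", 9),
    ("gemini-3-flash", 9),
    ("claude-4-6-sonnet", 8),
    ("gpt-5.2-codex", 5),
]


def success_rate(passed: int, total: int = TOTAL_TASKS) -> int:
    """Return the pass rate as a whole-number percentage.

    Rounding to the nearest integer is an assumption; the page does not
    state how the displayed percentage is derived.
    """
    return round(100 * passed / total)


for model, passed in results:
    print(f"{model}: {passed}/{TOTAL_TASKS} = {success_rate(passed)}%")
```

Under these assumptions the computed values agree with the table: 9 of 15 gives 60%, 8 of 15 gives 53%, and 5 of 15 gives 33%.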