Performance of AI coding models on Stytch tasks, measured by tasks passed, average duration per task, and success rate.
| Model | Passed | Avg Duration | Success Rate |
|---|---|---|---|
| #1 gemini-3.1-pro | 9 | 182.3s | 60% |
| #2 gemini-3-flash | 9 | 165.7s | 60% |
| #3 claude-4-6-sonnet | 8 | 174.1s | 53% |
| #4 gpt-5.2-codex | 5 | 183.2s | 33% |
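
The Success Rate column appears to be the Passed count divided by the total number of tasks. The table does not state that total; the sketch below assumes 15 tasks, a figure inferred from 9 passed corresponding to 60%. The names `TOTAL_TASKS` and `success_rate` are illustrative and not part of the benchmark.

```python
# Assumption: 15 tasks in total, inferred from "9 passed = 60%".
# The source table does not state the task count.
TOTAL_TASKS = 15

# Passed counts copied from the table above.
passed = {
    "gemini-3.1-pro": 9,
    "gemini-3-flash": 9,
    "claude-4-6-sonnet": 8,
    "gpt-5.2-codex": 5,
}


def success_rate(passed_count: int, total: int = TOTAL_TASKS) -> int:
    """Return the pass percentage, rounded to the nearest whole percent."""
    return round(100 * passed_count / total)


rates = {model: success_rate(count) for model, count in passed.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

Under that assumption the computed percentages match the table's 60%, 60%, 53%, and 33%. The top two models tie on success rate, so the ordering between them must come from a criterion the table does not show.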