I got tired of frustrating, error-filled troubleshooting sessions with AI chatbots, so I asked Copilot for help.
The second batch of “First Proof” problems is meant to evaluate AI’s usefulness for research-level math. The best model got six or seven of the ten questions right.