GLM-5.1 SWE-Bench Pro Benchmark Results: What 58.4 Actually Means for Open-Weight AI
Z.AI's GLM-5.1 scored 58.4 on SWE-Bench Pro, edging GPT-5.4 and Claude Opus 4.6 by less than 1.1 points. The benchmark lead is real, but the hardware needed to run the model locally is not consumer-grade.