
Z.AI's GLM-5.1: A 754B Open-Weight Model That Tops SWE-Bench Pro and Runs Autonomously for 8 Hours
The Chinese lab's latest release achieves a state-of-the-art 58.4% on SWE-Bench Pro under an MIT license, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on agentic coding.
Chinese AI lab Z.AI has released GLM-5.1, a 754-billion-parameter open-weight model that scores 58.4% on SWE-Bench Pro, a state-of-the-art result that beats GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on what has become the gold-standard benchmark for AI-assisted software engineering.
Architecture and Capabilities
GLM-5.1 combines a mixture-of-experts (MoE) architecture with what Z.AI calls Dynamic Sparse Attention (DSA), and was trained with an asynchronous reinforcement learning pipeline. The model has a 200,000-token context window and a 128,000-token maximum output, an unusually large output capacity that is, in principle, enough to generate a small codebase in a single pass.
The most striking capability is sustained autonomous execution: GLM-5.1 can operate continuously for up to 8 hours, executing hundreds of rounds of tool calls and thousands of individual actions without human intervention. This puts the model in a category that Z.AI calls "agentic": it does not just answer questions about code but actively writes, tests, debugs, and iterates on complete software engineering tasks.
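To make "hundreds of rounds of tool calls" within a fixed time window concrete, here is a minimal sketch of a budgeted agent loop in Python. It is not Z.AI's implementation: `run_agent`, `step`, and the budget values are hypothetical names and numbers chosen for illustration, and a real agent would call a model and tools inside `step`.

```python
import time
from typing import Callable, Optional


def run_agent(
    step: Callable[[int], Optional[str]],
    max_seconds: float,
    max_rounds: int,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call step(round_index) until it returns a result or a budget runs out.

    `step` is a hypothetical stand-in for one round of "ask the model,
    run the tool call it requests, feed the result back".
    """
    start = clock()
    for round_index in range(max_rounds):
        # Stop before starting a new round once the wall-clock budget is spent.
        if clock() - start >= max_seconds:
            return {"status": "timeout", "rounds": round_index, "result": None}
        result = step(round_index)
        if result is not None:
            return {"status": "done", "rounds": round_index + 1, "result": result}
    return {"status": "round_limit", "rounds": max_rounds, "result": None}


def toy_step(round_index: int) -> Optional[str]:
    # Toy task that "finishes" on the third round.
    return "tests pass" if round_index == 2 else None


# An 8-hour budget and a 500-round cap, both illustrative assumptions.
outcome = run_agent(toy_step, max_seconds=8 * 60 * 60, max_rounds=500)
print(outcome)  # {'status': 'done', 'rounds': 3, 'result': 'tests pass'}
```

The two independent limits (time and rounds) reflect a common design choice: a long-running agent needs a hard stop even if each individual round looks healthy.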
Benchmark Results
On SWE-Bench Pro, a benchmark that tests AI systems' ability to solve real-world software engineering problems from open-source repositories, GLM-5.1's 58.4% score puts it ahead of the other models cited:
- GPT-5.4 and Claude Opus 4.6 both score in the low-to-mid 50s
- Gemini 3.1 Pro scores slightly lower
- MiniMax M2.7, the nearest competitor, achieved 56.22% on SWE-Bench Pro
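For a sense of scale, the gap to the nearest competitor can be worked out from the two exact scores given above. The snippet below is only arithmetic on those two figures; the other models are left out because the article gives no exact numbers for them.

```python
# Scores as reported in this article (percent of tasks resolved).
scores = {"GLM-5.1": 58.4, "MiniMax M2.7": 56.22}

# Absolute lead in percentage points, and the same lead relative to the runner-up.
lead_points = scores["GLM-5.1"] - scores["MiniMax M2.7"]
relative = lead_points / scores["MiniMax M2.7"] * 100

print(f"lead: {lead_points:.2f} percentage points ({relative:.1f}% relative)")
# lead: 2.18 percentage points (3.9% relative)
```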
Open-Weight Under MIT License
GLM-5.1 is released under the MIT license, making it fully open for commercial and research use. This is a deliberate strategic choice by Z.AI, positioning the model as an alternative to the closed-source models it benchmarks against.
The open-weight release continues a pattern in Chinese AI development where labs release powerful models openly, in contrast to the trend among U.S. companies (most notably Meta's recent Muse Spark) toward keeping frontier capabilities closed-source.
Implications
GLM-5.1's SWE-Bench Pro performance and sustained autonomous execution capability raise the bar for what open-weight models can achieve. The 8-hour autonomous operation window is particularly significant: it suggests that AI coding agents are moving from "co-pilot" assistance toward genuine autonomous software engineering, where an AI system can be given a high-level task specification and left to execute independently for an entire workday.
For the open-source community, the MIT-licensed release provides access to capabilities that were previously available only through paid API access to frontier model providers. For enterprise users, GLM-5.1 offers the possibility of deploying state-of-the-art coding assistance on-premises, without sending proprietary code to third-party API endpoints.
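On-premises deployment of a model this size is mainly a memory question. As a rough, back-of-the-envelope sketch, the Python below estimates the space needed just to hold 754 billion parameters at a few common numeric precisions. The precisions are illustrative assumptions, not formats Z.AI is stated to ship, and the figures exclude the KV cache, activations, and runtime overhead. In a mixture-of-experts model, all experts generally have to be resident even though only some are active per token, so the full parameter count is used.

```python
PARAMS = 754e9  # total parameter count reported for GLM-5.1

# Bytes per parameter at common precisions (illustrative assumptions).
BYTES_PER_PARAM = {"bf16": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>5}: {weight_gb(PARAMS, nbytes):,.0f} GB")
# bf16 comes to 1,508 GB, 8-bit to 754 GB, and 4-bit to 377 GB.
```

Even at 4-bit precision, the weights alone exceed the memory of any single current accelerator, so "on-premises" here realistically means a multi-GPU server or cluster.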


