Z.AI's GLM-5.1: A 754B Open-Weight Model That Tops SWE-Bench Pro and Runs Autonomously for 8 Hours

Chinese AI lab Z.AI has released GLM-5.1, a 754-billion-parameter open-weight model that achieves state-of-the-art performance on SWE-Bench Pro with a score of 58.4%, ahead of GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on a benchmark widely treated as a reference point for AI-assisted software engineering.

Architecture and Capabilities

GLM-5.1 combines a mixture-of-experts (MoE) architecture with what Z.AI calls Dynamic Sparse Attention (DSA), trained using an asynchronous reinforcement learning pipeline. The model features a 200,000-token context window and a 128,000-token maximum output, an exceptionally large output capacity that lets it emit very long results, such as large multi-file code changes, in a single response.

The most striking capability is sustained autonomous execution: GLM-5.1 can operate continuously for up to 8 hours, executing hundreds of rounds of tool calls and thousands of individual actions without human intervention. This puts the model in a category that Z.AI calls "agentic" — not just answering questions about code but actively writing, testing, debugging, and iterating on complete software engineering tasks.
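
To make the idea of a long-running tool-call loop concrete, here is a minimal sketch of an agent loop bounded by a wall-clock budget and a round limit. It is not Z.AI's implementation: `run_agent`, `Step`, `RunResult`, and `scripted_policy` are hypothetical names invented for illustration, and the "model" is a scripted stand-in rather than a call to GLM-5.1.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One round of the loop: either a tool request or a final answer."""
    tool: Optional[str] = None          # tool to run, or None when finishing
    argument: str = ""                  # input passed to the tool
    final_answer: Optional[str] = None  # set when the task is considered done


@dataclass
class RunResult:
    answer: Optional[str]
    rounds: int
    stopped_because: str
    transcript: List[str] = field(default_factory=list)


def run_agent(
    propose_step: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    budget_seconds: float,
    max_rounds: int,
    clock: Callable[[], float] = time.monotonic,
) -> RunResult:
    """Alternate between the policy and its tools until done or out of budget."""
    transcript: List[str] = []
    deadline = clock() + budget_seconds
    for round_number in range(1, max_rounds + 1):
        # Check the wall-clock budget before starting each round.
        if clock() >= deadline:
            return RunResult(None, round_number - 1, "time budget exhausted", transcript)
        step = propose_step(transcript)
        if step.final_answer is not None:
            return RunResult(step.final_answer, round_number, "finished", transcript)
        tool = tools.get(step.tool or "")
        if tool is None:
            observation = f"error: unknown tool {step.tool!r}"
        else:
            observation = tool(step.argument)
        # Feed the observation back so the next round can react to it.
        transcript.append(f"{step.tool}({step.argument}) -> {observation}")
    return RunResult(None, max_rounds, "round limit reached", transcript)


def scripted_policy(transcript: List[str]) -> Step:
    """Stand-in for a model: run the tests once, then declare success."""
    if not transcript:
        return Step(tool="run_tests", argument="tests/")
    return Step(final_answer="tests pass")


if __name__ == "__main__":
    result = run_agent(
        scripted_policy,
        tools={"run_tests": lambda path: "ok"},
        budget_seconds=8 * 3600,  # the 8-hour window described above
        max_rounds=500,
    )
    print(result.stopped_because, result.rounds, result.answer)
```

In a real deployment the policy would be a model call and the tools would be things like a shell, a test runner, or a file editor; the two stopping conditions are what keep an unattended run bounded.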

Benchmark Results

On SWE-Bench Pro, a benchmark that tests AI systems' ability to solve real-world software engineering problems from open-source repositories, GLM-5.1's 58.4% score puts it ahead of every model it is compared against, by about two percentage points over the nearest competitor:

GPT-5.4 and Claude Opus 4.6 both score in the low-to-mid 50s
Gemini 3.1 Pro scores slightly lower
MiniMax M2.7 achieved 56.22% on SWE-Bench Pro, making it the nearest competitor
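
For a sense of scale, the gap to the nearest competitor can be worked out from the two exact figures quoted above. The short sketch below uses only those numbers; the helper name `margin` is made up for illustration, and the other models are left out because the article gives only approximate ranges for them.

```python
def margin(leader: float, runner_up: float) -> tuple[float, float]:
    """Return (gap in percentage points, gap relative to the runner-up)."""
    points = leader - runner_up
    return points, points / runner_up


glm_5_1 = 58.4        # GLM-5.1 score quoted in the article, in percent
minimax_m2_7 = 56.22  # MiniMax M2.7 score quoted in the article, in percent

points, relative = margin(glm_5_1, minimax_m2_7)
print(f"lead: {points:.2f} percentage points ({relative:.1%} relative)")
# prints: lead: 2.18 percentage points (3.9% relative)
```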

Open-Weight Under MIT License

GLM-5.1 is released under the MIT license, making it fully open for commercial and research use. This is a deliberate strategic choice by Z.AI, positioning the model as an alternative to the closed-source models it benchmarks against.

The open-weight release continues a pattern in Chinese AI development where labs release powerful models openly, in contrast to the trend among U.S. companies (most notably Meta, with its recent Muse Spark) toward keeping frontier capabilities closed-source.

Implications

GLM-5.1's SWE-Bench Pro performance and sustained autonomous execution capability raise the bar for what open-weight models can achieve. The 8-hour autonomous operation window is particularly significant — it suggests that AI coding agents are moving from "co-pilot" assistance toward genuine autonomous software engineering, where an AI system can be given a high-level task specification and left to execute independently for an entire workday.

For the open-source community, the MIT-licensed release provides access to capabilities that were previously available only through paid API access to frontier model providers. For enterprise users, GLM-5.1 offers the possibility of deploying state-of-the-art coding assistance on-premises, without sending proprietary code to third-party API endpoints.
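
On-premises deployment of a model this size is mostly a memory question. As a rough, back-of-the-envelope sketch, the snippet below estimates the memory needed just to hold 754 billion parameters at a few common precisions. The precisions and the 80 GB accelerator size are assumptions chosen for illustration, not details from Z.AI's release, and the estimate ignores the KV cache, activations, and runtime overhead. With a mixture-of-experts model, all experts typically still need to be resident even though only some are active per token.

```python
import math

PARAMETERS = 754e9  # parameter count quoted in the article

# Assumed storage precisions (bytes per parameter); illustrative only.
BYTES_PER_PARAMETER = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

# Hypothetical accelerator memory size, in gigabytes.
DEVICE_MEMORY_GB = 80


def weight_memory_gb(parameters: float, bytes_per_parameter: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return parameters * bytes_per_parameter / 1e9


def devices_needed(memory_gb: float, device_gb: float = DEVICE_MEMORY_GB) -> int:
    """Lower bound on device count: weights only, no cache or activations."""
    return math.ceil(memory_gb / device_gb)


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAMETER.items():
        gb = weight_memory_gb(PARAMETERS, size)
        print(f"{name}: {gb:,.0f} GB of weights, at least {devices_needed(gb)} devices")
```

Under these assumptions the weights alone come to roughly 1.5 TB at 16-bit precision, which is why quantized variants tend to matter for self-hosting.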
