Z.AI GLM-5.1 model architecture and benchmark results
MarkTechPost
Research

Z.AI's GLM-5.1: A 754B Open-Weight Model That Tops SWE-Bench Pro and Runs Autonomously for 8 Hours

The Chinese lab's latest release achieves state-of-the-art 58.4% on SWE-Bench Pro under an MIT license, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on agentic coding.

Daniel Park, AI Correspondent
4 min read

Chinese AI lab Z.AI has released GLM-5.1, a 754-billion-parameter open-weight model that scores 58.4% on SWE-Bench Pro, a state-of-the-art result. That places it ahead of GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on a benchmark that has become a standard reference point for AI-assisted software engineering.

Architecture and Capabilities

GLM-5.1 combines a mixture-of-experts (MoE) architecture with what Z.AI calls Dynamic Sparse Attention (DSA), and it was trained with an asynchronous reinforcement learning pipeline. The model has a 200,000-token context window and a 128,000-token maximum output. That output budget is unusually large, enough in principle to emit a multi-file codebase in a single response.
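To make the mixture-of-experts idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Z.AI's router: the expert count, the value of k, the toy expert functions, and the names `top_k_route` and `moe_layer` are all assumptions chosen for illustration. The article does not describe GLM-5.1's actual gating scheme.

```python
import math

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    # Softmax over all experts (subtract the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts for this token.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return {i: probs[i] / kept for i in chosen}

def moe_layer(token, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    gates = top_k_route(router_logits, k)
    return sum(weight * experts[i](token) for i, weight in gates.items())

# Four toy "experts" (hypothetical scalar functions); only two run per token.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
logits = [0.1, 2.0, 1.5, -1.0]  # made-up router scores for one token
gates = top_k_route(logits, k=2)
output = moe_layer(3.0, experts, logits, k=2)
```

The point of the design is that compute per token scales with k rather than with the total number of experts, which is how a model can carry a very large parameter count while activating only a fraction of it on each step.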

The most striking capability is sustained autonomous execution: GLM-5.1 can operate continuously for up to 8 hours, executing hundreds of rounds of tool calls and thousands of individual actions without human intervention. This puts the model in a category that Z.AI calls "agentic" — not just answering questions about code but actively writing, testing, debugging, and iterating on complete software engineering tasks.
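The shape of such a long-running loop can be sketched as follows, under stated assumptions. `run_agent`, `scripted_policy`, the tool names, and the step cap are hypothetical stand-ins, not Z.AI's agent harness. The policy function takes the place of a model call, and the 8-hour default simply mirrors the figure quoted above.

```python
import time

def run_agent(task, propose_action, tools, max_seconds=8 * 3600, max_steps=1000):
    """Call tools in a loop until the policy says "done" or a budget runs out."""
    deadline = time.monotonic() + max_seconds
    history = [("task", task)]
    for step in range(max_steps):
        if time.monotonic() >= deadline:
            return {"status": "timeout", "steps": step, "history": history}
        name, argument = propose_action(history)   # stand-in for a model call
        if name == "done":
            return {"status": "done", "steps": step, "history": history}
        result = tools[name](argument)              # execute the chosen tool
        history.append((name, result))              # feed the observation back
    return {"status": "step_limit", "steps": max_steps, "history": history}

# Toy policy: run the tests, apply a patch, re-run the tests, then stop.
def scripted_policy(history):
    script = [("run_tests", "v1"), ("edit", "fix off-by-one"), ("run_tests", "v2")]
    done_so_far = len(history) - 1
    return script[done_so_far] if done_so_far < len(script) else ("done", None)

tools = {
    "run_tests": lambda version: "fail" if version == "v1" else "pass",
    "edit": lambda patch: f"applied: {patch}",
}
outcome = run_agent("make the test suite pass", scripted_policy, tools)
```

Both a wall-clock deadline and a step cap are included because either can be the binding limit in a long session. A real harness would also need error handling and context management, which this sketch leaves out.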

Benchmark Results

SWE-Bench Pro tests AI systems' ability to solve real-world software engineering problems drawn from open-source repositories. GLM-5.1's 58.4% score puts it ahead of every model it is compared against:

  • GPT-5.4 and Claude Opus 4.6 both score in the low-to-mid 50s
  • Gemini 3.1 Pro scores slightly lower
  • MiniMax M2.7 achieved 56.22% on SWE-Bench Pro, making it the nearest competitor at roughly 2.2 percentage points behind
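For a sense of scale, the gap to the nearest competitor can be worked out directly from the two exact figures reported above. The snippet below is only arithmetic on those numbers. The variable names are illustrative, and the models reported as ranges are left out because no exact scores are given for them.

```python
# Scores as reported in the article, in percent.
glm_5_1 = 58.4
minimax_m2_7 = 56.22

absolute_gap = round(glm_5_1 - minimax_m2_7, 2)              # percentage points
relative_gap = round(100 * absolute_gap / minimax_m2_7, 1)   # percent of baseline
print(f"lead: {absolute_gap} points ({relative_gap}% relative)")
# prints "lead: 2.18 points (3.9% relative)"
```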

Open-Weight Under MIT License

GLM-5.1 is released under the MIT license, making it fully open for commercial and research use. This is a deliberate strategic choice by Z.AI, positioning the model as an alternative to the closed-source models it benchmarks against.

The open-weight release continues a pattern in Chinese AI development where labs release powerful models openly, in contrast to the trend among U.S. companies (most notably Meta's recent Muse Spark) toward keeping frontier capabilities closed-source.

Implications

GLM-5.1's SWE-Bench Pro performance and sustained autonomous execution capability raise the bar for what open-weight models can achieve. The 8-hour autonomous operation window is particularly significant — it suggests that AI coding agents are moving from "co-pilot" assistance toward genuine autonomous software engineering, where an AI system can be given a high-level task specification and left to execute independently for an entire workday.

For the open-source community, the MIT-licensed release provides access to capabilities that were previously available only through paid API access to frontier model providers. For enterprise users, GLM-5.1 offers the possibility of deploying state-of-the-art coding assistance on-premises, without sending proprietary code to third-party API endpoints.
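A rough sense of what on-premises deployment involves comes from the parameter count alone. The sketch below estimates the memory needed to hold the weights at a few precisions. The precision choices are illustrative assumptions, not formats Z.AI is reported to ship, and the estimate ignores the attention cache, activations, and runtime overhead.

```python
PARAMETERS = 754e9  # total parameter count reported for GLM-5.1

def weight_memory_gb(bytes_per_parameter):
    """Decimal gigabytes needed to hold the weights alone."""
    return PARAMETERS * bytes_per_parameter / 1e9

# Precision choices are illustrative assumptions, not published formats.
footprints = {
    "16-bit": weight_memory_gb(2),
    "8-bit": weight_memory_gb(1),
    "4-bit": weight_memory_gb(0.5),
}
for label, gigabytes in footprints.items():
    print(f"{label}: {gigabytes:,.0f} GB")
```

Because a mixture-of-experts model typically keeps all of its experts resident even though only some run per token, the total parameter count, not the active count, is what drives this estimate.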

Lanceum

Independent coverage of AI and technology across Asia. We go beyond headlines to explain what matters.

© 2026 Lanceum. All rights reserved.
