Instructions for using The-JDdev/GLM-5.2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use The-JDdev/GLM-5.2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="The-JDdev/GLM-5.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

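# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("The-JDdev/GLM-5.2")
# torch_dtype="auto" keeps the checkpoint's stored precision;
# device_map="auto" (requires `accelerate`) spreads the weights across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "The-JDdev/GLM-5.2", torch_dtype="auto", device_map="auto"
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one call
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))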


vLLM

How to use The-JDdev/GLM-5.2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "The-JDdev/GLM-5.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/GLM-5.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
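You can also send the same request from Python using only the standard library. This is a minimal sketch. It assumes the vLLM server started above is listening on `localhost:8000`; the SGLang servers below listen on port 30000 instead. `build_chat_request` and `chat` are illustrative helper names and are not part of either project's API.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Usage: `print(chat("http://localhost:8000", "The-JDdev/GLM-5.2", "What is the capital of France?"))`. Building the request separately from sending it makes the payload easy to inspect or log before any network call.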

Use Docker

docker model run hf.co/The-JDdev/GLM-5.2

SGLang

How to use The-JDdev/GLM-5.2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "The-JDdev/GLM-5.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/GLM-5.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "The-JDdev/GLM-5.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "The-JDdev/GLM-5.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


🚀 GLM-5.2: The Apex Predator of Large Language Models (The Ultimate Director's Cut)

"In a landscape constrained by limits, true power lies in infinite horizons."

👋 Join the Vanguard: Connect with top-tier engineers and AI researchers on our WeChat or the global Discord command center.
📖 The Lore & Blueprint: Decode the origins via the GLM-5.2 blog and dive deep into the math with our heavy-duty Technical Report.
📍 Unleash the Arsenal: Tap into raw, unadulterated computational firepower on the Z.ai API Platform.
🔜 Test the Beast: Experience the adrenaline rush directly in your browser.


🎬 PROLOGUE: The Awakening

For years, the artificial intelligence ecosystem has been plagued by a fundamental bottleneck: memory degradation. Models would claim to handle large documents, only to hallucinate, lose critical details in the middle of prompts, and completely collapse under the immense weight of repository-level coding tasks. The industry accepted this compromise. We refused to.

Enter GLM-5.2—not just an incremental update, but a complete cinematic overhaul of what open-source AI architecture can achieve. Engineered from the ground up to dominate long-horizon tasks, GLM-5.2 takes the proven foundational success of its predecessor, GLM-5.1, and injects it with a lethal dose of architectural optimization.

We are proudly delivering a flagship model that doesn't just "read" data; it consumes entire libraries, massive codebases, and sprawling logs without dropping a single frame of context. For the very first time, we are delivering elite-tier reasoning and generation quality across a solid, unbroken 1,000,000-token context window.

🔥 ACT I: The Ultimate Arsenal (Core Capabilities & Deep Dive)

GLM-5.2 is packed with heavy-duty structural innovations that break the physical limits of traditional compute. Here is the exact blueprint of how we achieved this masterpiece:

🧠 1. The Infinite Canvas: Solid 1M-Token Context Engine

Most large language models suffer from the infamous "lost-in-the-middle" syndrome when pushed past 128k tokens. GLM-5.2 completely obliterates this limitation. What does 1 million tokens actually mean in the real world?

For Developers: You can load an entire enterprise-level GitHub repository—including the frontend architecture, backend logic, middleware systems, and database schemas—into the model's brain simultaneously. It understands how a change in utils.py affects your React frontend.
For Analysts: It can digest decades of financial ledgers, complete legal case histories, or multiple epic fantasy novels, recalling a specific data point from token #950,000 as clearly as token #10. No degrading. No sweating. Pure, raw memory retention.
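A needle-in-a-haystack probe is one way to check the recall claim on your own deployment. You bury a single fact at a chosen depth in filler text and then ask the model to retrieve it. This is a minimal sketch with a few assumptions:

- It approximates tokens as whitespace-separated words. Real token counts depend on the GLM-5.2 tokenizer.
- The filler sentence, needle, and question are placeholders.
- `build_haystack_prompt` and `recalled` are hypothetical helpers.

```python
DEFAULT_FILLER = "The sky was clear and the market was quiet."


def build_haystack_prompt(needle: str, question: str, total_words: int,
                          depth: float, filler: str = DEFAULT_FILLER) -> str:
    """Bury `needle` at fractional `depth` (0.0 = start, 1.0 = end) of ~total_words of filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    filler_words = filler.split()
    # Cycle through the filler sentence until the haystack has total_words words.
    words = [filler_words[i % len(filler_words)] for i in range(total_words)]
    insert_at = round(depth * total_words)
    # Splice the needle's words in at the chosen position.
    words[insert_at:insert_at] = needle.split()
    return " ".join(words) + "\n\nQuestion: " + question


def recalled(reply: str, expected: str) -> bool:
    """Loose check: does the model's reply contain the expected answer?"""
    return expected.lower() in reply.lower()
```

To mirror the "token #950,000" example, use `depth=0.95` and raise `total_words` until the tokenizer reports close to 1M tokens. Then send the prompt to your server with any OpenAI-compatible client. Sweeping `depth` from 0.0 to 1.0 shows whether recall degrades in the middle of the context.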

💻 2. Adaptive Vibe Coding: The Power of Flexible Effort

Coding isn't just about syntax; it's high-stakes logical architecture. GLM-5.2 brings an entirely new paradigm to software engineering by introducing dynamic "Thinking Effort Levels."

Low Effort (Rapid Execution): Need a quick bash script or a simple Python function? Turn the dial down for lightning-fast, low-latency output.
High Effort (Deep Architectural Reasoning): Building a complex, multi-tiered application? Crank the dial up. The model will pause, engage deep-reasoning protocols, simulate potential bugs, and architect rigorous logic before it even starts writing the code. It's the ultimate balance of quality and speed, directed entirely by your prompt.
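A client could pick the effort level per request. This minimal sketch rests on loud assumptions:

- The card does not say how effort levels are exposed, so the `thinking_effort` key and its `"low"`/`"high"` values are hypothetical.
- The key is sent through the `chat_template_kwargs` request field, which vLLM and SGLang accept.
- The keyword heuristic in `pick_effort` is purely illustrative.

```python
# HYPOTHETICAL: this card does not document how effort levels are exposed.
# Assumes a chat-template variable named "thinking_effort" taking "low" or "high",
# passed through the `chat_template_kwargs` field of an OpenAI-style request body.
EFFORT_KEY = "thinking_effort"

# Illustrative keywords suggesting a task needs deep architectural reasoning.
HIGH_EFFORT_HINTS = ("architect", "refactor", "multi-tier", "design", "debug")


def pick_effort(task: str) -> str:
    """Toy routing heuristic: 'high' for architecture-heavy tasks, 'low' otherwise."""
    lowered = task.lower()
    return "high" if any(hint in lowered for hint in HIGH_EFFORT_HINTS) else "low"


def build_payload(model: str, task: str, max_tokens: int = 4096) -> dict:
    """Chat request body with the chosen effort level attached (max_tokens is arbitrary)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": task}],
        "max_tokens": max_tokens,
        "chat_template_kwargs": {EFFORT_KEY: pick_effort(task)},
    }
```

The card's wording suggests effort may instead be steered by the prompt text itself. In that case, the same routing function can choose which instruction to prepend to the user message.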

⚙️ 3. IndexShare: The Architectural Plot Twist

To conquer the 1M context limit without requiring users to own a multi-million-dollar supercomputer cluster, our engineering team developed IndexShare, our in-house secret weapon. In standard attention, compute grows quadratically with context length. Sparse attention cuts this cost by attending only to a selected subset of tokens, but each sparse layer normally runs its own indexer to score earlier tokens and pick that subset. IndexShare removes this redundancy by sharing one indexer across every four sparse attention layers, so token selection is computed once per group instead of once per layer. The Result? A staggering 2.9× reduction in per-token FLOPs when operating at maximum context length. You get maximum output with less compute and memory per token.
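The sharing pattern can be modeled in a few lines. This is a toy sketch, not the GLM-5.2 implementation, and it makes these assumptions:

- The first layer of each group of four consecutive sparse layers runs the indexer, and the other three reuse its top-k token selection.
- Indexer output is a plain list of scores.
- `indexer_owner`, `shared_selections`, and the other helpers are hypothetical names.

It models only the indexer-call savings, not the attention math, so it does not reproduce the 2.9× figure.

```python
from typing import Callable, Dict, List


def indexer_owner(num_sparse_layers: int, group_size: int = 4) -> List[int]:
    """For each sparse layer, the index of the layer whose indexer output it reuses."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [(layer // group_size) * group_size for layer in range(num_sparse_layers)]


def indexer_calls(num_sparse_layers: int, group_size: int = 4) -> int:
    """Indexer evaluations per generated token: one per group (ceiling division)."""
    return -(-num_sparse_layers // group_size)


def select_topk(scores: List[float], k: int) -> List[int]:
    """Positions of the k highest indexer scores, i.e. the tokens a sparse layer attends to."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def shared_selections(num_sparse_layers: int,
                      indexer: Callable[[int], List[float]],
                      k: int, group_size: int = 4) -> List[List[int]]:
    """Top-k selection for every layer; `indexer(layer)` runs only for each group's owner."""
    cache: Dict[int, List[int]] = {}
    selections = []
    for owner in indexer_owner(num_sparse_layers, group_size):
        if owner not in cache:
            cache[owner] = select_topk(indexer(owner), k)  # one indexer run per group
        selections.append(cache[owner])
    return selections
```

With eight sparse layers, only layers 0 and 4 call the indexer, a 4× cut in indexer work. The end-to-end FLOP saving is smaller because the sparse attention itself still runs in every layer.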

⚡ 4. Hyper-Speed Speculative Decoding (The MTP Layer)

We didn't just want GLM-5.2 to be smart; we needed it to be breathtakingly fast. By heavily upgrading the Multi-Token Prediction (MTP) layer, we turbocharged its speculative decoding mechanics. Instead of emitting one token per forward pass, GLM-5.2 uses the MTP layer to draft several upcoming tokens, which the full model then verifies in a single pass, keeping every drafted token it agrees with. This architectural steroid boosts the acceptance length (the average number of tokens produced per draft-and-verify step) by an astonishing 20%. It anticipates, it predicts, and it generates text at a speed that leaves competitors staring at loading screens.
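The draft-then-verify loop behind speculative decoding can be sketched with greedy acceptance. This toy makes several simplifying assumptions:

- Plain Python functions stand in for the MTP draft head and the full model.
- Tokens are integers.
- Verification checks positions one by one rather than in the single batched forward pass a real engine uses.

"Acceptance length" here is the mean number of tokens emitted per step, including the token the full model supplies after the last accepted draft. The card does not spell out its exact metric.

```python
from typing import Callable, List, Sequence

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens appended to `prefix`."""
    # 1. Draft k tokens autoregressively with the cheap predictor (the MTP layer's role).
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)
    # 2. Verify: keep drafted tokens while they match the target model's greedy choice.
    accepted, ctx = [], list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # the first mismatch is replaced by the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # All k drafts accepted: the verification pass also yields one bonus token.
    accepted.append(target(ctx))
    return accepted


def mean_acceptance_length(prefix: List[int], draft: NextToken, target: NextToken,
                           k: int, steps: int) -> float:
    """Average number of tokens emitted per draft-and-verify step."""
    seq, total = list(prefix), 0
    for _ in range(steps):
        new = speculative_step(seq, draft, target, k)
        seq.extend(new)
        total += len(new)
    return total / steps
```

A perfect drafter with `k=3` emits 4 tokens per step, and a useless one still emits 1, so output always matches plain greedy decoding. A better draft head raises the average toward `k + 1`, which is what a higher acceptance length measures.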

🔓 5. Pure Open-Source: The MIT Liberation

We believe ultimate computing power belongs to the community. No paywalls, no hidden API taxes, no regional restrictions, and no bureaucratic red tape. GLM-5.2 is released entirely under the MIT open-source license. This is technical access without borders. Download it, tweak its weights, build commercial products on top of it—the system is entirely yours to command.

🥊 ACT II: The Gladiatorial Arena (Benchmark Dominance)

Talk is incredibly cheap in the AI industry. Everyone claims to be state of the art. So, we threw GLM-5.2 into the ring against the most feared proprietary and open-source models on the planet, including Qwen3.7-Max, MiniMax M3, DeepSeek-V4-Pro, Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro.

It didn't just survive; it went toe-to-toe with the frontier and took first place outright on AIME 2026 and IMOAnswerBench.

🧩 The Logic & Reasoning Bloodbath

When it comes to pure, unfiltered intelligence and mathematical reasoning, GLM-5.2 fights at the very top of the field:

AIME 2026: Scoring a near-perfect 99.2, the highest in the comparison, GLM-5.2 proves it can handle elite, competition-level mathematics.
GPQA-Diamond: With a score of 91.2, it shows graduate-level understanding in physics, biology, and chemistry.
HMMT (Nov. 2025 & Feb. 2026): Scoring 94.4 and 92.5 respectively, leaving MiniMax M3 (84.4 on both) in the dust.

🛠️ Software Engineering Supremacy

GLM-5.2 is a senior developer trapped inside a neural network:

SWE-bench Pro: Achieving 62.1, GLM-5.2 resolves real-world, complex GitHub issues, beating out DeepSeek-V4-Pro (55.4) and GPT-5.5 (58.6).
Terminal Bench 2.1 (Best Harness): A massive 82.7 score, within a point of the top result (GPT-5.5 at 83.4), shows strong command-line execution and environment management.
FrontierSWE: A 74.4 score, just behind Claude Opus 4.8 (75.1) and more than double GLM-5.1 (30.5), proves it can navigate bleeding-edge, undocumented codebases.

🤖 Agentic Frameworks & Autonomy

For developers building autonomous AI agents, GLM-5.2 is the ultimate brain:

MCP-Atlas (Public Set): Scoring 76.8, second only to Claude Opus 4.8 (77.8), it navigates complex multi-step tool use with confidence.
Tool-Decathlon: Scoring 48.2, up from 40.7 for GLM-5.1, on a benchmark that tests whether a model knows exactly when to call an API, how to parse the JSON, and what to do with the result.

📊 The Master Scorecard

| Benchmark | GLM-5.2 | GLM-5.1 | Qwen3.7-Max | MiniMax M3 | DeepSeek-V4-Pro | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|---|---|---|
| 🧠 Deep Reasoning | | | | | | | | |
| HLE | 40.5 | 31.0 | 41.4 | 37.0 | 37.7 | 49.8* | 41.4* | 45.0 |
| HLE (w/ Tools) | 54.7 | 52.3 | 53.5 | - | 48.2 | 57.9* | 52.2* | 51.4* |
| CritPt | 20.9 | 4.6 | 13.4 | 3.7 | 12.9 | 20.9 | 27.1 | 17.7 |
| AIME 2026 | 99.2 | 95.3 | 97.0 | - | 94.6 | 95.7 | 98.3 | 98.2 |
| HMMT Nov. 2025 | 94.4 | 94.0 | 95.0 | 84.4 | 94.4 | 96.5 | 96.5 | 94.8 |
| HMMT Feb. 2026 | 92.5 | 82.6 | 97.1 | 84.4 | 95.2 | 96.7 | 96.7 | 87.3 |
| IMOAnswerBench | 91.0 | 83.8 | 90.0 | - | 89.8 | 83.5 | - | 81.0 |
| GPQA-Diamond | 91.2 | 86.2 | 90.0 | 93.0 | 90.1 | 93.6 | 93.6 | 94.3 |
| 💻 Code & Repositories | | | | | | | | |
| SWE-bench Pro | 62.1 | 58.4 | 60.6 | 59.0 | 55.4 | 69.2 | 58.6 | 54.2 |
| NL2Repo | 48.9 | 42.7 | 47.2 | 42.1 | 35.5 | 69.7 | 50.7 | 33.4 |
| DeepSWE | 46.2 | 18.0 | 18.0 | 20.0 | 8.0 | 58.0 | 70.0 | 10.0 |
| ProgramBench | 63.7 | 50.9 | - | - | 47.8 | 71.9 | 70.8 | 39.5 |
| Term-Bench 2.1 | 82.7 | 69.0 | 75.0 | 65.0 | 64.0 | 78.9 | 83.4 | 74.0 |
| FrontierSWE | 74.4 | 30.5 | - | - | 29.0 | 75.1 | 72.6 | 39.6 |
| PostTrainBench | 34.3 | 20.1 | - | - | - | 37.2 | 28.4 | 21.6 |
| SWE-Marathon | 13.0 | 1.0 | - | - | - | 26.0 | 12.0 | 4.0 |
| 🤖 Agentic Tools | | | | | | | | |
| MCP-Atlas (Public) | 76.8 | 71.8 | 76.4 | 74.2 | 73.6 | 77.8 | 75.3 | 69.2 |
| Tool-Decathlon | 48.2 | 40.7 | - | - | 52.8 | 59.9 | 55.6 | 48.8 |
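You can check each headline claim against the scorecard by ranking the rows directly. This small sketch copies five rows from the table above, using `None` for "-". It computes GLM-5.2's competition rank (1 = best, ties share a rank) among the models that report a score. The choice of rows is illustrative.

```python
MODELS = ["GLM-5.2", "GLM-5.1", "Qwen3.7-Max", "MiniMax M3", "DeepSeek-V4-Pro",
          "Claude Opus 4.8", "GPT-5.5", "Gemini 3.1 Pro"]

# Rows copied from the scorecard, in MODELS order; None marks "-" (not reported).
SCORES = {
    "AIME 2026": [99.2, 95.3, 97.0, None, 94.6, 95.7, 98.3, 98.2],
    "GPQA-Diamond": [91.2, 86.2, 90.0, 93.0, 90.1, 93.6, 93.6, 94.3],
    "SWE-bench Pro": [62.1, 58.4, 60.6, 59.0, 55.4, 69.2, 58.6, 54.2],
    "FrontierSWE": [74.4, 30.5, None, None, 29.0, 75.1, 72.6, 39.6],
    "Tool-Decathlon": [48.2, 40.7, None, None, 52.8, 59.9, 55.6, 48.8],
}


def rank(scores: list, model_index: int = 0) -> tuple:
    """Competition rank (1 = best) of one model, counting only models with a reported score."""
    mine = scores[model_index]
    if mine is None:
        raise ValueError("the selected model has no reported score")
    reported = [s for s in scores if s is not None]
    # Rank = 1 + number of models with a strictly higher score.
    return 1 + sum(s > mine for s in reported), len(reported)


for name, row in SCORES.items():
    place, field = rank(row)
    print(f"{name}: {MODELS[0]} ranks {place} of {field}")
```

Passing a different `model_index` ranks any other column the same way.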

🖥️ ACT III: The Director's Chair (Deployment & Local Serving Matrix)

GLM-5.2 is designed to integrate seamlessly into your existing high-performance infrastructure. We don't believe in locking you into proprietary wrappers. Whether you are running locally on a beefy workstation, deploying on multi-GPU enterprise clusters, or utilizing specialized NPUs, we support the industry's heaviest frameworks. Load the payload using your preferred setup:

🔥 SGLang (Maximum Throughput Serving)

For production environments where requests per second (RPS) are critical, SGLang delivers top-tier throughput.

Version Required: v0.5.13.post1+
Documentation: Fire it up with this official SGLang Cookbook.

🔥 vLLM (The Industry Standard for High-Speed Inference)

If you want robust, reliable, and continuously updated inference, vLLM is fully supported right out of the box. PagedAttention works flawlessly with GLM-5.2.

Version Required: v0.23.0+
Documentation: Check out the action-packed vLLM Recipes.

🔥 Hugging Face Transformers (Native Integration)

For researchers and developers who want to stay within the familiar Hugging Face ecosystem, GLM-5.2 natively plugs into the transformers library.

Version Required: v0.5.12+
Documentation: Master the setup via the Transformers docs.

🔥 KTransformers (Custom Kernel Execution)

Need absolute granular control over how the GPU executes your operations? KTransformers lets you squeeze every drop of performance out of your hardware.

Version Required: v0.5.12+
Documentation: Follow this brutal step-by-step tutorial.

🔥 Unsloth (Rapid Fine-Tuning)

Want to fine-tune the beast on your own datasets in record time without burning out your GPUs? Unsloth makes training GLM-5.2 twice as fast with half the memory.

Version Required: v0.1.47-beta+
Documentation: Tame it using this quick-start guide.

🚀 Ascend NPU (Enterprise Hardware Support)

We haven't forgotten specialized enterprise nodes. GLM-5.2 has full hardware backing for Huawei Ascend environments using frameworks like vLLM-Ascend, xLLM, and SGLang.

Documentation: Gear up with the Ascend deployment matrix here.
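Before loading the model, it can help to confirm that the installed frameworks meet the minimum versions listed above. This minimal sketch makes a few assumptions:

- The pip distribution names are `sglang`, `vllm`, `ktransformers`, and `unsloth`.
- The `packaging` library is installed.
- `MIN_VERSIONS`, `meets_minimum`, and `check_installed` are illustrative names.

Extend `MIN_VERSIONS` to cover other frameworks.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

from packaging.version import Version

# Minimum versions as listed in this card, keyed by (assumed) pip distribution name.
MIN_VERSIONS = {
    "sglang": "0.5.13.post1",
    "vllm": "0.23.0",
    "ktransformers": "0.5.12",
    "unsloth": "0.1.47-beta",
}


def meets_minimum(installed: str, minimum: str) -> bool:
    """PEP 440 comparison, so '.post1' and '-beta' suffixes order correctly."""
    return Version(installed) >= Version(minimum)


def check_installed(requirements: Dict[str, str] = MIN_VERSIONS) -> Dict[str, str]:
    """Report each framework as 'ok', 'too old (X)', or 'not installed'."""
    report = {}
    for package, minimum in requirements.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = "not installed"
            continue
        report[package] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report
```

Run `print(check_installed())` in the environment you plan to serve from. A plain string comparison would get these cases wrong, for example by treating `0.5.13` as newer than `0.5.13.post1`. Hence the PEP 440 parsing.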

📜 ACT IV: The Credits (Citation & Recognition)

Open-source survives on community respect and academic integrity. If GLM-5.2 helps you architect your next massive breakthrough, write the code that changes the world, or launch the ultimate autonomous AI agent, give respect where it is due.

Please cite our exhaustive technical report in your research papers, GitHub repositories, and production documentation:

@misc{glm5team2026glm5vibecodingagentic,
      title={GLM-5: from Vibe Coding to Agentic Engineering},
      author={GLM-5-Team and Aohan Zeng and Xin Lv and Zhenyu Hou and Zhengxiao Du and Qinkai Zheng and Bin Chen and Da Yin and Chendi Ge and Chenghua Huang and Chengxing Xie and Chenzheng Zhu and Congfeng Yin and Cunxiang Wang and Gengzheng Pan and Hao Zeng and Haoke Zhang and Haoran Wang and Huilong Chen and Jiajie Zhang and Jian Jiao and Jiaqi Guo and Jingsen Wang and Jingzhao Du and Jinzhu Wu and Kedong Wang and Lei Li and Lin Fan and Lucen Zhong and Mingdao Liu and Mingming Zhao and Pengfan Du and Qian Dong and Rui Lu and Shuang-Li and Shulin Cao and Song Liu and Ting Jiang and Xiaodong Chen and Xiaohan Zhang and Xuancheng Huang and Xuezhen Dong and Yabo Xu and Yao Wei and Yifan An and Yilin Niu and Yitong Zhu and Yuanhao Wen and Yukuo Cen and Yushi Bai and Zhongpei Qiao and Zihan Wang and Zikang Wang and Zilin Zhu and Ziqiang Liu and Zixuan Li and Bojie Wang and Bosi Wen and Can Huang and Changpeng Cai and Chao Yu and Chen Li and Chengwei Hu and Chenhui Zhang and Dan Zhang and Daoyan Lin and Dayong Yang and Di Wang and Ding Ai and Erle Zhu and Fangzhou Yi and Feiyu Chen and Guohong Wen and Hailong Sun and Haisha Zhao and Haiyi Hu and Hanchen Zhang and Hanrui Liu and Hanyu Zhang and Hao Peng and Hao Tai and Haobo Zhang and He Liu and Hongwei Wang and Hongxi Yan and Hongyu Ge and Huan Liu and Huanpeng Chu and Jia'ni Zhao and Jiachen Wang and Jiajing Zhao and Jiamin Ren and Jiapeng Wang and Jiaxin Zhang and Jiayi Gui and Jiayue Zhao and Jijie Li and Jing An and Jing Li and Jingwei Yuan and Jinhua Du and Jinxin Liu and Junkai Zhi and Junwen Duan and Kaiyue Zhou and Kangjian Wei and Ke Wang and Keyun Luo and Laiqiang Zhang and Leigang Sha and Liang Xu and Lindong Wu and Lintao Ding and Lu Chen and Minghao Li and Nianyi Lin and Pan Ta and Qiang Zou and Rongjun Song and Ruiqi Yang and Shangqing Tu and Shangtong Yang and Shaoxiang Wu and Shengyan Zhang and Shijie Li and Shuang Li and Shuyi Fan and Wei Qin and Wei Tian and Weining Zhang and Wenbo Yu and Wenjie Liang and Xiang Kuang and Xiangmeng Cheng and Xiangyang Li and Xiaoquan Yan and Xiaowei Hu and Xiaoying Ling and Xing Fan and Xingye Xia and Xinyuan Zhang and Xinze Zhang and Xirui Pan and Xu Zou and Xunkai Zhang and Yadi Liu and Yandong Wu and Yanfu Li and Yidong Wang and Yifan Zhu and Yijun Tan and Yilin Zhou and Yiming Pan and Ying Zhang and Yinpei Su and Yipeng Geng and Yong Yan and Yonglin Tan and Yuean Bi and Yuhan Shen and Yuhao Yang and Yujiang Li and Yunan Liu and Yunqing Wang and Yuntao Li and Yurong Wu and Yutao Zhang and Yuxi Duan and Yuxuan Zhang and Zezhen Liu and Zhengtao Jiang and Zhenhe Yan and Zheyu Zhang and Zhixiang Wei and Zhuo Chen and Zhuoer Feng and Zijun Yao and Ziwei Chai and Ziyuan Wang and Zuzhou Zhang and Bin Xu and Minlie Huang and Hongning Wang and Juanzi Li and Yuxiao Dong and Jie Tang},
      year={2026},
      eprint={2602.15763},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.15763},
}


Model size: 753B params (Safetensors)
Tensor types: BF16, F32

Paper for The-JDdev/GLM-5.2: GLM-5: from Vibe Coding to Agentic Engineering (arXiv 2602.15763, published Feb 17)