Testing AI controlling AI with Hy3 Preview: I barely lifted a finger the whole time.
One-click deployment of Hermes on WorkBuddy took a few rounds of adjustments, but I finally got it up and running smoothly.
The only minor issue was setting up Supermemory; it was a bit slow on the uptake. I had to go over simple steps several times, guiding it patiently like teaching a kid.
The experience of AI orchestrating AI is absolutely incredible. I started running Agents with Hunyuan right after its release, and it actually works perfectly.
295B total parameters, 21B active, with direct access through TokenHub now. Great cost-performance ratio too.
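For reference, here is a minimal sketch of calling the model through TokenHub, assuming it exposes an OpenAI-compatible endpoint; the base URL and model ID below are placeholders, not confirmed values:

```python
# Minimal sketch, NOT the confirmed TokenHub API: assumes an
# OpenAI-compatible endpoint; base_url and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenhub.example/v1",  # placeholder endpoint
    api_key="YOUR_TOKENHUB_KEY",
)

resp = client.chat.completions.create(
    model="hunyuan-hy3-preview",  # placeholder model ID
    messages=[{"role": "user", "content": "Plan a multi-agent deployment."}],
)
print(resp.choices[0].message.content)
```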
Honestly, I used to get stuck on all kinds of environment configurations when deploying Agents locally. Using Hy3 to take command made the whole process way more streamlined.
- 295B total / 21B active / 256K context
- Fused fast-and-slow thinking in a single model
- First model trained on Hunyuan's rebuilt pretraining + RL infra (Feb–Apr)
Benchmarks:
- SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch: competitive results, particularly strong on agentic tool use
- Top score on Tsinghua's 2026 Spring math PhD qualifying exam
- Strong context-learning and instruction-following on Tencent's CL-bench / CL-bench-Life
A few numbers that stood out:
- 13h non-stop coding, 4,000+ lines
- 300 sub-agents, 4,000 steps
- 5-day autonomous runs (OpenClaw / Hermes)
- Parity with GPT-5.4 / Opus 4.6 / Gemini 3.1 Pro on SWE-Bench Pro + HLE
The 5-day autonomous number is the one I'd most like to see reproduced.
HY-World-2.0, a multi-modal world model for reconstructing, generating, and simulating 3D worlds, is now available on Spaces, and it works both as native Gradio components and in Gradio server mode.
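Since it runs as a standard Space, you should be able to drive it from Python with gradio_client; a minimal sketch follows, with the caveat that endpoint names and parameters are Space-specific, so inspect them with view_api() first:

```python
# Minimal sketch using gradio_client against the public Space.
# Endpoint names/arguments are Space-specific assumptions: check view_api().
from gradio_client import Client

client = Client("tencent/HY-World-2.0")  # Space ID as posted
client.view_api()  # prints the Space's actual endpoints and signatures

# Hypothetical call shape once you know the real endpoint name:
# result = client.predict("a medieval courtyard at dusk", api_name="/generate")
# print(result)
```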
Go-Code-Large is a large-scale corpus of Go (Golang) programming language source code, comprising 316,427 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, cloud-native systems, and modern backend software engineering.
By offering a focused and curated dataset for Go, this corpus enables experimentation in concurrent programming, distributed systems, and performance-oriented backend servicesโdomains where Go is widely adopted.
Go-Code-Large addresses the relative scarcity of large, language-specific datasets for Go, enabling targeted research into idiomatic Go patterns, concurrency primitives, and scalable system design.
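A minimal sketch for peeking at the corpus, assuming the usual one-JSON-object-per-line layout; the field names below are guesses, so the first step prints the keys of the first record to reveal the real schema:

```python
# Minimal sketch for inspecting Go-Code-Large locally.
# Assumes standard JSON Lines; field names are NOT confirmed by the card,
# so we print the keys of the first record before touching any field.
import json

with open("go-code-large.jsonl", encoding="utf-8") as f:  # placeholder path
    first = json.loads(next(f))
    print(sorted(first.keys()))  # discover the actual schema

    # Hypothetical: count samples spawning goroutines, assuming a "code" field.
    n = sum(1 for line in f if "go func(" in json.loads(line).get("code", ""))
    print(f"samples spawning goroutines (after the first): {n}")
```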
Just tried tencent/HY-World-2.0 โ a multimodal world model that takes in text or a single image and generates editable 3D scenes.
Unlike Google's Genie and HY-World 1.5, v2.0 generates engine-ready 3D content:
- Direct import into Unreal Engine and Unity, no format wrangling
- Supports multiple 3D asset formats: Mesh, 3DGS, point cloud, etc.
- Fully editable: not a baked video, but actual geometry you can modify
- Also usable for embodied simulation environments

Basically: from "AI generates a world you can look at" → "AI generates a world you can ship."
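Since the output is real geometry rather than video, a quick way to sanity-check an exported scene is to load it with a mesh library. A sketch assuming the Space hands back a glTF/GLB file ("scene.glb" is a placeholder path):

```python
# Minimal sketch: verify an exported HY-World-2.0 scene is real, editable
# geometry. Assumes a GLB/glTF export; "scene.glb" is a placeholder path.
import trimesh

scene = trimesh.load("scene.glb")  # also handles .obj, .ply, etc.
for name, geom in scene.geometry.items():
    print(name, geom.vertices.shape, geom.faces.shape)

# Because it's actual geometry, you can edit it before shipping, e.g.:
scene.apply_scale(0.01)           # unit conversion for engine import
scene.export("scene_edited.glb")  # re-export for Unreal/Unity
```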
World Model Bench: does your world model actually think?
FID measures realism. FVD measures smoothness. But neither tells you whether the model understood the scene.
We just released WM Bench, the first benchmark for cognitive intelligence in world models. The core question: when a beast charges from 3 meters away, does the model know to sprint, not walk? Does it respond differently to a human vs an animal? Does it remember the left corridor was blocked two steps ago?
Those are cognitive questions. No existing benchmark asks them. So we built one.
- P1 Perception (25%): can it read the scene?
- P2 Cognition (45%): does it predict threats, escalate emotions, utilize memory?
- P3 Embodiment (30%): does the body respond with the right motion?
All evaluation is via simple JSON I/O: no 3D engine, no special hardware. Any model with an API can participate.
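To make the interface concrete, here is a minimal sketch of what a JSON-I/O submission could look like, together with the P1/P2/P3 weighting from the list above; the field names are illustrative assumptions, not the official schema:

```python
# Minimal sketch of a WM Bench-style JSON I/O loop and the P1/P2/P3 weighting.
# Field names ("entity", "distance_m", "action") are illustrative assumptions,
# NOT the official schema.
import json

WEIGHTS = {"P1_perception": 0.25, "P2_cognition": 0.45, "P3_embodiment": 0.30}

def composite(scores: dict) -> float:
    """Weighted composite from per-pillar scores on a 0-1000 scale."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def agent_step(observation_json: str) -> str:
    """Your model's API goes here: JSON observation in, JSON action out."""
    obs = json.loads(observation_json)
    threat = obs.get("entity") == "beast" and obs.get("distance_m", 99) < 5
    return json.dumps({"action": "sprint" if threat else "walk"})

print(agent_step('{"entity": "beast", "distance_m": 3}'))   # -> sprint
print(composite({"P1_perception": 800, "P2_cognition": 700,
                 "P3_embodiment": 710}))                    # weighted total
```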
We also built PROMETHEUS as a live reference implementation: it runs in your browser on a T4, no install needed. It combines FloodDiffusion motion generation with an LLM cognitive brain (Perceive → Predict → Decide → Act). It scored 726/1000 (Grade B) on Track C, the only directly verified model so far. Submissions from other teams very welcome.
Robonine just published a new article! Mechanical backlash is a common limitation in servo-driven robotic joints. In this experiment, paired Feetech STS3215 servos are used with a small opposing preload to eliminate gearbox play, significantly improving positional stability and motion precision in robotic manipulators.
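The idea translates to a few lines of control code: both servos track the same joint target, but each is biased a few encoder ticks in opposite directions so the two gear trains stay loaded against each other and the backlash dead-zone never opens up. A sketch, where write_position() is a hypothetical stand-in for your servo bus library and the IDs and offset are illustrative:

```python
# Minimal sketch of anti-backlash preloading for paired Feetech STS3215s.
# write_position() is a HYPOTHETICAL stand-in for your servo bus library
# (e.g. the Feetech SDK); bus IDs and the preload offset are illustrative.

PRELOAD_TICKS = 8          # small constant bias; tune so gears stay meshed
SERVO_A, SERVO_B = 1, 2    # bus IDs of the paired servos on one joint

def write_position(servo_id: int, ticks: int) -> None:
    """Placeholder: send a goal position over the serial bus."""
    print(f"servo {servo_id} -> {ticks}")

def move_joint(target_ticks: int) -> None:
    # Both servos aim at the same joint angle, biased in opposite
    # directions, so each gearbox is loaded against the other and
    # gearbox play cannot show up as positional wobble.
    write_position(SERVO_A, target_ticks + PRELOAD_TICKS)
    write_position(SERVO_B, target_ticks - PRELOAD_TICKS)

move_joint(2048)  # mid-range for the STS3215's 12-bit (0-4095) encoder
```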