I am thrilled to announce the launch of version 2 of the Open Japanese LLM Leaderboard. This initiative is driven by the "Fine-tuning and Evaluation" team, led by Professor Miyao of the University of Tokyo, under the Research and Development Center for Large Language Models (LLMC) at Japan's National Institute of Informatics (NII).
Strategic and technical upgrades:
- Our new backend features eight A100 GPUs, enabling the evaluation of open-source models with more than 100B parameters.
- Submissions now require a Hugging Face Hub login to ensure accountability.
- We have added metrics for evaluation time and CO₂ emissions (thanks to CodeCarbon), as well as for reasoning capabilities.
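As a rough sanity check on the 100B+ claim, the weights-only memory arithmetic can be sketched as below. This is an illustrative sketch, not the leaderboard's actual scheduling logic: the helper names, the 80 GB per-GPU figure (A100s also ship in a 40 GB variant), bf16/fp16 weights at 2 bytes per parameter, and the 20% headroom reserved for KV cache and activations are all assumptions.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Return the weights-only footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and framework overhead.
    """
    return n_params * bytes_per_param / 1e9


def fits_on_node(
    n_params: float,
    n_gpus: int = 8,
    gpu_mem_gb: float = 80.0,  # assumption: A100 80 GB variant
    bytes_per_param: float = 2.0,  # assumption: bf16/fp16 weights
    headroom: float = 0.8,  # assumption: share of memory usable for weights
) -> bool:
    """Check whether the weights fit in the usable share of the node's total GPU memory."""
    usable_gb = n_gpus * gpu_mem_gb * headroom
    return weight_memory_gb(n_params, bytes_per_param) <= usable_gb


if __name__ == "__main__":
    for n in (7e9, 100e9, 300e9):
        print(f"{n / 1e9:.0f}B params: {weight_memory_gb(n):.0f} GB weights, fits={fits_on_node(n)}")
```

Under these assumptions, a 100B-parameter model needs about 200 GB for its weights, well within the roughly 512 GB usable budget of eight 80 GB cards (and still within the 256 GB budget of eight 40 GB cards), while a 300B model at 2 bytes per parameter (600 GB) would need quantization or offloading.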
Datasets and evaluation standards:
- New datasets cover reasoning, mathematics, exams, and instruction following.
- Math evaluations now range from grade-school problems to expert-level challenges (GSM8K, PolyMath, AIME).
- While integrating English-centric and multilingual benchmarks (including Humanity's Last Exam, GPQA, and BBH in both English and Japanese), we continue to prioritize datasets rooted in Japanese culture.
- 295B total / 21B active parameters / 256K context
- Fused fast-and-slow thinking in a single model
- First model trained on Hunyuan's rebuilt pretraining + RL infrastructure (Feb–Apr)
Benchmarks:
- SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch: competitive results, particularly strong on agentic tool use
- Top score on Tsinghua's 2026 Spring math PhD qualifying exam
- Strong context learning and instruction following on Tencent's CL-bench / CL-bench-Life
Real-time 3D telemetry of ethicalabs/Echo-DSRN-114M-v0.1.2 processing "can you please order a pizza" through the intent-classifier LoRA adapter.
Each orange lattice is the slow-state manifold of a DSRNBlock. The red sphere represents live entropy. The right panel shows the surprise gate firing token by token as the model converges on [TAKEAWAY_ORDER].
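For readers unfamiliar with the quantities in the visualization, here is a minimal sketch of how per-token entropy and a threshold-based surprise gate could be computed from next-token logits. It is hypothetical and not taken from Echo-DSRN: the function names, the use of surprisal (-ln p of the observed token) as the gate signal, and the fixed threshold of 2.0 nats are assumptions, and the actual DSRNBlock gate may be learned or driven by internal state instead.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def surprise_gate_trace(
    step_logits: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    threshold: float = 2.0,  # hypothetical fixed threshold, in nats
) -> List[Tuple[float, float, bool]]:
    """For each step, return (entropy, surprisal, gate_open).

    Hypothetical rule: the gate opens when the observed token's
    surprisal -ln p(token) exceeds the threshold.
    """
    trace = []
    for logits, tok in zip(step_logits, token_ids):
        probs = softmax(logits)
        h = entropy(probs)
        s = -math.log(probs[tok])
        trace.append((h, s, s > threshold))
    return trace


if __name__ == "__main__":
    logits = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
    for h, s, gate in surprise_gate_trace(logits, [2, 1, 0]):
        print(f"entropy={h:.2f} surprisal={s:.2f} gate_open={gate}")
```

With uniform logits over four tokens, both entropy and surprisal equal ln 4 ≈ 1.39 nats, so the gate stays closed; a token the model assigned low probability pushes surprisal above the threshold and opens it.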
I built this because I'm a visual learner and I wanted to see the surprise gate open and close on each token. I needed to see what was happening inside the network, not just trust that it was working.
Turns out it's also a decent way to explain the architecture to someone who's never heard of it.