Instructions to use MultiverseComputingCAI/Hypernova-60B-2605 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use MultiverseComputingCAI/Hypernova-60B-2605 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MultiverseComputingCAI/Hypernova-60B-2605")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MultiverseComputingCAI/Hypernova-60B-2605")
model = AutoModelForCausalLM.from_pretrained("MultiverseComputingCAI/Hypernova-60B-2605")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use MultiverseComputingCAI/Hypernova-60B-2605 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "MultiverseComputingCAI/Hypernova-60B-2605"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "MultiverseComputingCAI/Hypernova-60B-2605",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/MultiverseComputingCAI/Hypernova-60B-2605
```
- SGLang
How to use MultiverseComputingCAI/Hypernova-60B-2605 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "MultiverseComputingCAI/Hypernova-60B-2605" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "MultiverseComputingCAI/Hypernova-60B-2605",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "MultiverseComputingCAI/Hypernova-60B-2605" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "MultiverseComputingCAI/Hypernova-60B-2605",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```
- Docker Model Runner
How to use MultiverseComputingCAI/Hypernova-60B-2605 with Docker Model Runner:
```bash
docker model run hf.co/MultiverseComputingCAI/Hypernova-60B-2605
```
| base_model: | |
| - openai/gpt-oss-120b | |
| - MultiverseComputingCAI/HyperNova-60B | |
| library_name: transformers | |
| license: apache-2.0 | |
| <div align="center"> | |
| # HyperNova 60B 2605 | |
| ### Powered by CompactifAI | |
| [License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) | |
| [Hugging Face](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605) | |
| [Discord](https://discord.gg/cGas9uStqp) | |
| **Optimized for Efficient Inference** · **Reduced Memory Footprint** · **Native Tool Calling Support** | |
| </div> | |
| --- | |
| ## Table of Contents | |
| - [Model Overview](#model-overview) | |
| - [Technical Deep Dive](#technical-deep-dive) | |
| - [Key Characteristics](#key-characteristics) | |
| - [Quick Start](#quick-start) | |
| - [What's New in HyperNova 60B 2605](#whats-new-in-hypernova-60b-2605) | |
| - [Tool Calling](#tool-calling) | |
| - [Architecture](#architecture) | |
| - [Evaluation & Benchmarks](#evaluation--benchmarks) | |
| - [Languages](#languages) | |
| - [Intended Use](#intended-use) | |
| - [Safety & Limitations](#safety--limitations) | |
| - [Model Information](#model-information) | |
| - [Citation](#citation) | |
| --- | |
| ## Model Overview | |
| **HyperNova 60B 2605**, developed by **Multiverse Computing**, is an open-weight model designed for powerful **general** reasoning, **coding**, and versatile developer use. | |
| The model is **instruction-tuned** and supports **native tool calling** (function calling with defined schemas, structured outputs, and agent-style workflows). HyperNova 60B 2605 is intended for code generation, RAG, and tool-augmented applications. | |
| ## Technical Deep Dive | |
| For a detailed explanation of the compression architecture, the model compression process, and the benchmark results behind HyperNova 60B, read the [full technical article by Johanna Angulo, Evaluation Manager at Multiverse Computing](https://multiversecomputing.com/papers/hypernova-60b-2602-same-intelligence-half-the-size-improved-tool-calling-capability). | |
| --- | |
| ## Key Characteristics | |
| | Characteristic | Description | | |
| |-----------------------|-------------| | |
| | 🛠️ **Tool calling** | Native support; OpenAI-style function / tool calling schemas; suited to coding agents and structured outputs | | |
| | 🧠 **Parameters** | 60B total parameters | | |
| | 📐 **Architecture** | Decoder-only Transformer | | |
| | Primary language | English | | |
| | Other languages | Not formally evaluated | | |
| --- | |
| ## Quick Start | |
| This model can be loaded with the **Transformers** API. Use `trust_remote_code=True` (required for the gpt-oss architecture). Recommended approach: `AutoModelForCausalLM` with `apply_chat_template`: | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_id = "MultiverseComputingCAI/HyperNova-60B-2605" | |
| tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_id, | |
|     device_map="auto", | |
|     torch_dtype="auto", | |
|     trust_remote_code=True, | |
| ) | |
| messages = [{"role": "user", "content": "What is a Hypernova?"}] | |
| # return_dict=True returns input_ids and attention_mask together, | |
| # so the mask does not have to be built by hand. | |
| inputs = tokenizer.apply_chat_template( | |
|     messages, | |
|     add_generation_prompt=True, | |
|     tokenize=True, | |
|     return_dict=True, | |
|     return_tensors="pt", | |
| ).to(model.device) | |
| outputs = model.generate( | |
|     **inputs, | |
|     max_new_tokens=512, | |
|     do_sample=True, | |
|     temperature=0.7, | |
| ) | |
| # Decode only the newly generated tokens, skipping the prompt. | |
| prompt_length = inputs["input_ids"].shape[-1] | |
| reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True) | |
| print(reply) | |
| ``` | |
| Alternatively, you can use the `pipeline` API with `trust_remote_code=True`. The pipeline returns the full conversation structure, so extract the assistant message from `outputs[0]["generated_text"]` as needed. | |
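A minimal sketch of that extraction step follows. It assumes that, for chat-style input, `generated_text` holds the conversation as a list of `{"role", "content"}` messages; the helper name `extract_assistant_reply` and the mock result are hypothetical and only illustrate the shape, so the snippet runs without loading the model.

```python
from typing import Any


def extract_assistant_reply(outputs: list[dict[str, Any]]) -> str:
    """Return the content of the last assistant message in a chat pipeline result."""
    # Assumption: generated_text is the whole conversation as a list of messages.
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Mock result with the assumed shape (not real model output).
mock_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "What is a Hypernova?"},
            {"role": "assistant", "content": "A hypernova is a very energetic supernova."},
        ]
    }
]
print(extract_assistant_reply(mock_outputs))
```

With a real pipeline, the same helper would be called on the value returned by `pipe(messages)`.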
| --- | |
| ## What’s New in HyperNova 60B 2605 | |
| **HyperNova 60B 2605** is an improved version of **HyperNova 60B 2602**. This release focuses on **coding** and **general** capability and scores higher than 2602 on all 11 benchmarks reported in [Evaluation & Benchmarks](#evaluation--benchmarks). | |
| ### Summary | |
| - **Improvement focus vs HyperNova 60B 2602:** stronger performance on **coding** and **general** benchmarks. | |
| - **Tool use:** Retains native support for function calling, structured outputs, and agent-style workflows (OpenAI-style schemas). | |
| - **Reasoning:** Compatible with configurable reasoning effort (e.g. low / medium / high in system prompt) where the format is preserved; full chain-of-thought available for debugging and analysis. | |
| - **Evaluated** on coding and tool-heavy benchmarks (e.g. Tau2-bench, Terminal-Bench) alongside **general** intelligence benchmarks. | |
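As a sketch of how the reasoning effort might be set through the system prompt: the `Reasoning: <level>` wording below follows the gpt-oss convention and is assumed to carry over to this model. The helper name `build_messages` is hypothetical.

```python
VALID_EFFORTS = ("low", "medium", "high")


def build_messages(user_prompt: str, reasoning_effort: str = "medium") -> list[dict[str, str]]:
    """Build a chat message list whose system prompt requests a reasoning effort."""
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}")
    return [
        # Assumed format, borrowed from the gpt-oss system prompt convention.
        {"role": "system", "content": f"Reasoning: {reasoning_effort}"},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Explain binary search in two sentences.", reasoning_effort="high")
print(messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` or to an OpenAI-compatible chat endpoint in place of a plain user message.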
| --- | |
| ## Tool Calling | |
| HyperNova 60B 2605 supports **native tool use** and is well-suited for: | |
| - **Function calling** with defined schemas | |
| - **Structured outputs** | |
| - **Coding-oriented tool workflows** (e.g. browser tasks, code execution where supported) | |
| The model can detect when to invoke tools, emit structured JSON tool calls, and consume tool outputs to continue generation. Tool-calling behavior follows **OpenAI-style schemas**. Compatibility refers to format and structure; exact parity with the base or other models is not guaranteed. | |
| Compared with HyperNova 60B 2602, this release improves on **coding** and **general** evaluation tracks, including IFBench, Tau2-bench, Terminal-Bench, and AA-LCR, under the high-reasoning setup reported below. | |
| ### Example Tool Call | |
| ```json | |
| { | |
| "name": "get_weather", | |
| "arguments": { | |
| "city": "Paris", | |
| "date": "2026-02-10" | |
| } | |
| } | |
| ``` | |
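The sketch below shows one way an application could declare the tool behind such a call and validate the model's arguments before executing it. The `get_weather` schema, its stand-in implementation, and the `dispatch_tool_call` helper are hypothetical; only the OpenAI-style `tools` layout is taken from the description above.

```python
import json

# Hypothetical tool declaration in the OpenAI-style "tools" format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the weather forecast for a city on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"},
            },
            "required": ["city", "date"],
        },
    },
}


def get_weather(city: str, date: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "date": date, "forecast": "unknown"}


TOOL_REGISTRY = {"get_weather": (GET_WEATHER_TOOL, get_weather)}


def dispatch_tool_call(raw_call: str) -> dict:
    """Parse a JSON tool call, check its arguments against the schema, then run it."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOL_REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    schema, func = TOOL_REGISTRY[name]
    arguments = call.get("arguments", {})
    params = schema["function"]["parameters"]
    missing = [key for key in params["required"] if key not in arguments]
    unexpected = [key for key in arguments if key not in params["properties"]]
    if missing or unexpected:
        raise ValueError(f"bad arguments: missing={missing}, unexpected={unexpected}")
    return func(**arguments)


raw = '{"name": "get_weather", "arguments": {"city": "Paris", "date": "2026-02-10"}}'
result = dispatch_tool_call(raw)
print(result)
```

Rejecting unknown tools and malformed arguments before execution is one way to apply the "validate before execution" advice in the recommendations section.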
| --- | |
| ## Architecture | |
| ### Model Specifications | |
| | Specification | Value | | |
| |-------------------|--------------------| | |
| | Total parameters | 60B total (Mixture-of-Experts), 4.8B active | |
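For intuition, the share of parameters that is active can be worked out from the two figures in the table. This is plain arithmetic on the reported values, assuming "active" means the parameters used per token in the Mixture-of-Experts forward pass.

```python
TOTAL_PARAMS_B = 60.0   # total parameters, in billions (from the table above)
ACTIVE_PARAMS_B = 4.8   # active parameters, in billions (from the table above)


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters that is active."""
    return active_b / total_b


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"{fraction:.1%} of parameters active")
```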
| --- | |
| ## Evaluation & Benchmarks | |
| ### Evaluation Methodology | |
| Benchmark scores were obtained with the following setups. Methodology varies by benchmark family. | |
| #### HLE, MMLU-Pro, AIME25, GPQA Diamond (GPQA:d), LiveCodeBench | |
| - **Evaluation framework**: [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills) | |
| - **Inference library**: vLLM 0.13.0 | |
| - **Hardware**: 1× NVIDIA H200 Tensor Core GPU | |
| - **Reasoning effort**: high | |
| - **Decoding**: temperature = 1.0, top_p = 1.0 | |
| - **Batch size**: 64 | |
| #### IFBench, AA-LCR, SciCode | |
| - **Evaluation framework**: [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills) | |
| - **Inference library**: vLLM 0.13.0 | |
| - **Hardware**: 1× NVIDIA H200 Tensor Core GPU | |
| - **Reasoning effort**: high | |
| - **Decoding**: temperature = 1.0, top_p = 1.0 | |
| - **Batch size**: 64 | |
| #### Tau2-bench (Telecom) | |
| - **Evaluation framework**: [EvalScope](https://github.com/EvalScope/EvalScope) 1.4.1 | |
| - **Inference library**: vLLM 0.13.0 | |
| - **Hardware**: 1× NVIDIA H200 Tensor Core GPU | |
| - **Reasoning effort**: high (agent `extra_body.reasoning_effort`) | |
| - **Decoding (agent)**: temperature = 1.0, top_p = 1.0, min_tokens = 1 | |
| - **Decoding (judge / user simulator)**: temperature = 0.7, timeout = 600 | |
| - **Reproducibility**: subset telecom (default); max steps 100; repeats 3; tool-call parser openai (agent), hermes (judge) | |
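The agent settings above can be pictured as a single OpenAI-compatible request. The sketch below only builds the request and does not send it. It assumes that fields passed through a client's `extra_body` end up as top-level keys of the JSON body and that the serving stack honors `reasoning_effort` and `min_tokens`. The helper name, the base URL, and the prompt are hypothetical.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str,
    model: str,
    user_prompt: str,
    reasoning_effort: str = "high",
) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with the agent decoding settings."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": 1.0,
        "top_p": 1.0,
        "min_tokens": 1,
        # Assumption: extra_body fields are merged into the top level of the body.
        "reasoning_effort": reasoning_effort,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",
    "MultiverseComputingCAI/HyperNova-60B-2605",
    "My mobile data stopped working. What should I check first?",
)
print(request.full_url)
print(request.data.decode("utf-8"))
# Against a running server: urllib.request.urlopen(request)
```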
| #### Terminal-Bench Hard (Artificial Analysis subset) | |
| - **Evaluation framework**: laude-institute/harbor 0.1.43 | |
| - **Inference library**: vLLM 0.13.0 | |
| - **Hardware**: 1× NVIDIA H200 Tensor Core GPU | |
| - **Reasoning effort**: high | |
| - **Decoding**: temperature = 1.0, top_p = 1.0, max-model-len = 131072 | |
| - **Reproducibility**: subset from [Artificial Analysis](https://artificialanalysis.ai/methodology/intelligence-benchmarking#terminal-bench-hard) | |
| - **Agent**: terminus-2; max episodes 100; repeats 3 | |
| #### Aider polyglot | |
| - **Evaluation framework**: [Aider-AI/aider](https://github.com/Aider-AI/aider) | |
| - **Hardware**: 2× NVIDIA H200 Tensor Core GPU (host with Docker) | |
| - **Dataset**: `polyglot-benchmark` (225 exercises across multiple languages) | |
| - **Reasoning effort**: high (passed via `--reasoning-effort`) | |
| - **Decoding**: temperature = 1.0, top_p = 1.0 (configurable via `generation_config` / `--read-model-settings` YAML) | |
| - **Edit format**: `whole` (also supports `diff`, `udiff`, `diff-fenced`, and `architect`) | |
| - **Reproducibility**: leaderboard-aligned; `--tries=2` (repeats) | |
| ### Quantitative Results (Reported & Planned) | |
| | Benchmark | gpt-oss-120b | HyperNova 60B 2602 | HyperNova 60B 2605 | | |
| |-----------------------|-------------------------------|-----------------------------|--------------------------| | |
| | HLE | 18.50 | 7.28 | 14.97 | | |
| | MMLU-Pro | 79.64 | 74.25 | 76.77 | | |
| | Tau2-bench (Telecom) | 63.74 | 60.53 | 61.70 | | |
| | AIME25 | 93.67 | 86.00 | 90.00 | | |
| | GPQA:d | 74.64 | 65.56 | 71.92 | | |
| | IFBench | 67.01 | 59.40 | 66.57 | | |
| | SciCode | 41.52 | 33.53 | 36.00 | | |
| | LiveCodeBench | 62.75 | 51.53 | 68.68 | | |
| | Terminal-Bench Hard | 24.24 | 12.12 | 15.91 | |
| | AA-LCR | 49.00 | 35.67 | 40.33 | | |
| | Aider polyglot | 43.60 | 26.2 | 34.2 | |
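To read the table at a glance, the sketch below computes, for each benchmark, the gain of 2605 over 2602 and the share of the gpt-oss-120b score that 2605 retains. The numbers are copied from the table above; the helper name and the "retention" framing are illustrative, not part of the reported methodology.

```python
# (benchmark, gpt-oss-120b, HyperNova 60B 2602, HyperNova 60B 2605), copied from the table above.
SCORES = [
    ("HLE", 18.50, 7.28, 14.97),
    ("MMLU-Pro", 79.64, 74.25, 76.77),
    ("Tau2-bench (Telecom)", 63.74, 60.53, 61.70),
    ("AIME25", 93.67, 86.00, 90.00),
    ("GPQA:d", 74.64, 65.56, 71.92),
    ("IFBench", 67.01, 59.40, 66.57),
    ("SciCode", 41.52, 33.53, 36.00),
    ("LiveCodeBench", 62.75, 51.53, 68.68),
    ("Terminal Bench", 24.24, 12.12, 15.91),
    ("AA-LCR", 49.00, 35.67, 40.33),
    ("AIDER", 43.60, 26.2, 34.2),
]


def summarize(scores):
    """Return per-benchmark gain over 2602 (points) and retention relative to the base model."""
    rows = []
    for name, base, v2602, v2605 in scores:
        rows.append(
            {
                "benchmark": name,
                "gain_vs_2602": round(v2605 - v2602, 2),
                "retention_vs_base": round(v2605 / base, 3),
            }
        )
    return rows


summary = summarize(SCORES)
for row in summary:
    print(f"{row['benchmark']:<22}{row['gain_vs_2602']:>+8.2f}{row['retention_vs_base']:>8.1%}")
```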
| ### Quantitative Results (Inference Performance) | |
| #### Metrics reported | |
| - **System Output Throughput (higher is better):** Mean output tokens per second across all concurrent requests over the benchmarking phase. | |
| - **End-to-End Latency per Query (lower is better):** Median end-to-end response time for each query from the time the query is sent. | |
| - **Output Speed per Query (higher is better):** Median output tokens per second after the first token is received for each query. | |
| - **Time to First Token (TTFT) (lower is better):** Median time to first token. | |
| - **Estimated total memory (lower is better):** Median from each GuideLLM phase (estimated total footprint: weights plus KV contribution from monitored usage). | |
| - **Model weights (lower is better):** Memory occupied by the model weights alone, excluding the KV contribution. | |
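The timing metrics defined above can be illustrated with a small calculation over per-request records. This is a sketch under stated assumptions, not GuideLLM's implementation: the `RequestRecord` fields and the `summarize_phase` helper are hypothetical, and output speed is taken as the tokens after the first one divided by the time elapsed after the first token.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestRecord:
    start: float         # time the query was sent (s)
    first_token: float   # time the first output token arrived (s)
    end: float           # time the last output token arrived (s)
    output_tokens: int   # number of output tokens produced


def summarize_phase(records: list[RequestRecord], phase_duration_s: float) -> dict[str, float]:
    """Compute the four timing metrics for one concurrency phase."""
    return {
        # Total output tokens of all requests divided by the phase duration.
        "throughput_tok_s": sum(r.output_tokens for r in records) / phase_duration_s,
        "e2e_latency_s": median(r.end - r.start for r in records),
        # Tokens produced after the first one, per second elapsed after the first token.
        "output_speed_tok_s": median(
            (r.output_tokens - 1) / (r.end - r.first_token) for r in records
        ),
        "ttft_s": median(r.first_token - r.start for r in records),
    }


# Made-up records for illustration only.
records = [
    RequestRecord(start=0.0, first_token=1.0, end=5.0, output_tokens=201),
    RequestRecord(start=0.0, first_token=2.0, end=6.0, output_tokens=241),
    RequestRecord(start=0.0, first_token=1.5, end=9.5, output_tokens=321),
]
metrics = summarize_phase(records, phase_duration_s=10.0)
print(metrics)
```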
| On the same hardware and harness, **HyperNova 60B 2605** is compared to **gpt-oss-120b** using GuideLLM. The benchmark sweeps **concurrency phases** from 1 to 256 concurrent requests; the table below lists **median** values for each model at the 128-concurrency phase. | |
| | Metric | gpt-oss-120b | HyperNova 60B 2605 | |
| |--------|-------------:|-------------------:| | |
| | Concurrency | 128 | 128 | | |
| | Throughput (tok/s) | 3,821 | 5,210 | | |
| | E2E latency (s) | 24.05 | 14.74 | | |
| | Output speed (tok/s) | 57.79 | 69.31 | | |
| | TTFT (s) | 7.04 | 4.85 | | |
| | Est. total memory (GB) | 123.55 | 38.83 | | |
| | Model weights (GB) | 121.54 | 31.81 | | |
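The relative differences implied by the table can be derived directly from its values. The sketch below copies the numbers from the table above; the dictionary keys and the helper name are illustrative.

```python
# Median values at concurrency 128, copied from the table above.
GPT_OSS_120B = {
    "throughput_tok_s": 3821,
    "e2e_latency_s": 24.05,
    "output_speed_tok_s": 57.79,
    "ttft_s": 7.04,
    "est_total_memory_gb": 123.55,
    "model_weights_gb": 121.54,
}
HYPERNOVA_2605 = {
    "throughput_tok_s": 5210,
    "e2e_latency_s": 14.74,
    "output_speed_tok_s": 69.31,
    "ttft_s": 4.85,
    "est_total_memory_gb": 38.83,
    "model_weights_gb": 31.81,
}


def relative_change(metric: str) -> float:
    """Relative change of HyperNova 60B 2605 versus gpt-oss-120b (positive means larger)."""
    return HYPERNOVA_2605[metric] / GPT_OSS_120B[metric] - 1.0


changes = {metric: relative_change(metric) for metric in GPT_OSS_120B}
for metric, change in changes.items():
    print(f"{metric:<22}{change:>+8.1%}")
```

Positive values are improvements for throughput and output speed; negative values are improvements for latency, TTFT, and memory.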
| #### Performance evaluation conditions | |
| Our performance evaluation follows the spirit of [Artificial Analysis](https://artificialanalysis.ai/methodology/system-load-test). | |
| - **Inference library**: vLLM 0.13.0 | |
| - **Monitoring libraries**: GuideLLM, nvidia-ml-py | |
| - **Hardware**: 1× NVIDIA H200 Tensor Core GPU | |
| - **Conditions**: **concurrency phases** 1, 2, 4, 8, 16, 32, 64, 128, 192, and 256 concurrent requests (one GuideLLM phase each) | |
| - **Phase duration**: Each phase lasts 3 minutes (excluding ramp-up and cool-down periods). | |
| - **Workload shape:** input length is ~1000 tokens per query (median); median output length varies by phase and model. | |
| - **Streaming**: Benchmarking is conducted with streaming enabled. | |
| **Figure:** side-by-side comparison of the two models at concurrency = 128 only. | |
| --- | |
| ## Languages | |
| - **Primary language**: English | |
| - **Other languages**: Not formally evaluated | |
| The model was trained primarily on English-language data. Performance on other languages may vary and has not been systematically measured. | |
| --- | |
| ## Intended Use | |
| ### Recommended Use Cases | |
| - **Reasoning and analysis** (with configurable reasoning effort where supported) | |
| - **Tool-augmented applications**, with emphasis on **coding** and **general** assistant use (function calling, web browsing, code execution, structured outputs) | |
| - **Code generation and reasoning** | |
| - **Chatbots and virtual assistants** | |
| - **Retrieval-augmented generation (RAG)** | |
| ### Out-of-Scope Uses | |
| - Harmful, illegal, or deceptive content generation | |
| - Impersonation of real individuals without consent | |
| - High-risk decision-making without human oversight | |
| - Surveillance or tracking of individuals | |
| - Any use that violates applicable laws or regulations | |
| --- | |
| ## Safety & Limitations | |
| ### Known Limitations | |
| - **English-centric** training data. | |
| - **Format:** For best results, use the same [harmony response format](https://huggingface.co/openai/gpt-oss-120b) as gpt-oss-120b where applicable; behavior may differ otherwise. | |
| - **Tool calling** depends on correct schema and tool design; exact parity with gpt-oss-120b or other models is not guaranteed. | |
| ### Recommendations | |
| - Validate tool outputs before execution | |
| - Use human oversight for critical applications | |
| - Perform task-specific evaluation prior to deployment | |
| --- | |
| ## Model Information | |
| | Field | Value | | |
| |--------------|--------------------- | | |
| | Model name | HyperNova 60B 2605 | | |
| | Version | 2605 | | |
| | Release date | 26/02/2026 | | |
| | Developed by | Multiverse Computing | | |
| | License | Apache 2.0 | | |
| | Contact | business@multiversecomputing.com | | |
| --- | |
| ## Citation | |
| If you use this model, please cite the base model and this variant: | |
| ```bibtex | |
| @misc{openai2025gptoss120b, | |
| title = {gpt-oss-120b \& gpt-oss-20b Model Card}, | |
| author = {OpenAI}, | |
| year = {2025}, | |
| eprint = {2508.10925}, | |
| archivePrefix = {arXiv}, | |
| primaryClass = {cs.CL}, | |
| url = {https://arxiv.org/abs/2508.10925} | |
| } | |
| @misc{hypernova60b2605, | |
| title = {HyperNova 60B 2605: Model developed based on gpt-oss-120b}, | |
| author = {Multiverse Computing}, | |
| year = {2026}, | |
| url = {https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605}, | |
| note = {Model developed based on openai/gpt-oss-120b using CompactifAI technology} | |
| } | |
| ``` | |
| **Built by [Multiverse Computing](https://www.multiversecomputing.com)** · [Report an issue](https://huggingface.co/MultiverseComputingCAI/HyperNova-60B-2605/discussions) · [Discord](https://discord.gg/8mT9FveN) |