Papers
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
Articles
Open, Production-ready Enterprise Models
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
  Text Generation • 32B • Updated • 33.3k • 99
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
  Text Generation • 32B • Updated • 440k • 592
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
  Text Generation • 32B • Updated • 735k • 263
- nvidia/Qwen3-Nemotron-235B-A22B-GenRM
  Text Generation • 235B • Updated • 243 • 20
Steering Reasoning VLA in robotics manipulation https://www.arxiv.org/abs/2510.16281
The latest open, multimodal generation models for world generation and reasoning for Physical AI.
Open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S
- nvidia/nemotron-speech-streaming-en-0.6b
  Automatic Speech Recognition • Updated • 8.88k • 432
- nvidia/magpie_tts_multilingual_357m
  Text-to-Speech • Updated • 1.05k • 65
- nvidia/parakeet-tdt-0.6b-v3
  Automatic Speech Recognition • Updated • 84.5k • 581
- nvidia/parakeet_realtime_eou_120m-v1
  Updated • 580 • 110
Large scale pre-training datasets used in the Nemotron family of models.
Collection of RL verifiable data for NeMo Gym
- nvidia/Nemotron-RL-knowledge-web_search-mcqa
  Viewer • Updated • 2.93k • 181 • 8
- nvidia/Nemotron-RL-agent-workplace_assistant
  Viewer • Updated • 1.8k • 206 • 13
- nvidia/Nemotron-RL-instruction_following
  Preview • Updated • 146 • 9
- nvidia/Nemotron-RL-instruction_following-structured_outputs
  Viewer • Updated • 9.95k • 122 • 28
A collection of generative models quantized and optimized for inference with Model Optimizer.
A collection related to the Alpamayo-R1 Reasoning VLA.
Framework of PyTorch composable modules for developing physics guided machine learning training pipelines. https://github.com/NVIDIA/physicsnemo
- Earth2 Inference Demo
  🚀 4 • Visualize weather forecasts for any date and time range
- DoMINO with Ahmed Body Dataset - Multi-Scale Neural Operator for CFD
  🟢 3 • Access JupyterLab for interactive coding
- Modeling Magnetohydrodynamics with PhysicsNeMo
  🟢 2 • Access JupyterLab for interactive coding
- nvidia/fourcastnet3
  Updated • 157 • 9
A collection of great reward models for research and production
- nvidia/Llama-3.3-Nemotron-70B-Reward-Principle
  Text Generation • 71B • Updated • 119 • 6
- nvidia/Qwen3-Nemotron-32B-GenRM-Principle
  Text Generation • 33B • Updated • 181 • 11
- nvidia/Qwen3-Nemotron-32B-RLBFF
  Text Generation • 33B • Updated • 65 • 27
- nvidia/Qwen3-Nemotron-8B-BRRM
  Text Generation • Updated • 113 • 8
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
NVIDIA Clara Models for Biology
Improved World Simulation with Video Foundation Models for Physical AI
A collection of multilingual, region-specific synthetic persona datasets that support sovereign AI development across many countries and regions.
3D-Informed World-Consistent Video Generation with Precise Camera Control
Accelerated models for digital biology by the NVIDIA BioNeMo team. https://www.nvidia.com/en-us/clara/biopharma/
Collection of models for OpenReasoning-Nemotron which are trained on 5M reasoning traces for Math, Code and Science.
- nvidia/OpenReasoning-Nemotron-1.5B
  Text Generation • 2B • Updated • 369 • 53
- nvidia/OpenReasoning-Nemotron-7B
  Text Generation • 8B • Updated • 62k • 49
- nvidia/OpenReasoning-Nemotron-14B
  Text Generation • 15B • Updated • 551 • 43
- nvidia/OpenReasoning-Nemotron-32B
  Text Generation • 33B • Updated • 693 • 122
Open-weight Audio2Face-3D and Audio2Emotion networks and a sample dataset for training and evaluation
World Generation with Adaptive Multimodal Control
Mamba-Transformer hybrid models
- nvidia/Nemotron-H-47B-Reasoning-128K
  Text Generation • 47B • Updated • 526 • 20
- nvidia/Nemotron-H-8B-Reasoning-128K
  Text Generation • 8B • Updated • 152 • 25
- nvidia/Nemotron-H-8B-Reasoning-128K-FP8
  Text Generation • 8B • Updated • 56 • 12
- nvidia/Nemotron-H-47B-Reasoning-128K-FP8
  Text Generation • 47B • Updated • 38 • 5
Multimodal Large Language Models for Detailed Localized Image and Video Captioning
Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset"
Reasoning data for supervised finetuning of LLMs to advance code generation and critique
Benchmarks for evaluating synthetic verifiers like test case generation and code reward models (as found in https://www.arxiv.org/abs/2502.13820).
⚠️ The latest version of Cosmos Reason is now live!
👉 https://huggingface.co/collections/nvidia/cosmos-reason2
A suite of image and video tokenizers
A suite of image and video tokenizers
Collection of open, commercial-grade datasets for physical AI developers
⚠️ This collection is archived.
👉 https://huggingface.co/collections/nvidia/nvidia-cosmos-2
We are releasing math instruction models, math reward models, general instruction models, all training datasets, and a math reward benchmark.
Eagle is a family of frontier vision-language models with data-centric strategies. The model supports both HD image and long-context video input.
A series of Hybrid Small Language Models.
A family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks and text-only tasks.
Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models.
NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants.
- nvidia/parakeet-rnnt-1.1b
  Automatic Speech Recognition • Updated • 943 • 163
- nvidia/parakeet-ctc-1.1b
  Automatic Speech Recognition • 1B • Updated • 152k • 39
- nvidia/parakeet-rnnt-0.6b
  Automatic Speech Recognition • Updated • 7.15k • 12
- nvidia/parakeet-ctc-0.6b
  Automatic Speech Recognition • 0.6B • Updated • 3.65k • 24
InstructRetro is an autoregressive decoder-only language model (LM) with retrieval-augmented pretraining and instruction tuning.
A collection of models trained with Reinforcement Learning from Human Feedback (RLHF).
Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG).
The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise.
MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes both 1K and 21K pretrained models.
- nvidia/MambaVision-L3-512-21K
  Image Classification • 0.7B • Updated • 100 • 54
- nvidia/MambaVision-L3-256-21K
  Image Classification • 0.7B • Updated • 132 • 7
- nvidia/MambaVision-L2-512-21K
  Image Classification • 0.2B • Updated • 221 • 3
- nvidia/MambaVision-L-21K
  Image Classification • 0.2B • Updated • 89 • 4
A family of compressed models obtained via pruning and knowledge distillation
- nvidia/Mistral-NeMo-Minitron-8B-Base
  Text Generation • 8B • Updated • 2.81k • 176
- nvidia/Mistral-NeMo-Minitron-8B-Instruct
  Text Generation • 8B • Updated • 1.58k • 82
- nvidia/Llama-3_1-Nemotron-51B-Instruct
  Text Generation • 52B • Updated • 325 • 209
- nvidia/Llama-3.1-Minitron-4B-Width-Base
  Text Generation • 5B • Updated • 1.74k • 193
This collection presents ChatQA-2, a suite of 128K long-context models that also have exceptional RAG capabilities.
Large scale pre-training datasets used in the Nemotron family of models.
Open, state-of-the-art models for Climate and Weather
Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models
- nvidia/Nemotron-Cascade-8B
  Text Generation • 8B • Updated • 4.22k • 59
- nvidia/Nemotron-Cascade-8B-Thinking
  Text Generation • 8B • Updated • 1.13k • 35
- nvidia/Nemotron-Cascade-14B-Thinking
  Text Generation • 15B • Updated • 7.46k • 68
- nvidia/Nemotron-Cascade-8B-Intermediate-ckpts
  Text Generation • Updated • 10
Collection of datasets used in the post-training phase of Nemotron Nano v3.
Set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex do
Research project based on Cosmos Predict2 for robot manipulation policy.
Open, Production-ready Enterprise Models. Nvidia Open Model license.
- nvidia/NVIDIA-Nemotron-Nano-12B-v2
  Text Generation • 12B • Updated • 90.6k • 151
- nvidia/NVIDIA-Nemotron-Nano-9B-v2
  Text Generation • 9B • Updated • 111k • 468
- nvidia/NVIDIA-Nemotron-Nano-9B-v2-Base
  Text Generation • 9B • Updated • 55.2k • 42
- nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base
  Text Generation • 12B • Updated • 1.14k • 88
A collection of speculative decoding modules created using Model Optimizer.
- nvidia/gpt-oss-120b-Eagle3-short-context
  Text Generation • Updated • 2.61k • 12
- nvidia/gpt-oss-120b-Eagle3-long-context
  Text Generation • 0.2B • Updated • 3.54k • 54
- nvidia/gpt-oss-120b-Eagle3-throughput
  Text Generation • Updated • 988 • 32
- nvidia/Qwen3-235B-A22B-Eagle3
  Text Generation • 0.3B • Updated • 1.78k • 9
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
A collection of tokenizers, diffusion models, and datasets relevant to the cosmos-drive-dreams platform.
Open, Production-ready Enterprise Models
- nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
  Text Generation • 50B • Updated • 25.2k • 222
- nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8
  Text Generation • 50B • Updated • 2.41k • 23
- nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
  Text Generation • 253B • Updated • 18.5k • 342
- nvidia/Llama-3_3-Nemotron-Super-49B-v1
  Text Generation • 50B • Updated • 15.8k • 320
NVIDIA Clara Open Models for medical imaging AI: segment, generate, and reason across CT, MRI, and X-ray. Built on MONAI by NVIDIA.
NVIDIA Clara Models for Molecular Science
Cosmos Reason 2 is an open, customizable, reasoning vision language model (VLM) for physical AI and robotics
State-of-the-Art Text Embedding Model
Ultra-efficient reasoning model! SOTA Accuracy / CoT Length trade-offs
⚠️ This collection is archived.
👉 https://huggingface.co/collections/nvidia/cosmos-predict25
Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge
- nvidia/Llama-3_3-Nemotron-Super-49B-GenRM
  Text Generation • 50B • Updated • 238 • 18
- nvidia/Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual
  Text Generation • 50B • Updated • 63 • 6
- nvidia/Llama-3.3-Nemotron-70B-Reward
  Text Generation • 71B • Updated • 695 • 2
- nvidia/Llama-3.3-Nemotron-70B-Reward-Multilingual
  Text Generation • 71B • Updated • 207 • 10
Math and Code reasoning model trained through reinforcement learning (RL)
Joint video-text embedding for physical AI
Math reasoning models trained through reinforcement learning (RL)
Reasoning data for supervised finetuning of LLMs to advance data distillation for competitive coding
- nvidia/OpenCodeReasoning
  Viewer • Updated • 753k • 2.93k • 524
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
  Paper • 2504.01943 • Published • 15
- nvidia/OpenCodeReasoning-Nemotron-7B
  Text Generation • 8B • Updated • 89 • 38
- nvidia/OpenCodeReasoning-Nemotron-14B
  Text Generation • 15B • Updated • 92 • 19
Novel ITS approach for open-ended tasks - No. 1 on Arena Hard on 18 Mar 2025
⚠️ This collection is archived.
👉 https://huggingface.co/collections/nvidia/cosmos-transfer25
⚠️ This collection is archived.
👉 https://huggingface.co/collections/nvidia/cosmos-predict2
SOTA models on Arena Hard and RewardBench as of 1 Oct 2024.
- nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
  Text Generation • 71B • Updated • 4.25k • 2.06k
- nvidia/Llama-3.1-Nemotron-70B-Reward-HF
  71B • Updated • 1.08k • 90
- nvidia/HelpSteer2
  Viewer • Updated • 21.4k • 22.1k • 438
- HelpSteer2-Preference: Complementing Ratings with Preferences
  Paper • 2410.01257 • Published • 24
QLIP is a family of image tokenizers with SOTA reconstruction quality and zero-shot image understanding.
LLMs equipped with Dynamic Memory Compression to accelerate generation.
Essential datasets and models for content safety, topic-following, and security guardrails
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
  Viewer • Updated • 33.4k • 3.34k • 74
- nvidia/llama-3.1-nemoguard-8b-topic-control
  Text Classification • Updated • 686 • 16
- nvidia/llama-3.1-nemoguard-8b-content-safety
  Text Classification • Updated • 179 • 32
- nvidia/CantTalkAboutThis-Topic-Control-Dataset
  Viewer • Updated • 1.09k • 84 • 9
A series of Neural Audio Codecs
- nvidia/nemo-nano-codec-22khz-1.89kbps-21.5fps
  Feature Extraction • Updated • 1.53k • 9
- nvidia/low-frame-rate-speech-codec-22khz
  Feature Extraction • Updated • 119 • 19
- nvidia/nemo-nano-codec-22khz-1.78kbps-12.5fps
  Feature Extraction • Updated • 681 • 10
- nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps
  Feature Extraction • Updated • 1.26k • 16
Collection of optimized ONNX model checkpoints for NVIDIA RTX GPUs
A collection of models and datasets introduced in "OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data"
A collection of models and datasets relating to SteerLM and HelpSteer.
A collection of multilingual and multitask speech to text models from NVIDIA NeMo 🐤
- nvidia/canary-1b
  Automatic Speech Recognition • Updated • 1.56k • 457
- nvidia/canary-1b-flash
  Automatic Speech Recognition • 0.8B • Updated • 4.37k • 264
- nvidia/canary-180m-flash
  Automatic Speech Recognition • Updated • 1.44k • 90
- Training and Inference Efficiency of Encoder-Decoder Speech Models
  Paper • 2503.05931 • Published • 4
A collection of models and datasets introduced in "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset"
NV-Embed is a generalist embedding model encompassing retrieval, reranking, classification, clustering, STS tasks.
A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers.
BigVGAN is a universal neural vocoder that generates audio waveform using mel spectrogram as input.
Enabling 4k resolution for VLMs, CVPR 2025, https://nvlabs.github.io/PS3/
A collection of Foundation Vision Models that combine multiple models (CLIP, DINOv2, SAM, etc.).
Classifier models that can be used in NeMo Curator for labelling/filtering datasets.