| Model | Task | Params | Downloads |
|---|---|---|---|
| inference-optimization/Ministral-3-14B-Instruct-2512-BF16-FP8-DYNAMIC-BASE | — | 14B | 152 |
| inference-optimization/Qwen3-30B-A3B-Instruct-2507.w8a8 | — | 31B | 23 |
| inference-optimization/Qwen3-30B-A3B-Thinking-2507.w8a8 | — | 31B | 24 |
| inference-optimization/Qwen3-4B-Thinking-2507.w8a8 | — | 4B | 38 |
| inference-optimization/Qwen3-4B-Instruct-2507.w8a8 | — | 4B | 28 |
| inference-optimization/Ministral-3-14B-Instruct-2512-FP8 | — | 14B | 66 |
| inference-optimization/granite-4.0-h-small-quantized.w8a8 | — | — | — |
| inference-optimization/granite-4.0-h-small-NVFP4 | — | — | — |
| inference-optimization/granite-4.0-h-small-quantized.w4a16 | — | — | — |
| inference-optimization/granite-4.0-h-small-FP8-dynamic | — | — | — |
| inference-optimization/granite-4.0-h-small-FP8-block | — | — | — |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 | — | 18B | 184 |
| inference-optimization/GLM-4.6-quantized.w4a16 | — | 48B | 54 |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 | Text Generation | 32B | 8 |
| inference-optimization/Qwen3-Next-80B-A3B-Thinking-FP8 | Text Generation | 81B | 5 |
| inference-optimization/Qwen3-Next-80B-A3B-Instruct-FP8 | Text Generation | 81B | 12 |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-quantized.w4a16 | — | 6B | 51 |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-dynamic | — | 32B | 183 |
| inference-optimization/Qwen3-Next-80B-A3B-Thinking-quantized.w8a8 | — | — | — |
| inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-block | — | — | — |
| inference-optimization/GLM-4.6-quantized.w8a8 | — | 353B | 19 |
| inference-optimization/Qwen3-30B-A3B-Thinking-2507.w4a16 | Text Generation | 5B | 3 |
| inference-optimization/Qwen3-4B-Instruct-2507.w4a16 | Text Generation | 1B | 6 |
| inference-optimization/Qwen3-4B-Thinking-2507.w4a16 | Text Generation | 1B | 42 |
| inference-optimization/GLM-4.6-FP8-dynamic | — | 353B | 19 |
| inference-optimization/GLM-4.6-NVFP4 | — | 199B | 418 |
| inference-optimization/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head | — | 8B | — |
| inference-optimization/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head | — | 8B | 1 |
| inference-optimization/Qwen3-Next-80B-A3B-Instruct-quantized.w8a8 | — | — | — |
| inference-optimization/Llama-3.1-8B-Instruct-HIGGS-quantized-paths | — | — | — |