FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
Inference Optimization
community
FP8-dynamic, FP8-block, NVFP4, and INT4 versions of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
Text Generation • 32B • Updated • 8
inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4
18B • Updated • 155
inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-quantized.w4a16
6B • Updated • 41
inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-dynamic
32B • Updated • 196
models
71
inference-optimization/DeepSeek-V3-debug-multiply-FP8_DYNAMIC
1B • Updated
inference-optimization/DeepSeek-V3-debug-add-FP8_DYNAMIC
1B • Updated
inference-optimization/DeepSeek-V3-debug-empty-FP8_DYNAMIC
1B • Updated
inference-optimization/DeepSeek-V3-debug-multiply-NVFP4A16
0.9B • Updated
inference-optimization/DeepSeek-V3-debug-add-NVFP4A16
0.9B • Updated
inference-optimization/DeepSeek-V3-debug-empty-NVFP4A16
0.9B • Updated
inference-optimization/DeepSeek-V3-debug-add
1B • Updated
inference-optimization/DeepSeek-V3-debug-multiply
1B • Updated
inference-optimization/Qwen3-0.6B-debug-add-FP8_BLOCK
0.6B • Updated
inference-optimization/Qwen3-0.6B-debug-multiply-FP8_BLOCK
0.6B • Updated
datasets
0
None public yet