inference-optimization's collections (5)

Granite 4 Small and Tiny Quantized Models
FP8-block, FP8-dynamic, NVFP4, w4a16, and w8a8 quantized versions of ibm-granite/granite-4.0-h-small and ibm-granite/granite-4.0-h-tiny
KV Cache Quantization
Collection on FP8 quantization of weights, activations, and KV cache
NVIDIA-Nemotron-3-Nano-30B-A3B Quantized Models
FP8-dynamic, FP8-block, NVFP4, and INT4 versions of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
Mixed Precision Models
Collection of mixed-precision LLaMA and Qwen models