# Qwen3Guard-Gen-8B - GGUF Quantized Versions

This repository provides GGUF quantized versions of Qwen3Guard-Gen-8B, converted with llama.cpp.

The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between model size, inference speed, and output quality.


## 🔧 Model Details

- Base model: Qwen/Qwen3Guard-Gen-8B
- Architecture: Qwen3 (8B parameters)
- Format: GGUF
- Intended use: guardrail / safety-aligned text generation
- Conversion tool: convert_hf_to_gguf.py (from llama.cpp)
- Quantization tool: llama-quantize (see the conversion sketch below)
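
For reference, the conversion pipeline looks roughly like the sketch below. It assumes a local checkout of llama.cpp and the original Hugging Face checkpoint in `./Qwen3Guard-Gen-8B`; exact script paths vary across llama.cpp versions.

```bash
# Export the Hugging Face checkpoint to GGUF at FP16 (the baseline file)
python convert_hf_to_gguf.py ./Qwen3Guard-Gen-8B \
  --outfile Qwen3Guard-Gen-8B-FP16.gguf --outtype f16

# Quantize the FP16 file into one of the listed formats, e.g. Q4_K_M
./llama-quantize Qwen3Guard-Gen-8B-FP16.gguf Qwen3Guard-Gen-8B-Q4_K_M.gguf Q4_K_M
```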

## 📊 Quantized Versions

| Quantization | Filename | Size (MiB) | Notes |
|---|---|---|---|
| FP16 | Qwen3Guard-Gen-8B-FP16.gguf | ~15623 | Full precision (baseline) |
| Q2_K | Qwen3Guard-Gen-8B-Q2_K.gguf | ~3204 | Smallest, lowest accuracy |
| Q3_K_M | Qwen3Guard-Gen-8B-Q3_K_M.gguf | ~4027 | Balanced small size |
| Q4_0 | Qwen3Guard-Gen-8B-Q4_0.gguf | ~4662 | Good balance, faster |
| Q4_K_M | Qwen3Guard-Gen-8B-Q4_K_M.gguf | ~4909 | Standard, widely used |
| Q5_K_M | Qwen3Guard-Gen-8B-Q5_K_M.gguf | ~5713 | Better accuracy |
| Q6_K | Qwen3Guard-Gen-8B-Q6_K.gguf | ~6568 | High accuracy |
| Q8_0 | Qwen3Guard-Gen-8B-Q8_0.gguf | ~8505 | Near-FP16 quality |
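
Any single file above can be fetched on its own; as a minimal sketch using the huggingface-cli tool (assuming huggingface_hub is installed and the Q4_K_M variant is wanted):

```bash
huggingface-cli download ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF \
  Qwen3Guard-Gen-8B-Q4_K_M.gguf --local-dir .
```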

## 🚀 Usage

### 🖥️ llama.cpp

Download a quantized file and run:

```bash
./llama-cli -m Qwen3Guard-Gen-8B-Q4_K_M.gguf -p "Hello, Qwen!"
```

(On llama.cpp builds from before the mid-2024 binary rename, the executable is called `main` instead of `llama-cli`.)

### 🐍 Python

Download the model directly from the Hugging Face Hub and run it with llama-cpp-python:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch a single quantized file from the Hub (cached locally on reuse)
model_path = hf_hub_download(
    repo_id="ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF",
    filename="Qwen3Guard-Gen-8B-Q4_K_M.gguf",
)

# Load the GGUF model with default settings (CPU, default context size)
llm = Llama(model_path=model_path)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, Qwen!"},
    ],
    max_tokens=100,
)

print(output["choices"][0]["message"]["content"])
```
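
With a GPU-enabled build of llama-cpp-python (CUDA or Metal), layers can be offloaded and the context window widened via the standard `n_gpu_layers` and `n_ctx` constructor arguments; the values below are illustrative:

```python
# Optional: offload all layers to the GPU and widen the context window
llm = Llama(
    model_path=model_path,
    n_gpu_layers=-1,  # -1 = offload every layer (needs a GPU-enabled build)
    n_ctx=4096,       # context window in tokens
)
```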

These GGUF versions are optimized for fast inference with CPU/GPU runtimes like llama.cpp, Ollama, and LM Studio.
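
As a quick sketch for Ollama, a local GGUF file can be wrapped in a Modelfile and registered under a name of your choosing (`qwen3guard` below is arbitrary):

```bash
# Create a Modelfile whose single FROM line points at the local GGUF
echo 'FROM ./Qwen3Guard-Gen-8B-Q4_K_M.gguf' > Modelfile

ollama create qwen3guard -f Modelfile
ollama run qwen3guard "Hello, Qwen!"
```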
