# Qwen3Guard-Gen-8B - GGUF Quantized Versions

This repository provides GGUF quantized versions of Qwen3Guard-Gen-8B, converted with llama.cpp.

The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between model size, inference speed, and output quality.


## 🔧 Model Details

- Base model: Qwen/Qwen3Guard-Gen-8B
- Architecture: Qwen3 (8B parameters)
- Format: GGUF
- Intended use: guardrail / safety-aligned text generation
- Conversion tool: convert_hf_to_gguf.py (from llama.cpp)
- Quantization tool: llama-quantize (see the conversion sketch below)
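
For reference, the conversion pipeline looks roughly like the sketch below. It assumes a local checkout of llama.cpp and the original Hugging Face checkpoint in `./Qwen3Guard-Gen-8B`; exact script paths vary across llama.cpp versions.

```bash
# Export the Hugging Face checkpoint to GGUF at FP16 (the baseline file)
python convert_hf_to_gguf.py ./Qwen3Guard-Gen-8B \
  --outfile Qwen3Guard-Gen-8B-FP16.gguf --outtype f16

# Quantize the FP16 file into one of the listed formats, e.g. Q4_K_M
./llama-quantize Qwen3Guard-Gen-8B-FP16.gguf Qwen3Guard-Gen-8B-Q4_K_M.gguf Q4_K_M
```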

## 📊 Quantized Versions

| Quantization | Filename | Size (MiB) | Notes |
|---|---|---|---|
| FP16 | Qwen3Guard-Gen-8B-FP16.gguf | ~15623 | Full precision (baseline) |
| Q2_K | Qwen3Guard-Gen-8B-Q2_K.gguf | ~3204 | Smallest, lowest accuracy |
| Q3_K_M | Qwen3Guard-Gen-8B-Q3_K_M.gguf | ~4027 | Balanced small size |
| Q4_0 | Qwen3Guard-Gen-8B-Q4_0.gguf | ~4662 | Good balance, faster |
| Q4_K_M | Qwen3Guard-Gen-8B-Q4_K_M.gguf | ~4909 | Standard, widely used |
| Q5_K_M | Qwen3Guard-Gen-8B-Q5_K_M.gguf | ~5713 | Better accuracy |
| Q6_K | Qwen3Guard-Gen-8B-Q6_K.gguf | ~6568 | High accuracy |
| Q8_0 | Qwen3Guard-Gen-8B-Q8_0.gguf | ~8505 | Near-FP16 quality |
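
Any single file above can be fetched on its own; as a minimal sketch using the huggingface-cli tool (assuming huggingface_hub is installed and the Q4_K_M variant is wanted):

```bash
huggingface-cli download ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF \
  Qwen3Guard-Gen-8B-Q4_K_M.gguf --local-dir .
```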

## 🚀 Usage

### 🖥️ llama.cpp

Download a quantized file and run:

```bash
./llama-cli -m Qwen3Guard-Gen-8B-Q4_K_M.gguf -p "Hello, Qwen!"
```

(On llama.cpp builds from before the mid-2024 binary rename, the executable is called `main` instead of `llama-cli`.)

### 🐍 Python

Download the model directly from the Hugging Face Hub and run it with llama-cpp-python:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch a single quantized file from the Hub (cached locally on reuse)
model_path = hf_hub_download(
    repo_id="ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF",
    filename="Qwen3Guard-Gen-8B-Q4_K_M.gguf",
)

# Load the GGUF model with default settings (CPU, default context size)
llm = Llama(model_path=model_path)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, Qwen!"},
    ],
    max_tokens=100,
)

print(output["choices"][0]["message"]["content"])
```
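
With a GPU-enabled build of llama-cpp-python (CUDA or Metal), layers can be offloaded and the context window widened via the standard `n_gpu_layers` and `n_ctx` constructor arguments; the values below are illustrative:

```python
# Optional: offload all layers to the GPU and widen the context window
llm = Llama(
    model_path=model_path,
    n_gpu_layers=-1,  # -1 = offload every layer (needs a GPU-enabled build)
    n_ctx=4096,       # context window in tokens
)
```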

These GGUF versions are optimized for fast inference with CPU/GPU runtimes like llama.cpp, Ollama, and LM Studio.
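
As a quick sketch for Ollama, a local GGUF file can be wrapped in a Modelfile and registered under a name of your choosing (`qwen3guard` below is arbitrary):

```bash
# Create a Modelfile whose single FROM line points at the local GGUF
echo 'FROM ./Qwen3Guard-Gen-8B-Q4_K_M.gguf' > Modelfile

ollama create qwen3guard -f Modelfile
ollama run qwen3guard "Hello, Qwen!"
```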
