# Qwen3Guard-Gen-8B - GGUF Quantized Versions
This repository provides GGUF quantized versions of Qwen3Guard-Gen-8B, converted with llama.cpp.
The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between model size, inference speed, and output quality.
## 🔧 Model Details
- Base model: Qwen/Qwen3Guard-Gen-8B
- Architecture: Qwen 3 (8B parameters)
- Format: GGUF
- Intended use: Guardrail / safety-aligned text generation
- Conversion tool: `convert_hf_to_gguf.py` (from llama.cpp)
- Quantization tool: `llama-quantize` (see the sketch below)
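For reference, the conversion pipeline looks roughly like the following; the input directory and output paths are illustrative, not the exact commands used for this repository.

```bash
# Export the Hugging Face checkpoint to GGUF at FP16 (paths are illustrative).
python convert_hf_to_gguf.py ./Qwen3Guard-Gen-8B \
    --outfile Qwen3Guard-Gen-8B-FP16.gguf --outtype f16

# Quantize the FP16 file; repeat with other types (Q2_K, Q5_K_M, ...) as needed.
./llama-quantize Qwen3Guard-Gen-8B-FP16.gguf Qwen3Guard-Gen-8B-Q4_K_M.gguf Q4_K_M
```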
## 📊 Quantized Versions
| Quantization | Filename | Size (MiB) | Notes |
|---|---|---|---|
| FP16 | Qwen3Guard-Gen-8B-FP16.gguf | ~15623 | Full precision (baseline) |
| Q2_K | Qwen3Guard-Gen-8B-Q2_K.gguf | ~3204 | Smallest, lowest accuracy |
| Q3_K_M | Qwen3Guard-Gen-8B-Q3_K_M.gguf | ~4027 | Balanced small size |
| Q4_0 | Qwen3Guard-Gen-8B-Q4_0.gguf | ~4662 | Good balance, faster |
| Q4_K_M | Qwen3Guard-Gen-8B-Q4_K_M.gguf | ~4909 | Standard choice, widely used |
| Q5_K_M | Qwen3Guard-Gen-8B-Q5_K_M.gguf | ~5713 | Better accuracy |
| Q6_K | Qwen3Guard-Gen-8B-Q6_K.gguf | ~6568 | High accuracy |
| Q8_0 | Qwen3Guard-Gen-8B-Q8_0.gguf | ~8505 | Near-FP16 quality |
## 🚀 Usage
### 🖥️ llama.cpp
Download a quantized file and run:

```bash
./main -m Qwen3Guard-Gen-8B-Q4_K_M.gguf -p "Hello, Qwen!"
```

(In recent llama.cpp builds the `main` binary has been renamed `llama-cli`.)
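To fetch a single quantized file from the Hub, one option is the `huggingface-cli` tool that ships with `huggingface_hub` (repo and filename as above):

```bash
# Download one quant into the current directory.
huggingface-cli download ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF \
    Qwen3Guard-Gen-8B-Q4_K_M.gguf --local-dir .
```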
### 🐍 Python
Download directly from the Hub and use with llama-cpp-python:
```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch the quantized model file from the Hub (cached after the first call).
model_path = hf_hub_download(
    repo_id="ShahzebKhoso/Qwen3Guard-Gen-8B-GGUF",
    filename="Qwen3Guard-Gen-8B-Q4_K_M.gguf"
)

llm = Llama(model_path=model_path)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, Qwen!"}
    ],
    max_tokens=100
)
print(output["choices"][0]["message"]["content"])
```
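Since Qwen3Guard-Gen-8B is a guardrail model, typical use is to pass it the conversation to be moderated and parse the generated safety verdict; see the base model card (Qwen/Qwen3Guard-Gen-8B) for the exact prompt format and output labels.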
These GGUF versions are optimized for fast inference with CPU/GPU runtimes like llama.cpp, Ollama, and LM Studio.
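With llama-cpp-python, layers can be offloaded to the GPU when the wheel is built with GPU support; a minimal sketch (parameter values are illustrative):

```python
from llama_cpp import Llama

# Illustrative settings: offload all layers to the GPU and widen the context.
llm = Llama(
    model_path="Qwen3Guard-Gen-8B-Q4_K_M.gguf",
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # -1 offloads every layer; use 0 for CPU-only
)
```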