Qwen3-VL-2B-Instruct-GGUF
Qwen3-VL-2B-Instruct is a 2-billion-parameter vision-language model from the Qwen3 series that combines visual perception with text understanding and generation. It supports a context length of up to 256K tokens (expandable to 1 million), which lets it process large documents, books, or long videos with detailed recall and temporal event tracking. The model offers spatial reasoning with 2D and 3D object grounding, visual coding that generates code from images or videos (e.g., Draw.io, HTML, CSS, JavaScript), and agent-style functions for interpreting and interacting with GUI elements on PC and mobile interfaces. It is instruction-tuned for interactive tasks and multimodal reasoning, with particular strength in STEM and math problem-solving, causal analysis, and evidence-based answers.
Model Files
| File Name | Quant Type | File Size |
|---|---|---|
| Qwen3-VL-2B-Instruct-BF16.gguf | BF16 | 3.45 GB |
| Qwen3-VL-2B-Instruct-F16.gguf | F16 | 3.45 GB |
| Qwen3-VL-2B-Instruct-F32.gguf | F32 | 6.89 GB |
| Qwen3-VL-2B-Instruct-Q8_0.gguf | Q8_0 | 1.83 GB |
| Qwen3-VL-2B-Instruct-mmproj-bf16.gguf | mmproj-bf16 | 823 MB |
| Qwen3-VL-2B-Instruct-mmproj-f16.gguf | mmproj-f16 | 819 MB |
| Qwen3-VL-2B-Instruct-mmproj-q8_0.gguf | mmproj-q8_0 | 445 MB |
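Choosing files from the table can be sketched in Python. The sketch below assumes the usual llama.cpp convention that a vision-language GGUF is loaded as one language-model file plus one `mmproj` (multimodal projector) file, and that any model file may be paired with any projector file. The helper name `pick_pair`, the disk-size budget, and the "largest pair that fits" rule are illustrative assumptions, not recommendations from the model authors. File size is also only a rough proxy for runtime memory, which additionally depends on context length.

```python
from typing import Optional, Tuple

# Sizes in GB copied from the Model Files table (mmproj sizes converted from MB).
MODEL_GB = {
    "Qwen3-VL-2B-Instruct-F32.gguf": 6.89,
    "Qwen3-VL-2B-Instruct-BF16.gguf": 3.45,
    "Qwen3-VL-2B-Instruct-F16.gguf": 3.45,
    "Qwen3-VL-2B-Instruct-Q8_0.gguf": 1.83,
}
MMPROJ_GB = {
    "Qwen3-VL-2B-Instruct-mmproj-bf16.gguf": 0.823,
    "Qwen3-VL-2B-Instruct-mmproj-f16.gguf": 0.819,
    "Qwen3-VL-2B-Instruct-mmproj-q8_0.gguf": 0.445,
}


def pick_pair(budget_gb: float) -> Optional[Tuple[str, str, float]]:
    """Hypothetical helper: return the largest (model, mmproj, total_gb)
    combination whose combined file size fits within budget_gb, or None."""
    best = None
    for model, model_gb in MODEL_GB.items():
        for proj, proj_gb in MMPROJ_GB.items():
            total = model_gb + proj_gb
            # Keep the pair only if it fits and is larger than the current best.
            if total <= budget_gb and (best is None or total > best[2]):
                best = (model, proj, total)
    return best


if __name__ == "__main__":
    print(pick_pair(2.5))
```

With a 2.5 GB budget, only the Q8_0 model combined with the q8_0 projector fits. A budget smaller than every combination returns `None`.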
Quants Usage
(Sorted by size, not necessarily by quality. IQ-quants are often preferable to similarly sized non-IQ quants.)
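To make the size differences concrete, the average storage cost per weight can be estimated from the listed file sizes. This is a rough sketch: it assumes each file is dominated by weight data, ignoring GGUF metadata and any tensors kept at higher precision, and it uses the F32 file as a 32-bit reference. The function name `bits_per_weight` is illustrative.

```python
# Language-model file sizes in GB, copied from the Model Files table.
SIZES_GB = {"F32": 6.89, "BF16": 3.45, "F16": 3.45, "Q8_0": 1.83}


def bits_per_weight(quant: str, reference: str = "F32",
                    reference_bits: float = 32.0) -> float:
    """Estimate average bits per weight by scaling a file's size
    against the reference file, which stores reference_bits per weight."""
    return reference_bits * SIZES_GB[quant] / SIZES_GB[reference]


if __name__ == "__main__":
    for name in SIZES_GB:
        print(f"{name}: ~{bits_per_weight(name):.1f} bits per weight")
```

The Q8_0 estimate comes out near 8.5 bits rather than exactly 8, which is consistent with the Q8_0 block layout in llama.cpp (32 8-bit values plus one 16-bit scale per block).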
[Figure: graph by ikawrakow comparing some lower-quality quant types (lower is better)]
Model tree for prithivMLmods/Qwen3-VL-2B-Instruct-GGUF
Base model: Qwen/Qwen3-VL-2B-Instruct