Qwen3-VL-2B-Instruct-GGUF

Qwen3-VL-2B-Instruct is a 2-billion-parameter vision-language model from the Qwen3 series that combines strong visual perception with text understanding and generation. It supports context lengths of up to 256K tokens (expandable to 1 million), allowing it to work through large documents, books, or long videos with detailed recall and temporal event tracking. The model offers spatial reasoning with 2D and 3D object grounding, visual coding that turns images or videos into code (e.g., Draw.io diagrams, HTML, CSS, JavaScript), and agent-like abilities to interpret and operate GUI elements on PC and mobile interfaces. It is instruction-tuned for interactive use and optimized for efficient deployment, with multimodal reasoning that is particularly strong on STEM and math problem-solving, causal analysis, and evidence-based answers.

Model Files

| File Name | Quant Type | File Size |
|-----------|------------|-----------|
| Qwen3-VL-2B-Instruct-BF16.gguf | BF16 | 3.45 GB |
| Qwen3-VL-2B-Instruct-F16.gguf | F16 | 3.45 GB |
| Qwen3-VL-2B-Instruct-F32.gguf | F32 | 6.89 GB |
| Qwen3-VL-2B-Instruct-Q8_0.gguf | Q8_0 | 1.83 GB |
| Qwen3-VL-2B-Instruct-mmproj-bf16.gguf | mmproj-bf16 | 823 MB |
| Qwen3-VL-2B-Instruct-mmproj-f16.gguf | mmproj-f16 | 819 MB |
| Qwen3-VL-2B-Instruct-mmproj-q8_0.gguf | mmproj-q8_0 | 445 MB |
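
To run one of these quants locally, pair a model GGUF with a matching mmproj projector file (the projector handles image input). Below is a minimal sketch that fetches the Q8_0 quant and the q8_0 projector with `huggingface_hub`; the repo id and filenames come from the table above, and the snippet assumes `huggingface_hub` is installed (`pip install huggingface_hub`).

```python
# Minimal sketch: download a quant plus its matching mmproj projector.
# Repo id and filenames are taken from the table above.
from huggingface_hub import hf_hub_download

repo_id = "prithivMLmods/Qwen3-VL-2B-Instruct-GGUF"

model_path = hf_hub_download(
    repo_id=repo_id,
    filename="Qwen3-VL-2B-Instruct-Q8_0.gguf",
)
mmproj_path = hf_hub_download(
    repo_id=repo_id,
    filename="Qwen3-VL-2B-Instruct-mmproj-q8_0.gguf",
)

print("model:", model_path)
print("mmproj:", mmproj_path)
```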

Quants Usage

(Sorted by size, not necessarily by quality. IQ quants are often preferable to similarly sized non-IQ quants.)

ikawrakow has published a handy graph comparing some of the lower-quality quant types (lower is better).
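
Once a quant is chosen, image+text inference can be run through llama.cpp's multimodal tooling. The sketch below is a hedged example only: it assumes a llama.cpp build recent enough to support the `qwen3vl` GGUF architecture and to ship the multimodal `llama-mtmd-cli` tool, and the binary name, flags, and file paths may differ between versions.

```python
# Sketch only: invoke llama.cpp's multimodal CLI on the downloaded files.
# Assumes a recent llama.cpp build with mtmd support; the binary name and
# flags below may differ between versions -- check `llama-mtmd-cli --help`.
import subprocess

model_path = "Qwen3-VL-2B-Instruct-Q8_0.gguf"          # quant from the table above
mmproj_path = "Qwen3-VL-2B-Instruct-mmproj-q8_0.gguf"  # matching projector
image_path = "example.jpg"                             # any local test image

subprocess.run(
    [
        "llama-mtmd-cli",
        "-m", model_path,
        "--mmproj", mmproj_path,
        "--image", image_path,
        "-p", "Describe this image in detail.",
    ],
    check=True,
)
```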
