SmolVLM2-500M GGUF for HaploAI

Vision Language Model (VLM) for on-device image understanding on iOS/macOS.

Model Overview

Property	Value
Base Model	HuggingFaceTB/SmolVLM2-500M-Video-Instruct
Format	GGUF (llama.cpp compatible)
Quantization	Q8_0
Model Size	437 MB
Vision Encoder	199 MB
Total Size	~636 MB
License	Apache 2.0

Capabilities

Image Captioning: Describe what's in an image
Visual Q&A: Answer questions about images
Document/Text Extraction: Read and extract text from photos
Scene Understanding: Analyze visual content and context

Files

File	Size	Description
`SmolVLM2-500M-Video-Instruct-Q8_0.gguf`	437 MB	Main language model (Q8_0 quantized)
`mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf`	199 MB	Vision encoder (f16 precision)

Usage

Download URLs

Main Model: https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
Vision Encoder: https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main/mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf
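
Both files are needed for image input, so it can help to fetch them together. The following is a minimal sketch that assumes `curl` is installed. The repository path and file names are taken from the URLs above, while the helper names `file_url` and `download_all` are hypothetical and exist only for this example.

```shell
# Repository and file names copied from the download URLs in this README.
REPO="jc-builds/smolvlm2-500m-gguf"
MODEL_FILE="SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
MMPROJ_FILE="mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf"

# Hypothetical helper: print the direct download URL for one file in the repo.
file_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$REPO" "$1"
}

# Hypothetical helper: download both files into the directory given as $1
# (defaults to the current directory).
download_all() {
  dest="${1:-.}"
  mkdir -p "$dest" || return 1
  for f in "$MODEL_FILE" "$MMPROJ_FILE"; do
    # -L follows redirects, --fail turns HTTP errors into a non-zero exit,
    # -C - resumes a partially downloaded file.
    curl -L --fail -C - -o "$dest/$f" "$(file_url "$f")" || return 1
  done
}
```

Calling `download_all models` would place both files in a `models` directory. Nothing is downloaded until the function is called.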

With llama.cpp

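# Load both the main model and the vision encoder (mmproj) file.
# Note: recent llama.cpp builds ship the multimodal CLI as llama-mtmd-cli;
# if llama-cli rejects --mmproj or --image, try the same arguments with ./llama-mtmd-cli.
./llama-cli -m SmolVLM2-500M-Video-Instruct-Q8_0.gguf \
            --mmproj mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf \
            --image your_image.jpg \
            -p "Describe this image in detail"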
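
The command above fails if either GGUF file or the image is missing. As a sketch, a small guard can check for all three before starting. It assumes the files sit in the current directory next to the `llama-cli` binary, and the function names `require_files` and `describe_image` are hypothetical.

```shell
MODEL="SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
MMPROJ="mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf"

# Return 0 only if every path given as an argument is an existing, non-empty file.
require_files() {
  for path in "$@"; do
    if [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical wrapper: describe one image, refusing to start if a file is absent.
describe_image() {
  require_files "$MODEL" "$MMPROJ" "$1" || return 1
  ./llama-cli -m "$MODEL" --mmproj "$MMPROJ" --image "$1" \
              -p "Describe this image in detail"
}
```

With this in place, `describe_image your_image.jpg` runs the same command as above, but only after all three files have been found.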

With HaploAI (iOS/macOS)

This model is automatically available in HaploAI for on-device image understanding. Simply attach an image and ask questions about it.

Prompt Format

SmolVLM uses the <image> token to mark where the image is inserted into the prompt:

<image>
What is shown in this image?
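
When prompts are assembled in a script, the format above can be produced by a small helper. This is only a sketch: `build_prompt` is a hypothetical name, and whether the runtime inserts the image marker for you depends on the tool, so check its behavior before adding the token yourself.

```shell
# Hypothetical helper: put the <image> marker on its own line, then the question.
build_prompt() {
  printf '<image>\n%s' "$1"
}

PROMPT="$(build_prompt "What is shown in this image?")"
printf '%s\n' "$PROMPT"
```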

Performance Notes

Memory Usage: ~800 MB during inference (model + vision encoder + context)
Speed: Fast inference suitable for mobile devices
Quality: Good balance of size vs capability for on-device use
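
The memory figure can be related to the file sizes with simple arithmetic. The sketch below assumes the weights occupy roughly their on-disk size once loaded and treats the ~800 MB figure as approximate. The remainder is a rough estimate of what is left for context and working buffers, not a measurement.

```shell
MODEL_MB=437       # main language model (Q8_0), from the Files table
MMPROJ_MB=199      # vision encoder (f16), from the Files table
INFERENCE_MB=800   # approximate memory use during inference, from the notes above

WEIGHTS_MB=$((MODEL_MB + MMPROJ_MB))
OVERHEAD_MB=$((INFERENCE_MB - WEIGHTS_MB))

echo "weights: ${WEIGHTS_MB} MB, context and buffers: ~${OVERHEAD_MB} MB"
# prints: weights: 636 MB, context and buffers: ~164 MB
```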

License

This model is distributed under the Apache 2.0 license, which permits:

Commercial use
Modification
Distribution
Patent use
Private use

Attribution

Original Model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
GGUF Conversion: ggml-org/SmolVLM2-500M-Video-Instruct-GGUF
Distributed by: jc-builds for use in HaploAI

Related Models

For higher quality at the cost of a larger download, see:

SmolVLM2-2.2B (~2 GB) - Coming soon
