SmolVLM2-500M GGUF for HaploAI
Vision Language Model (VLM) for on-device image understanding on iOS/macOS.
Model Overview
| Property | Value |
|---|---|
| Base Model | HuggingFaceTB/SmolVLM2-500M-Video-Instruct |
| Format | GGUF (llama.cpp compatible) |
| Quantization | Q8_0 |
| Model Size | 437 MB |
| Vision Encoder | 199 MB |
| Total Size | ~636 MB |
| License | Apache 2.0 |
Capabilities
- Image Captioning: Describe what's in an image
- Visual Q&A: Answer questions about images
- Document/Text Extraction: Read and extract text from photos
- Scene Understanding: Analyze visual content and context
Files
| File | Size | Description |
|---|---|---|
| SmolVLM2-500M-Video-Instruct-Q8_0.gguf | 437 MB | Main language model (Q8_0 quantized) |
| mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf | 199 MB | Vision encoder (f16 precision) |
Usage
Download URLs
Main Model: https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main/SmolVLM2-500M-Video-Instruct-Q8_0.gguf
Vision Encoder: https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main/mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf
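Both files are needed for image input, so it can help to fetch them together. The following is a minimal sketch, assuming `curl` is installed; the function name `download_smolvlm` and the default `./models` directory are illustrative choices, not part of the original instructions. The base URL and file names are taken from the links above.

```shell
# Sketch: fetch the main model and the vision encoder into one directory.
# BASE_URL and the file names come from the Download URLs above;
# download_smolvlm and ./models are hypothetical names for illustration.
BASE_URL="https://huggingface.co/jc-builds/smolvlm2-500m-gguf/resolve/main"
MODEL_FILE="SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
MMPROJ_FILE="mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf"

# The body runs in a subshell, so its variables do not leak into the caller.
download_smolvlm() (
  dest="${1:-./models}"
  mkdir -p "$dest" || exit 1
  for f in "$MODEL_FILE" "$MMPROJ_FILE"; do
    # -L follows redirects; --fail turns HTTP errors into a non-zero exit code
    curl -L --fail -o "$dest/$f" "$BASE_URL/$f" || exit 1
  done
)
```

Calling `download_smolvlm ./models` would download roughly 636 MB in total, matching the sizes in the Files table.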
With llama.cpp
```shell
# Load both the main model and vision encoder
./llama-cli -m SmolVLM2-500M-Video-Instruct-Q8_0.gguf \
  --mmproj mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf \
  --image your_image.jpg \
  -p "Describe this image in detail"
```
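Because the command needs the main model, the vision encoder, and the image all at once, a missing file is an easy mistake to make. The wrapper below is a small sketch that checks for all three before running the same command shown above. The function name `describe_image` is hypothetical, and the flags are copied from this document rather than verified against a particular llama.cpp build.

```shell
# Sketch: check that every required file exists, then run the command above.
# describe_image is a hypothetical helper; the flags mirror the documented command.
MODEL="SmolVLM2-500M-Video-Instruct-Q8_0.gguf"
MMPROJ="mmproj-SmolVLM2-500M-Video-Instruct-f16.gguf"

# The body runs in a subshell, so `exit` only ends the function call.
describe_image() (
  image="$1"
  prompt="${2:-Describe this image in detail}"
  for f in "$MODEL" "$MMPROJ" "$image"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      exit 1
    fi
  done
  ./llama-cli -m "$MODEL" --mmproj "$MMPROJ" --image "$image" -p "$prompt"
)
```

For example, `describe_image your_image.jpg "What is shown in this image?"` would run the model only if all three files are present in the working directory.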
With HaploAI (iOS/macOS)
This model is automatically available in HaploAI for on-device image understanding. Simply attach an image and ask questions about it.
Prompt Format
SmolVLM uses the `<image>` token to mark where the image should be processed:
```
<image>
What is shown in this image?
```
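When prompts are assembled by a script, the layout above can be produced with a one-line helper. This is only a sketch: `build_prompt` is a hypothetical name, and whether a given runtime inserts the `<image>` token on its own is not stated here, so treat the helper as an illustration of the format rather than a required step.

```shell
# Sketch: place the <image> token on its own line, followed by the question.
# build_prompt is a hypothetical helper name.
build_prompt() {
  printf '<image>\n%s\n' "$1"
}

build_prompt "What is shown in this image?"
```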
Performance Notes
- Memory Usage: ~800 MB during inference (model + vision encoder + context)
- Speed: Fast inference suitable for mobile devices
- Quality: Good balance of size vs capability for on-device use
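The memory figure can be related to the file sizes listed earlier with simple arithmetic. The sketch below uses only the numbers stated in this document; the variable names are illustrative, and the split between weights and everything else is an approximation, since actual usage depends on context length and the runtime.

```shell
# Sketch: break the ~800 MB inference figure into weights and remaining overhead.
# All three inputs are the values stated in this document.
MODEL_MB=437     # main language model (Q8_0)
MMPROJ_MB=199    # vision encoder (f16)
PEAK_MB=800      # approximate memory use during inference

WEIGHTS_MB=$((MODEL_MB + MMPROJ_MB))
OVERHEAD_MB=$((PEAK_MB - WEIGHTS_MB))

echo "weights: ${WEIGHTS_MB} MB, context and buffers: ~${OVERHEAD_MB} MB"
```

On these figures, the weights account for 636 MB and roughly 164 MB remains for context and runtime buffers.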
License
This model is distributed under the Apache 2.0 license, which permits:
- Commercial use
- Modification
- Distribution
- Patent use
- Private use
Attribution
- Original Model: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
- GGUF Conversion: ggml-org/SmolVLM2-500M-Video-Instruct-GGUF
- Distributed by: jc-builds for use in HaploAI
Related Models
For higher quality (larger size), see:
- SmolVLM2-2.2B (~2 GB) - Coming soon