Qvelard posted an update 23 days ago
Hey!

I'm working on a small-scale multi-drone control system and I'm looking for an open-source VLM that can run in real time on a Jetson Orin. If anyone knows a model or is personally interested in this kind of edge robotics problem, I'd love pointers.

What I'm trying to solve:

I have 4 simultaneous video streams, one from each drone (grayscale, 320×320). I can feed the model either:
• a 2×2 mosaic frame, or
• 4 separate frames as a batch.
Along with the frames, I provide a short text instruction describing the mission state. (A quick sketch of both input layouts follows below.)
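For concreteness, here's a minimal sketch of the two layouts, assuming the frames arrive as NumPy arrays (the frame source below is just a placeholder):

import numpy as np

# Placeholder for the four grayscale 320x320 frames, one per drone.
frames = [np.zeros((320, 320), dtype=np.uint8) for _ in range(4)]

# Option A: 2x2 mosaic -> a single 640x640 frame.
top = np.hstack([frames[0], frames[1]])
bottom = np.hstack([frames[2], frames[3]])
mosaic = np.vstack([top, bottom])            # shape (640, 640)

# Option B: batch of 4 separate frames -> shape (4, 320, 320).
batch = np.stack(frames, axis=0)

# Most VLM processors expect 3-channel RGB input, so the grayscale frames
# would likely need to be replicated across channels first.
mosaic_rgb = np.repeat(mosaic[..., None], 3, axis=-1)   # (640, 640, 3)
batch_rgb = np.repeat(batch[..., None], 3, axis=-1)     # (4, 320, 320, 3)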

What I need from the model:

A single structured JSON command representing the next action for the swarm controller. Something like this (exact schema not decided yet):

{
  "action": "move_forward",
  "confidence": 0.87,
  "reason": "front corridor detected, no obstacles in drone_2 and drone_4 views"
}


So I need a VLM that can:
• handle multi-image or mosaic image input
• run efficiently on a Jetson Orin (ideally INT4/INT8 friendly, TensorRT-compatible)
• generate stable JSON outputs based on visual + textual context (a validation sketch follows below)
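On the JSON-stability point, one option is to validate whatever the model emits against a small schema and fall back to a safe action on failure; constrained decoding would be the stricter alternative. Here's a minimal sketch with pydantic, where the SwarmCommand fields and the "hover" fallback are just illustrative assumptions:

import json
import re
from pydantic import BaseModel, Field, ValidationError

class SwarmCommand(BaseModel):
    # Assumed schema; the real action set and fields are still undecided.
    action: str
    confidence: float = Field(ge=0.0, le=1.0)
    reason: str

def parse_command(raw_output: str) -> SwarmCommand:
    """Extract the first JSON object from the VLM's text output and validate it."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if match is None:
        # No JSON at all -> hypothetical safe fallback.
        return SwarmCommand(action="hover", confidence=0.0, reason="no JSON in model output")
    try:
        return SwarmCommand(**json.loads(match.group(0)))
    except (json.JSONDecodeError, ValidationError):
        return SwarmCommand(action="hover", confidence=0.0, reason="malformed JSON from model")

cmd = parse_command('{"action": "move_forward", "confidence": 0.87, "reason": "corridor clear"}')
print(cmd.action, cmd.confidence)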

I would really appreciate suggestions, or even just thoughts on what architectures make sense here.

Models like openbmb/MiniCPM-V-4_5, dustnehowl/nanoVLM, and Qwen/Qwen3-VL-8B-Instruct look promising, but I'm still exploring what’s actually viable on-device.
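For comparison, this is roughly the load path I'd benchmark first on the desktop side. It's only a sketch: whether these checkpoints load via AutoModelForVision2Seq and whether bitsandbytes 4-bit actually works on the Orin's aarch64 stack are open questions, and a TensorRT export would be a separate path entirely.

import torch
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig

# One of the candidate checkpoints mentioned above; swap in whichever model
# is being tested. The right Auto class may differ per architecture.
model_id = "Qwen/Qwen3-VL-8B-Instruct"

quant_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    quantization_config=quant_cfg,
    device_map="auto",
)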

Happy to share benchmarks or test anything people want to throw at this problem. The multi-drone video + action JSON setup is niche but potentially useful to others building edge-deployed agents.