Any advice on improving encoding speed?
Hi @hxssgaa,
I wanted to ask for your opinion regarding the encoding speed I’m observing.
Here are some timing results:
```
[✅ FAST] encode_queries | Time: 0.7000s
[⚠️ SLOW] encode_docs   | Time: 1.7310s
```
Do you have any recommendations or best practices to improve the encoding speed?
For reference, this is my current configuration:
```python
self.processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    max_num_visual_tokens=1280,
)
self.model = AutoModel.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
    device_map="auto",
).eval()
```
Thank you in advance for your help.
Hi @shadowT, depending on the GPU, `encode_docs` can be slow for large images; this comes from a limitation of the Qwen3-VL backbone model, and there is nothing we can do about it. My general advice:
- Try `torch.compile` to see whether it helps.
- Try increasing the batch size for docs encoding: the prefill for a single image can be slow, but the throughput of batched image encoding can be much higher. (See the sketch after this list.)
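In case a concrete starting point helps, here is a minimal sketch of both suggestions, reusing the config from the question above. Note that `model_id`, `doc_images`, and `batch_size` are placeholders, and the exact processor/model call depends on the model's remote code, so treat this as a rough template rather than a drop-in implementation:

```python
import torch
from transformers import AutoModel, AutoProcessor

# Setup taken from the question's config; model_id is a placeholder.
model_id = "<your-model-id>"
processor = AutoProcessor.from_pretrained(
    model_id, trust_remote_code=True, max_num_visual_tokens=1280
)
model = AutoModel.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
    device_map="auto",
).eval()

# (1) torch.compile: the first forward pass pays a one-time compilation
# cost; subsequent passes may run faster. Remove this line if the
# remote-code model turns out to be incompatible with compilation.
model = torch.compile(model)

# (2) Batched doc encoding: amortize the per-image prefill cost by
# encoding several documents per forward pass.
batch_size = 8       # hypothetical starting point; tune for your GPU memory
doc_images = [...]   # placeholder: your list of document images
embeddings = []
with torch.inference_mode():
    for start in range(0, len(doc_images), batch_size):
        batch = doc_images[start : start + batch_size]
        # The exact call signature depends on the model's remote code;
        # this mirrors the common transformers vision-encoder pattern.
        inputs = processor(images=batch, return_tensors="pt").to(model.device)
        embeddings.append(model(**inputs))
```

The main knob is `batch_size`: raise it until you approach GPU memory limits, since larger batches generally improve throughput even when per-image latency stays the same.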
Hope it helps.