Any advice on improving encoding speed?

#2
by shadowT - opened

Hi @hxssgaa,

I wanted to ask for your opinion regarding the encoding speed I’m observing.

Here are some timing results:

[✅ FAST] encode_queries | Time: 0.7000s
[⚠️ SLOW] encode_docs | Time: 1.7310s

Do you have any recommendations or best practices to improve the encoding speed?

For reference, this is my current configuration:

import torch
from transformers import AutoModel, AutoProcessor

self.processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    max_num_visual_tokens=1280,  # cap on visual tokens per image
)

self.model = AutoModel.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
    device_map="auto",
).eval()

Thank you in advance for your help.

Tomoro AI Ltd org

Hi @shadowT, I think encode_docs can be slow for large images depending on the GPU; this comes from a limitation of the Qwen3-VL backbone model, and there is nothing we can do about it. My general advice:

  • Try torch.compile to see whether it helps (a sketch follows below).
  • Increase the batch size for document encoding: prefill for a single image is slow, but the throughput of batched image encoding can be much higher (see the second sketch below).
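
For torch.compile, a minimal sketch (this assumes the remote code routes encoding through the model's forward pass; the first compiled call pays the compilation cost, so benchmark from the second call onward):

import torch

# Compile the loaded model once; "reduce-overhead" targets repeated
# calls with similar input shapes. If your document images vary a lot
# in size, recompilation can eat the gains; dynamic=True may help there.
self.model = torch.compile(self.model, mode="reduce-overhead")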
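
For batching, a rough sketch of a batched document-encoding loop. encode_docs and batch_size here are placeholders standing in for whatever wrapper produced the timings above; the sketch assumes it accepts a list of images and returns a tensor of embeddings:

import torch

def encode_docs_batched(retriever, images, batch_size=16):
    # Hypothetical helper: encode documents in batches so the per-image
    # prefill cost is amortized across the batch. Tune batch_size
    # against available GPU memory.
    embeddings = []
    with torch.inference_mode():
        for i in range(0, len(images), batch_size):
            embeddings.append(retriever.encode_docs(images[i:i + batch_size]))
    # Assumes each call returns a [batch, dim] tensor.
    return torch.cat(embeddings, dim=0)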

Hope it helps.
