Any advice on improving encoding speed?

#2
by shadowT - opened

Hi @hxssgaa,

I wanted to ask for your opinion regarding the encoding speed I’m observing.

Here are some timing results:

[✅ FAST] encode_queries | Time: 0.7000s
[⚠️ SLOW] encode_docs | Time: 1.7310s

Do you have any recommendations or best practices to improve the encoding speed?

For reference, this is my current configuration:

import torch
from transformers import AutoModel, AutoProcessor

self.processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    max_num_visual_tokens=1280,  # cap on visual tokens per image
)

self.model = AutoModel.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
    device_map="auto",
).eval()

Thank you in advance for your help.

Tomoro AI Ltd org

Hi @shadowT, I think encode_docs can be slow for large images depending on the GPU; this comes from a limitation of the Qwen3-VL backbone model, and there is nothing we can do about it. My general advice:

  • Try torch.compile to see whether it helps (a sketch follows below).
  • Increase the batch size for document encoding: prefill for a single image is slow, but the throughput of batched image encoding can be much higher (see the second sketch below).
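
For torch.compile, a minimal sketch (this assumes the remote code routes encoding through the model's forward pass; the first compiled call pays the compilation cost, so benchmark from the second call onward):

import torch

# Compile the loaded model once; "reduce-overhead" targets repeated
# calls with similar input shapes. If your document images vary a lot
# in size, recompilation can eat the gains; dynamic=True may help there.
self.model = torch.compile(self.model, mode="reduce-overhead")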
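
For batching, a rough sketch of a batched document-encoding loop. encode_docs and batch_size here are placeholders standing in for whatever wrapper produced the timings above; the sketch assumes it accepts a list of images and returns a tensor of embeddings:

import torch

def encode_docs_batched(retriever, images, batch_size=16):
    # Hypothetical helper: encode documents in batches so the per-image
    # prefill cost is amortized across the batch. Tune batch_size
    # against available GPU memory.
    embeddings = []
    with torch.inference_mode():
        for i in range(0, len(images), batch_size):
            embeddings.append(retriever.encode_docs(images[i:i + batch_size]))
    # Assumes each call returns a [batch, dim] tensor.
    return torch.cat(embeddings, dim=0)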

Hope it helps.
