Examples of how to include an image in the input using a URL and using base64.

by Oyounghyun - opened

Hello, I tested the model using the Google Colab guide code.
However, it cannot recognize images; it always returns messages like “please upload an image” or “I can’t read the image.”

Is the prompt format different from the one below?

Also, shouldn’t the model be able to recognize images if I provide them as a URL?

[CODE]
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image!"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)

[RESPONSE]
@@@@@@@@@@@@@@@@@@@text@@@@@@@@@@@@@@@@@@@@@@@@
Describe this image!

@@@@@@@@@@@@@@@@@ VLM RESPONSE @@@@@@@@@@@@@@@@@
I'm unable to view or interpret images directly. However, if you describe the image to me or provide details about it, I'd be happy to help you analyze or discuss its contents!
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


Unsloth AI org

The mmproj file is separate. You need to use llama.cpp or something like LM Studio to run it.
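
If you stay with llama-cpp-python, the mmproj can be downloaded from the Hub alongside the main GGUF and handed to the chat handler. A minimal sketch, assuming the repo ships both files (the repo ID and filenames below are placeholders, not the actual names):

[CODE]
from huggingface_hub import hf_hub_download

# Placeholder repo and filenames: check the repo's file list for the real ones.
REPO_NAME = "unsloth/some-vlm-GGUF"
MAIN_MODEL = "model-Q4_K_M.gguf"  # text model weights (GGUF)
SUB_MODEL = hf_hub_download(      # vision projector (mmproj), fetched to a local path
    repo_id=REPO_NAME,
    filename="mmproj-F16.gguf",
)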

Yes, so I loaded the LLM using a chat_handler as shown in the code below. Could you please confirm if this approach is correct, or recommend an alternative method?

[CODE]
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava16ChatHandler

chat_handler = Llava16ChatHandler(
    clip_model_path=SUB_MODEL  # ← mmproj file
)
llm = Llama.from_pretrained(
    repo_id=REPO_NAME,
    filename=MAIN_MODEL,
    flash_attn=FLASH_ATTEN_FLAG,
    n_batch=N_BATCH,
    n_ubatch=N_UBATCH,
    n_ctx=N_CTX,  # maximum context length in tokens
    chat_handler=chat_handler,  # ← routes images through the mmproj projector
    n_gpu_layers=-1,
    verbose=True,
)
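
On the original question about input formats: with a LLaVA-style chat handler, the image can be passed either as a remote URL (as above) or as a base64 data URI. A minimal sketch of the base64 route (the local file path is a placeholder):

[CODE]
import base64

def image_to_data_uri(path):
    # Encode a local image file as a base64 data URI the chat handler can consume.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image!"},
                {
                    "type": "image_url",
                    "image_url": {"url": image_to_data_uri("statue_of_liberty.jpg")},
                },
            ],
        }
    ]
)

The data URI route also removes any dependency on the runtime being able to fetch remote URLs, which is worth ruling out when the URL form appears to fail.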
