Examples of how to include an image in the input using a URL and using base64.

by Oyounghyun - opened

Hello, I tested the model using the Google Colab guide code.
However, it cannot recognize images; it always returns messages like “please upload an image” or “I can’t read the image.”

Is the prompt format different from the one below?

Also, shouldn’t the model be able to recognize images if I provide them as a URL?

[CODE]
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image!"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)

[RESPONSE]
@@@@@@@@@@@@@@@@@@@text@@@@@@@@@@@@@@@@@@@@@@@@
Describe this image!

@@@@@@@@@@@@@@@@@ VLM RESPONSE @@@@@@@@@@@@@@@@@
I'm unable to view or interpret images directly. However, if you describe the image to me or provide details about it, I'd be happy to help you analyze or discuss its contents!
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


Unsloth AI org

The mmproj file is separate. You need to use llama.cpp or something like LM Studio to run it.
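
If you stay with llama-cpp-python, the mmproj can be downloaded from the Hub alongside the main GGUF and handed to the chat handler. A minimal sketch, assuming the repo ships both files (the repo ID and filenames below are placeholders, not the actual names):

[CODE]
from huggingface_hub import hf_hub_download

# Placeholder repo and filenames: check the repo's file list for the real ones.
REPO_NAME = "unsloth/some-vlm-GGUF"
MAIN_MODEL = "model-Q4_K_M.gguf"  # text model weights (GGUF)
SUB_MODEL = hf_hub_download(      # vision projector (mmproj), fetched to a local path
    repo_id=REPO_NAME,
    filename="mmproj-F16.gguf",
)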

Yes, so I loaded the LLM using a chat_handler as shown in the code below. Could you please confirm if this approach is correct, or recommend an alternative method?

[CODE]
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava16ChatHandler

chat_handler = Llava16ChatHandler(
    clip_model_path=SUB_MODEL  # ← mmproj file
)
llm = Llama.from_pretrained(
    repo_id=REPO_NAME,
    filename=MAIN_MODEL,
    flash_attn=FLASH_ATTEN_FLAG,
    n_batch=N_BATCH,
    n_ubatch=N_UBATCH,
    n_ctx=N_CTX,  # maximum context length in tokens
    chat_handler=chat_handler,  # ← routes images through the mmproj projector
    n_gpu_layers=-1,
    verbose=True,
)
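
On the original question about input formats: with a LLaVA-style chat handler, the image can be passed either as a remote URL (as above) or as a base64 data URI. A minimal sketch of the base64 route (the local file path is a placeholder):

[CODE]
import base64

def image_to_data_uri(path):
    # Encode a local image file as a base64 data URI the chat handler can consume.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image!"},
                {
                    "type": "image_url",
                    "image_url": {"url": image_to_data_uri("statue_of_liberty.jpg")},
                },
            ],
        }
    ]
)

The data URI route also removes any dependency on the runtime being able to fetch remote URLs, which is worth ruling out when the URL form appears to fail.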
