Add vLLM example
README.md
右の人は、髪が黒くて長くて、後ろで結んでいるわ。髪には赤いリボンがついていて、髪に色を添えているわ。目は大きくて、少し緑がかった感じ。服装は青い着物を着ていて、下には黒いショーツを履いているわ。座っている姿勢が少し恥ずかしいような、でも楽しそうな雰囲気ね。

どう?説明に不足した点があったら言ってね。"""
```

## Using vLLM

Currently, the stable release of vLLM does not support the Hugging Face Pixtral model, but support is in progress in the nightly (developer) builds.

First, install the latest vLLM nightly build. See the [installation document](https://docs.vllm.ai/en/latest/getting_started/installation.html).

```bash
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
```

You can then launch an OpenAI-compatible server with the command below.

Note that you need to specify a chat template: copy it from the model processor's chat template and save it as a `.jinja` file.
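As a sketch of what such a template file contains, the snippet below renders a minimal stand-in template with `jinja2`. The template string here is illustrative only, not the model's actual template; the real one can be read from the processor (e.g. the `chat_template` attribute of `AutoProcessor.from_pretrained("spow12/ChatWaifu_2.0_vision")`) and written to the `.jinja` file.

```python
from jinja2 import Template

# Illustrative stand-in template -- NOT the model's real chat template.
# Copy the real one from the processor's `chat_template` attribute.
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "{% if m['role'] == 'system' %}{{ m['content'] }}\n"
    "{% elif m['role'] == 'user' %}[INST] {{ m['content'] }} [/INST]"
    "{% else %}{{ m['content'] }}</s>"
    "{% endif %}"
    "{% endfor %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the conversation into the single prompt string the server would build.
print(Template(CHAT_TEMPLATE).render(messages=messages))
```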

```bash
export OMP_NUM_THREADS=8
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

# Adjust --chat-template to point at your copy of the template.
# Drop --allowed-local-media-path if you don't plan to use local images.
CUDA_VISIBLE_DEVICES=1 vllm serve spow12/ChatWaifu_2.0_vision \
    --chat-template ./chat_templates/chatwaifu_vision.jinja \
    --dtype bfloat16 \
    --trust-remote-code \
    --api-key token_abc123 \
    --max-seq-len-to-capture 32768 \
    --max-model-len 16384 \
    --tensor-parallel-size 1 \
    --pipeline-parallel-size 1 \
    --port 5500 \
    --served-model-name chat_model \
    --limit-mm-per-prompt image=4 \
    --allowed-local-media-path ./data/
```

Once the server is up, you can query it with the OpenAI client:

```python
import sys

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5500/v1",
    api_key="token_abc123",
)

def add_completion(user_message, chat_history: list):
    # Append the new user turn unless the history already ends with one.
    if chat_history[-1]["role"] != "user":
        chat_history.append({
            "role": "user",
            "content": user_message,
        })
    completion = client.chat.completions.create(
        model="chat_model",
        messages=chat_history,
        temperature=0.75,
        max_tokens=512,
        stop=["[/INST]", "<|im_end|>", "</s>"],
        stream=True,
        stream_options={
            "include_usage": True,
        },
        extra_body={
            "min_p": 0.05,
            "repetition_penalty": 1.1,
        },
    )
    completion_str = ""
    for chunk in completion:
        try:
            content = chunk.choices[0].delta.content
            if isinstance(content, str):
                completion_str += content
                print(content, end="")  # Print without a trailing newline
                sys.stdout.flush()      # Ensure content is printed immediately
        except IndexError:
            # The final usage chunk has an empty choices list.
            pass
    chat_history.append({
        "role": "assistant",
        "content": completion_str,
    })
    return chat_history

# `system`, `url_natume`, and `url_mako` are defined in the sections above.
history = [
    {
        "role": "system",
        "content": system,
    },
]
user_content = [
    {
        "type": "image_url",
        "image_url": {"url": url_natume},
    },
    {
        "type": "image_url",
        "image_url": {"url": url_mako},
    },
    # "User: Try describing these two people's appearance."
    {"type": "text", "text": "ユーザー: この二人の外見を説明してみて。"},
]
history = add_completion(user_content, history)
```
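
Since the server above was started with `--allowed-local-media-path ./data/`, local files can also be sent; vLLM accepts them as `file://` URLs. A minimal sketch (the file name below is hypothetical, point it at an image under `./data/`):

```python
from pathlib import Path

def local_image_part(path: str) -> dict:
    # Build an image_url content part from a local file path.
    # vLLM only serves files under the directory given to --allowed-local-media-path.
    return {"type": "image_url", "image_url": {"url": Path(path).absolute().as_uri()}}

# Hypothetical file name; use any image placed under ./data/.
part = local_image_part("data/example.png")
print(part["image_url"]["url"])
```

A base64 data URL (`data:image/png;base64,...`) is the other common way to send local images and works with any OpenAI-compatible server.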

## Dataset