spow12 committed
Commit 3799fd9 · verified · 1 Parent(s): 9d4aaf1

Add vLLM example

Files changed (1):
  1. README.md +101 -0
README.md CHANGED
@@ -304,6 +304,107 @@ Role: Popular, Shopkeeper, University Student, Waitstaff
右の人は、髪が黒くて長くて、後ろで結んでいるわ。髪には赤いリボンがついていて、髪に色を添えているわ。目は大きくて、少し緑がかった感じ。服装は青い着物を着ていて、下には黒いショーツを履いているわ。座っている姿勢が少し恥ずかしいような、でも楽しそうな雰囲気ね。
どう?説明に不足した点があったら言ってね。"""
```

## Using vLLM

Currently, the stable release of vLLM does not support the Hugging Face Pixtral model, but support is being worked on in the development version.

First, install the latest vLLM development build. See the [installation documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html) for details.

```bash
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
```
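
To check that a development build is what actually got installed, you can print the package version; the exact `.dev` version string will vary from nightly to nightly:

```python
import vllm

# Nightly wheels report a `.dev`-suffixed version string.
print(vllm.__version__)
```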

You can then launch an OpenAI-compatible server with the command below.

Note that you need to specify a chat template; copy it from the model processor's chat template (one way to extract it is sketched after the command).

```bash
export OMP_NUM_THREADS=8
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1

# Change --chat-template to the path of your copy of the template.
# --allowed-local-media-path can be removed if you don't plan to use local images.
CUDA_VISIBLE_DEVICES=1 vllm serve spow12/ChatWaifu_2.0_vision \
    --chat-template ./chat_templates/chatwaifu_vision.jinja \
    --dtype bfloat16 \
    --trust-remote-code \
    --api-key token_abc123 \
    --max-seq-len-to-capture 32768 \
    --max-model-len 16384 \
    --tensor-parallel-size 1 \
    --pipeline-parallel-size 1 \
    --port 5500 \
    --served-model-name chat_model \
    --limit-mm-per-prompt image=4 \
    --allowed-local-media-path ./data/
```
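
The template file itself can be produced with a sketch like the one below, assuming a recent `transformers` release in which the processor exposes its template via the `chat_template` attribute:

```python
import os
from transformers import AutoProcessor

# Load the model's processor and dump its chat template to a .jinja file.
processor = AutoProcessor.from_pretrained(
    "spow12/ChatWaifu_2.0_vision",
    trust_remote_code=True,  # mirrors the --trust-remote-code flag above
)
os.makedirs("./chat_templates", exist_ok=True)
with open("./chat_templates/chatwaifu_vision.jinja", "w") as f:
    f.write(processor.chat_template)
```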

Once the OpenAI-compatible server is up, you can query it with the OpenAI Python client:

```python
import sys
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5500/v1",
    api_key='token_abc123',
)

def add_completion(user_message, chat_history: list):
    # Append the new user turn (a full message dict) unless one is already pending.
    if chat_history[-1]['role'] != 'user':
        chat_history.append(user_message)
    completion = client.chat.completions.create(
        model="chat_model",
        messages=chat_history,
        temperature=0.75,
        max_tokens=512,
        stop=['[/INST]', '<|im_end|>', '</s>'],
        stream=True,
        stream_options={
            "include_usage": True
        },
        extra_body={
            "min_p": 0.05,
            "repetition_penalty": 1.1,
        }
    )
    completion_str = ""
    for chunk in completion:
        try:
            content = chunk.choices[0].delta.content
            if isinstance(content, str):
                completion_str += content
                print(content, end='')  # Print without newline
                sys.stdout.flush()  # Ensure content is printed immediately
        except IndexError:
            # The final usage-only chunk has an empty choices list.
            pass
    chat_history.append({
        'role': 'assistant',
        'content': completion_str
    })
    return chat_history

# `system`, `url_natume`, and `url_mako` are defined in the earlier examples above.
history = [
    {
        'content': system,
        'role': 'system'
    },
]
user_content = {
    "role": "user",
    "content": [
        {
            'type': 'image_url',
            'image_url': {'url': url_natume}
        },
        {
            'type': 'image_url',
            'image_url': {'url': url_mako}
        },
        {"type": "text", "text": "ユーザー: この二人の外見を説明してみて。"},
    ]
}
history = add_completion(user_content, history)
```
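
Because the server was started with `--allowed-local-media-path ./data/`, images in that directory can also be referenced with `file://` URLs instead of remote links. A minimal sketch with a hypothetical `./data/example.png`, reusing `add_completion` from above:

```python
import os

# Hypothetical local file; the absolute path must resolve inside the
# directory passed to --allowed-local-media-path (./data/ above).
url_local = "file://" + os.path.abspath("./data/example.png")

local_turn = {
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": url_local}},
        # Example prompt: "User: Describe this image."
        {"type": "text", "text": "ユーザー: この画像について説明して。"},
    ],
}
history = add_completion(local_turn, history)
```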

## Dataset