AMD Instinct MI210 + vLLM fails to run this model, any solutions please? Are there any other DeepSeek-R1-671B models that can run successfully on AMD Instinct MI210 + vLLM? Thanks!

#33
by luciagan - opened

Error message:

File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 372, in init
assert self.quant_method is not None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
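
From the traceback, the assertion seems to fire when vLLM cannot resolve a quantization method for the fused MoE layers. A quick sanity check (the path below is just where I mounted the model; adjust as needed) is to print the quantization_config the checkpoint declares:

    python3 -c "import json; cfg = json.load(open('/models/DeepSeek-R1-awq/config.json')); print(json.dumps(cfg.get('quantization_config'), indent=2))"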

vLLM version? start up command? OS env?

Hi! @v2ray
Here are the details of the vLLM version and OS environment: https://github.com/vllm-project/vllm/issues/16386

My start up commands are:

  1. Start a docker container:
    docker run -it --rm --ipc=host \
      --cap-add=CAP_SYS_ADMIN \
      --device=/dev/kfd \
      --device=/dev/dri/card5 \
      --device=/dev/mem \
      --group-add render \
      --cap-add=SYS_PTRACE \
      --security-opt seccomp=unconfined \
      --network host \
      --name dsr1awq \
      --shm-size 896g \
      -v "/root/models:/models" \
      --privileged \
      -p 6381:6381 \
      -p 1001:1001 \
      -p 2001:2001 \
      -e NCCL_IB_HCA=mlx5 \
      -e NCCL_P2P_DISABLE=1 \
      vllm-dsr1:v1 bash

  2. In the started docker container, run the model:
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m vllm.entrypoints.openai.api_server \
      --model /models/DeepSeek-R1-awq \
      --tensor-parallel-size 8 \
      --port 1001 \
      --enforce_eager \
      --distributed-executor-backend mp \
      --pipeline-parallel-size 1 \
      --max-model-len 1024 \
      --dtype float16 \
      --max-num-batched-tokens 1024 \
      --trust-remote-code \
      --enable-prefix-caching

The docker image vllm-dsr1:v1 is the alias of rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250311.
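
For reference, once the server starts, a minimal smoke test against the OpenAI-compatible endpoint would be a curl like the one below, reusing the port and --model path from the command above:

    curl http://localhost:1001/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "/models/DeepSeek-R1-awq", "prompt": "Hello", "max_tokens": 16}'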

Thanks for your help!

vLLM 0.7.3.

Try building from source.
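
(Roughly, for a ROCm source build; this is only a sketch, and the exact requirements file and build step depend on the vLLM version, so check the AMD installation section of the vLLM docs:)

    git clone https://github.com/vllm-project/vllm.git
    cd vllm
    # requirements file name/location differs across releases
    pip install -r requirements-rocm.txt
    python setup.py develop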

Hi! Even when building from source, I still hit the same assertion error. See https://github.com/vllm-project/vllm/issues/15101

Welp, then I have no idea. I only tested it on CUDA hardware.
