Testing report on Intel Xeon W5-3425

#2
by SlavikF - opened

System:

  • Intel Xeon W5-3425, 12 cores
  • DDR5-4800 RAM (8 channels * 64GB), mlc reports ~190GB/s
  • Ubuntu 24

llama.cpp:

  • commit 5edfe782a9d7dc1b717f9d132c42404c7a517e17 (HEAD -> qwen3_next, origin/qwen3_next)
  • Date: Thu Oct 23 21:10:58 2025 +0200

Running:

./llama-server \
      --hf-repo ilintar/Qwen3-Next-80B-A3B-Instruct-GGUF:IQ4_NL \
      --alias "local-qwen3-next80b" \
      --ctx-size 32768 \
      --host 0.0.0.0 --port 38000

I asked few computer-related queries and got good quality replies.
Used 1200-5000 tokens for each query.

Performance is slow, but I guess it's expected at this point:

build: 7260 (5edfe782) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
system info: n_threads = 12, n_threads_batch = 12, total_threads = 12

system_info: n_threads = 12 (n_threads_batch = 12) / 12 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 |
 F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | 
 LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
...

prompt eval time =  309599.38 ms /  2798 tokens (  110.65 ms per token,     9.04 tokens per second)
       eval time =  358576.36 ms /  1359 tokens (  263.85 ms per token,     3.79 tokens per second)
      total time =  668175.74 ms /  4157 tokens
SlavikF changed discussion title from Testing report to Testing report on Intel Xeon W5-3425

Sign up or log in to comment