Testing report on Intel Xeon W5-3425
#2
by
SlavikF
- opened
System:
- Intel Xeon W5-3425, 12 cores
- DDR5-4800 RAM (8 channels * 64GB), mlc reports ~190GB/s
- Ubuntu 24
llama.cpp:
- commit 5edfe782a9d7dc1b717f9d132c42404c7a517e17 (HEAD -> qwen3_next, origin/qwen3_next)
- Date: Thu Oct 23 21:10:58 2025 +0200
Running:
./llama-server \
--hf-repo ilintar/Qwen3-Next-80B-A3B-Instruct-GGUF:IQ4_NL \
--alias "local-qwen3-next80b" \
--ctx-size 32768 \
--host 0.0.0.0 --port 38000
I asked few computer-related queries and got good quality replies.
Used 1200-5000 tokens for each query.
Performance is slow, but I guess it's expected at this point:
build: 7260 (5edfe782) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
system info: n_threads = 12, n_threads_batch = 12, total_threads = 12
system_info: n_threads = 12 (n_threads_batch = 12) / 12 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 |
F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 |
LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
...
prompt eval time = 309599.38 ms / 2798 tokens ( 110.65 ms per token, 9.04 tokens per second)
eval time = 358576.36 ms / 1359 tokens ( 263.85 ms per token, 3.79 tokens per second)
total time = 668175.74 ms / 4157 tokens
SlavikF
changed discussion title from
Testing report
to Testing report on Intel Xeon W5-3425