Open
Description
Problem description
On AMD Ryzen AI Max+ 395 (gfx1151, Radeon 8060S), the AMDVLK path appears to expose a lower Vulkan capability profile to llama.cpp than RADV does, resulting in a major prompt-processing (pp) regression.
In equivalent llama.cpp Vulkan runs on the same machine:
- AMD open-source driver path reports:
driverID = DRIVER_ID_AMD_OPEN_SOURCE
driverInfo = 2025.Q2.1 (LLPC)
shared memory: 32768
- RADV path reports:
driverID = DRIVER_ID_MESA_RADV
driverName = radv
shared memory: 65536
This correlates with large prompt-processing deltas (same model/flags):
- Qwen3-Coder-Next 80B-A3B Q4_K_M
  - AMD open-source path: ~378 t/s (pp512)
  - RADV path: ~507–522 t/s (pp512)
Token generation changes are smaller; prompt-processing is where the biggest hit appears.
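To put the pp512 figures above on one scale, the relative shortfall can be computed directly. This is a minimal sketch: `pct_drop` is a hypothetical helper name, and its inputs are simply the throughput numbers quoted in this report.

```shell
# pct_drop BASELINE MEASURED
# Prints the percentage by which MEASURED falls short of BASELINE.
# Hypothetical helper for illustration; not part of llama.cpp or llama-bench.
pct_drop() {
  awk -v base="$1" -v got="$2" 'BEGIN { printf "%.1f\n", (base - got) / base * 100 }'
}

pct_drop 507 378   # RADV low end vs AMD open-source path
pct_drop 522 378   # RADV high end vs AMD open-source path
```

With the numbers reported here, the AMD open-source path comes out roughly 25% to 28% slower than RADV on prompt processing.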
This looks related to AMDVLK behavior already discussed in #413 (allocation limits), but this report is focused on capability/perf mismatch on gfx1151 in normal inference runs.
Hardware / software
- Machine: GMKtec EVO-X2
- CPU: AMD Ryzen AI Max+ 395
- iGPU: Radeon 8060S (gfx1151, UMA)
- RAM: 128 GB LPDDR5X unified memory
- OS: Fedora 43
- Kernel: 6.18+ and 6.19 tested in community reports
- llama.cpp builds tested: 05fa625ea and nearby
- Benchmark command shape:
llama-bench -m <model> --n-gpu-layers 99 -p 512 -n 128
Steps to reproduce
- On gfx1151 system, run Vulkan benchmark under default AMD open-source path:
/tmp/llama-vulkan/build/bin/llama-bench \
-m /path/to/Qwen3-Coder-Next-Q4_K_M.gguf \
--n-gpu-layers 99 -p 512 -n 128
- Capture reported Vulkan device line and results.
- Run same benchmark forcing RADV ICD and disabling loader layers:
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json \
VK_LOADER_LAYERS_DISABLE=all \
/tmp/llama-vulkan/build/bin/llama-bench \
-m /path/to/Qwen3-Coder-Next-Q4_K_M.gguf \
--n-gpu-layers 99 -p 512 -n 128
- Compare device-reported shared memory and pp/tg.
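The two invocations in the steps above differ only in their environment, and a small wrapper can make that explicit. The sketch below is an illustration, not a tested harness: `BENCH`, `MODEL`, `RADV_ICD`, `DRY_RUN`, and `run_case` are made-up names, and the paths are the placeholders used in this report. It defaults to a dry run that only prints the command lines.

```shell
# Sketch: build the two llama-bench invocations from the reproduction steps.
# BENCH, MODEL, RADV_ICD, DRY_RUN and run_case are illustrative names, not
# llama.cpp options. DRY_RUN defaults to 1, so nothing is executed unless
# DRY_RUN=0 is set explicitly.
BENCH="${BENCH:-/tmp/llama-vulkan/build/bin/llama-bench}"
MODEL="${MODEL:-/path/to/Qwen3-Coder-Next-Q4_K_M.gguf}"
RADV_ICD="${RADV_ICD:-/usr/share/vulkan/icd.d/radeon_icd.x86_64.json}"

# run_case LABEL [VAR=value ...]
# Runs (or prints) the benchmark with the given extra environment variables.
run_case() {
  label="$1"; shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    extra="$*"
    if [ -n "$extra" ]; then extra="$extra "; fi
    echo "[$label] ${extra}$BENCH -m $MODEL --n-gpu-layers 99 -p 512 -n 128"
  else
    # Capture stderr as well, in case the Vulkan device line is not on stdout.
    env "$@" "$BENCH" -m "$MODEL" --n-gpu-layers 99 -p 512 -n 128 2>&1 | tee "bench-$label.log"
  fi
}

run_case default
run_case radv "VK_ICD_FILENAMES=$RADV_ICD" VK_LOADER_LAYERS_DISABLE=all
```

Setting `DRY_RUN=0` runs both cases back to back and leaves `bench-default.log` and `bench-radv.log` for comparison.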
Observed behavior
- Default path shows 32 KB shared memory and significantly lower pp.
- Forced RADV path shows 64 KB shared memory and much higher pp on same hardware.
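The 32 KB and 64 KB figures are the `shared memory:` byte counts quoted earlier (32768 and 65536). A small filter can pull them out of a saved log; `shared_mem_kb` is a hypothetical helper that assumes only the `shared memory: <bytes>` text shown in this report, and the sample inputs below are stand-ins rather than real log lines.

```shell
# shared_mem_kb: read log text on stdin and print every reported
# "shared memory: N" value, converted from bytes to KB.
# Hypothetical helper; it relies only on the "shared memory: <bytes>" text.
shared_mem_kb() {
  grep -o 'shared memory: [0-9][0-9]*' | awk '{ print $3 / 1024 " KB" }'
}

# Stand-in inputs using the two values from this report.
printf 'shared memory: 32768\n' | shared_mem_kb
printf 'shared memory: 65536\n' | shared_mem_kb
```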
Expected behavior
- AMDVLK path on gfx1151 should expose capability/limits consistent with hardware expectations and avoid large perf cliffs vs RADV for the same Vulkan workload.
Proposed resolution / asks
- Verify gfx1151 Vulkan-reported limits/caps on AMDVLK (especially workgroup shared memory and related compute limits).
- Confirm whether current AMDVLK path for gfx1151 is expected to report 32 KB in this context.
- If this is a driver bug/regression, provide fix target version.
- If this is expected behavior, please document rationale and recommended mitigations for LLM compute workloads.
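For the first ask, the reported limits can also be cross-checked outside llama.cpp with `vulkaninfo` (from vulkan-tools). The following is a rough sketch: `vk_caps` is a hypothetical helper name, and the pattern assumes the usual `name = value` lines that `vulkaninfo` prints for driver properties and device limits.

```shell
# vk_caps: filter vulkaninfo text on stdin down to driver identity and
# compute shared-memory related limits. Hypothetical helper name; the field
# names are standard Vulkan driver-property and device-limit names.
vk_caps() {
  grep -E 'driverID|driverName|driverInfo|maxComputeSharedMemorySize|maxComputeWorkGroupInvocations'
}

# Only query the live driver if vulkaninfo is installed.
if command -v vulkaninfo >/dev/null 2>&1; then
  vulkaninfo 2>/dev/null | vk_caps || true
fi
```

Prefixing the `vulkaninfo` call with the same `VK_ICD_FILENAMES` setting used in the reproduction steps should show the RADV values for comparison.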
Related issues
- Related AMDVLK allocation-limit issue: Vulkan/AMDVLK: Device memory allocation fails when single compute buffer > ~2 GiB (same model works on RADV) #413
- Related llama.cpp thread: Vulkan/AMDVLK: Device memory allocation fails when single compute buffer > ~2 GiB (same model works on RADV) ggml-org/llama.cpp#15054