gfx1151 (Radeon 8060S / Strix Halo): Vulkan capability/perf mismatch vs RADV in llama.cpp workloads #420

@visorcraft

Problem description

On an AMD Ryzen AI Max+ 395 (gfx1151, Radeon 8060S), the AMDVLK path appears to expose a lower Vulkan capability profile to llama.cpp than RADV does, and this coincides with a major prompt-processing (pp) regression.

In equivalent llama.cpp Vulkan runs on the same machine:

  • AMD open-source driver path reports:
    • driverID = DRIVER_ID_AMD_OPEN_SOURCE
    • driverInfo = 2025.Q2.1 (LLPC)
    • shared memory: 32768
  • RADV path reports:
    • driverID = DRIVER_ID_MESA_RADV
    • driverName = radv
    • shared memory: 65536

This correlates with large prompt-processing deltas (same model/flags):

  • Qwen3-Coder-Next 80B-A3B Q4_K_M
    • AMD open-source path: ~378 t/s (pp512)
    • RADV path: ~507–522 t/s (pp512)

Token-generation (tg) changes are smaller; the largest hit is in prompt processing.
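
To put the gap in relative terms, the pp512 figures above work out to roughly a 34% to 38% advantage for RADV. A minimal sketch of that arithmetic follows; `pct_gain` is a hypothetical helper written for this report, not part of llama.cpp, and it assumes the two numbers are in the same unit.

```shell
# pct_gain BASE NEW: percentage by which NEW exceeds BASE, to one decimal place.
# Hypothetical helper for illustration only.
pct_gain() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

pct_gain 378 507   # low end of the RADV range vs. the AMD open-source path
pct_gain 378 522   # high end of the RADV range
```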

This looks related to AMDVLK behavior already discussed in #413 (allocation limits), but this report is focused on capability/perf mismatch on gfx1151 in normal inference runs.

Hardware / software

  • Machine: GMKtec EVO-X2
  • CPU: AMD Ryzen AI Max+ 395
  • iGPU: Radeon 8060S (gfx1151, UMA)
  • RAM: 128 GB LPDDR5X unified memory
  • OS: Fedora 43
  • Kernel: 6.18+ and 6.19 tested in community reports
  • llama.cpp builds tested: 05fa625ea and nearby
  • Benchmark command shape:
    • llama-bench -m <model> --n-gpu-layers 99 -p 512 -n 128

Steps to reproduce

  1. On a gfx1151 system, run the Vulkan benchmark under the default AMD open-source path:

/tmp/llama-vulkan/build/bin/llama-bench \
  -m /path/to/Qwen3-Coder-Next-Q4_K_M.gguf \
  --n-gpu-layers 99 -p 512 -n 128

  2. Capture the reported Vulkan device line and the results.

  3. Run the same benchmark, forcing the RADV ICD and disabling loader layers:

VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json \
VK_LOADER_LAYERS_DISABLE=all \
/tmp/llama-vulkan/build/bin/llama-bench \
  -m /path/to/Qwen3-Coder-Next-Q4_K_M.gguf \
  --n-gpu-layers 99 -p 512 -n 128

  4. Compare the device-reported shared memory and pp/tg results.
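
The two runs above can be wrapped so that both use exactly the same command line and differ only in the environment. This is a rough sketch, not a tested harness: `with_radv`, `compare_paths`, the `RADV_ICD` variable, and the log file names are all hypothetical names introduced here, and the ICD manifest path is simply the one from the RADV command above.

```shell
# Manifest path taken from the RADV command above; override RADV_ICD if your
# distribution installs it elsewhere (RADV_ICD is a name used only by this sketch).
RADV_ICD="${RADV_ICD:-/usr/share/vulkan/icd.d/radeon_icd.x86_64.json}"

# with_radv CMD [ARGS...]: run CMD with the same two variables as the RADV run.
with_radv() {
  env VK_ICD_FILENAMES="$RADV_ICD" VK_LOADER_LAYERS_DISABLE=all "$@"
}

# compare_paths CMD [ARGS...]: run CMD under the default ICD, then under RADV,
# saving stdout and stderr of each run to default.log and radv.log.
compare_paths() {
  "$@" > default.log 2>&1
  with_radv "$@" > radv.log 2>&1
}
```

For example, `compare_paths /tmp/llama-vulkan/build/bin/llama-bench -m /path/to/Qwen3-Coder-Next-Q4_K_M.gguf --n-gpu-layers 99 -p 512 -n 128` would leave both logs in the current directory for a side-by-side diff.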

Observed behavior

  • Default path shows 32 KB shared memory and significantly lower pp.
  • Forced RADV path shows 64 KB shared memory and much higher pp on same hardware.
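
The 32 KB and 64 KB figures are the byte values from the device line (32768 and 65536) divided by 1024. A small sketch for pulling that value out of a saved log follows; it assumes the log contains the text `shared memory: <bytes>` as quoted in the problem description, and `shared_mem_kb` is a hypothetical helper name.

```shell
# shared_mem_kb: read log text on stdin and print the first reported
# "shared memory: <bytes>" value converted to KB. Prints nothing if no
# such text is found.
shared_mem_kb() {
  sed -n 's/.*shared memory: \([0-9][0-9]*\).*/\1/p' | head -n 1 |
    awk '{ print $1 / 1024 " KB" }'
}
```

Usage would look like `shared_mem_kb < radv.log`, where `radv.log` stands for whatever file the benchmark output was saved to.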

Expected behavior

  • The AMDVLK path on gfx1151 should expose capabilities and limits consistent with what the hardware supports, and should avoid large performance cliffs relative to RADV for the same Vulkan workload.

Proposed resolution / asks

  1. Verify gfx1151 Vulkan-reported limits/caps on AMDVLK (especially workgroup shared memory and related compute limits).
  2. Confirm whether current AMDVLK path for gfx1151 is expected to report 32 KB in this context.
  3. If this is a driver bug/regression, provide fix target version.
  4. If this is expected behavior, please document rationale and recommended mitigations for LLM compute workloads.
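
For the first ask, the driver-reported values can also be checked independently of llama.cpp. The sketch below assumes the `vulkaninfo` tool is installed and prints properties as `name = value` lines, and that the shared-memory figure in the llama.cpp device line corresponds to the Vulkan limit `maxComputeSharedMemorySize`; `vk_limit` is a hypothetical helper, and it prints only the first word of multi-word values.

```shell
# vk_limit NAME: read vulkaninfo text on stdin and print the value reported
# for the given property or limit name (one line per matching device).
vk_limit() {
  awk -v key="$1" '$1 == key && $2 == "=" { print $3 }'
}

# Example (requires a working Vulkan installation):
#   vulkaninfo | vk_limit maxComputeSharedMemorySize
#   vulkaninfo | vk_limit driverID
```

Running the same two queries with and without the RADV environment variables from the reproduction steps should show whether the 32768 vs 65536 difference comes from the driver itself.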
