
feat: Add apr gpu status and --wait-gpu flag (GH-152)#398

Open
noahgift wants to merge 2 commits into main from gpu-share-cli

Conversation


noahgift (Contributor) commented Mar 4, 2026

Summary

  • Add apr gpu command for GPU VRAM status display (text + JSON)
  • Add --wait-gpu <SECS> flag to apr finetune for VRAM polling queue
  • Wire wait_gpu parameter through dispatch chain

Test plan

  • apr gpu shows GPU UUID, capacity, reservations
  • apr gpu --json returns structured JSON
  • --wait-gpu 0 (default) skips waiting
  • --wait-gpu 60 polls ledger for 60s before training
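The `--wait-gpu` behavior described in the test plan can be sketched as a simple deadline-bounded polling loop. This is a minimal illustration, not the PR's actual implementation: the `VramLedger` struct and `wait_for_vram` function are hypothetical stand-ins for whatever the entrenar VRAM ledger exposes.

```rust
use std::time::{Duration, Instant};

/// Hypothetical ledger snapshot: total VRAM and bytes currently reserved.
struct VramLedger {
    capacity: u64,
    reserved: u64,
}

impl VramLedger {
    fn available(&self) -> u64 {
        self.capacity - self.reserved
    }
}

/// Poll the ledger until `needed` bytes are free or `wait_secs` elapses.
/// A `wait_secs` of 0 (the default) checks once and never waits.
fn wait_for_vram(ledger: &dyn Fn() -> VramLedger, needed: u64, wait_secs: u64) -> bool {
    if ledger().available() >= needed {
        return true;
    }
    if wait_secs == 0 {
        return false;
    }
    let deadline = Instant::now() + Duration::from_secs(wait_secs);
    while Instant::now() < deadline {
        if ledger().available() >= needed {
            return true;
        }
        std::thread::sleep(Duration::from_millis(250));
    }
    false
}

fn main() {
    // Ledger with 4 GiB free out of 8 GiB: request succeeds immediately.
    let half_free = || VramLedger { capacity: 8u64 << 30, reserved: 4u64 << 30 };
    assert!(wait_for_vram(&half_free, 4u64 << 30, 0));

    // Fully reserved ledger with --wait-gpu 0: fails without polling.
    let full = || VramLedger { capacity: 8u64 << 30, reserved: 8u64 << 30 };
    assert!(!wait_for_vram(&full, 4u64 << 30, 0));
    println!("ok");
}
```

With a nonzero `wait_secs`, the loop re-reads the ledger every 250 ms, so a reservation released by another process partway through the window lets training start early.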

🤖 Generated with Claude Code

noahgift and others added 2 commits March 4, 2026 12:26
APR CPU was 23x slower than llama.cpp because it used the F32 AprTransformer
instead of the fused Q4K engine. Now routes through OwnedQuantizedModel
(same path as GGUF/SafeTensors), achieving parity with GGUF CPU (~18 tok/s).
Wire --trace flag through to AppState.inference_trace for all serve paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
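The routing change described in this commit amounts to selecting the fused quantized engine for APR models instead of the F32 transformer. A hedged sketch of that dispatch decision follows; the `Engine` enum and `select_engine` function are illustrative names, not the repository's actual types.

```rust
/// Hypothetical engine choice: the slow F32 path vs. the fused Q4K path.
enum Engine {
    F32Transformer,
    QuantizedQ4K,
}

/// APR now routes through the same fused quantized path as GGUF and
/// SafeTensors, rather than falling back to the F32 transformer.
fn select_engine(format: &str) -> Engine {
    match format {
        "apr" | "gguf" | "safetensors" => Engine::QuantizedQ4K,
        _ => Engine::F32Transformer,
    }
}

fn main() {
    // APR takes the fused path, matching GGUF CPU throughput (~18 tok/s).
    assert!(matches!(select_engine("apr"), Engine::QuantizedQ4K));
    assert!(matches!(select_engine("unknown"), Engine::F32Transformer));
    println!("ok");
}
```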
- Add `apr gpu` command: displays GPU UUID, VRAM capacity, active
  reservations, and available budget from the entrenar VRAM ledger
- Add `apr gpu --json` for machine-readable output
- Add `--wait-gpu <SECS>` flag to `apr finetune`: polls VRAM ledger
  until sufficient budget is available (GPU-SHARE-003)
- Wire wait_gpu parameter through dispatch → finetune::run()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
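The machine-readable output that `apr gpu --json` produces might look like the sketch below. The `GpuStatus` struct and its field names (`uuid`, `capacity_bytes`, `reserved_bytes`, `available_bytes`) are assumptions for illustration; the real command's schema may differ.

```rust
/// Hypothetical GPU status record backing `apr gpu --json`.
struct GpuStatus {
    uuid: String,
    capacity: u64,
    reserved: u64,
}

/// Render the status as JSON by hand (a real CLI would likely use serde).
fn to_json(s: &GpuStatus) -> String {
    format!(
        "{{\"uuid\":\"{}\",\"capacity_bytes\":{},\"reserved_bytes\":{},\"available_bytes\":{}}}",
        s.uuid,
        s.capacity,
        s.reserved,
        s.capacity - s.reserved
    )
}

fn main() {
    let s = GpuStatus {
        uuid: "GPU-1234".to_string(),
        capacity: 8u64 << 30,  // 8 GiB
        reserved: 2u64 << 30,  // 2 GiB held by active reservations
    };
    let j = to_json(&s);
    // 6 GiB remain available: 6 * 2^30 = 6442450944 bytes.
    assert!(j.contains("\"available_bytes\":6442450944"));
    println!("{j}");
}
```

Reporting the derived `available_bytes` alongside the raw capacity and reservations lets callers (including the `--wait-gpu` polling path) make a single comparison instead of recomputing the budget.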
