Open
Description
Location
- File: chapters/nlp-book-chapter8.pdf
- Page: 51
- Section: 8.3.4 Sharing across Heads and Layers
Problem Description
I would like to report a potential error in the description of Grouped-Query Attention (GQA).
In the text regarding the parameter $n_g$, the book states:
"By contrast, when $n_g = 1$, it becomes the GQA model."
Reasoning
If $n_g$ denotes the number of key-value groups:
- $n_g = 1$ implies that all query heads share a single Key-Value pair. This is the exact definition of MQA (Multi-Query Attention).
- As proposed in the original GQA paper (Ainslie et al.), GQA is an interpolation between MHA and MQA.
  - Limit 1 ($n_g = 1$): MQA
  - Limit 2 ($n_g = H$): MHA
  - Intermediate: GQA

Therefore, stating that $n_g = 1$ gives the GQA model is inaccurate; that setting is the MQA limit.
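
To make the two limits concrete, here is a minimal sketch of how query heads could be assigned to key-value groups. It is not taken from the book or from the GQA paper's code; the function names `kv_group_for_query_head` and `variant_name` are hypothetical, and it assumes consecutive query heads are grouped together and that the number of heads $H$ is divisible by $n_g$.

```python
def kv_group_for_query_head(num_heads: int, n_g: int) -> list[int]:
    """Return, for each query head, the index of the key-value group it reads from.

    Assumption (illustrative only): consecutive query heads share a group,
    and num_heads is divisible by n_g.
    """
    if n_g < 1 or num_heads % n_g != 0:
        raise ValueError("n_g must be a positive divisor of num_heads")
    heads_per_group = num_heads // n_g
    return [head // heads_per_group for head in range(num_heads)]


def variant_name(num_heads: int, n_g: int) -> str:
    """Name the attention variant implied by the number of key-value groups."""
    if n_g == 1:
        return "MQA"  # every query head shares one key-value pair
    if n_g == num_heads:
        return "MHA"  # every query head has its own key-value pair
    return "GQA"      # intermediate case: several heads per key-value pair


if __name__ == "__main__":
    H = 8  # hypothetical number of query heads
    for n_g in (1, 2, 4, 8):
        print(n_g, variant_name(H, n_g), kv_group_for_query_head(H, n_g))
```

With $n_g = 1$ every query head maps to group 0, which is the MQA case; with $n_g = H$ each head maps to its own group, which is MHA.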
Suggested Fix
I suggest changing the sentence to:
"By contrast, when $n_g = 1$, it becomes the MQA model."
Thank you for the great resources.