
[Chapter 8] Correction regarding the relationship between GQA and MQA (Page 51) #8

@Sameta-cani

Location

  • File: chapters/nlp-book-chapter8.pdf
  • Page: 51
  • Section: 8.3.4 Sharing across Heads and Layers

Problem Description
I would like to report a potential error in the description of Grouped-Query Attention (GQA).
In the text regarding the parameter $n_g$ (number of groups), the book states:

"By contrast, when $n_g = 1$, it becomes the GQA model."

Reasoning
If $n_g$ represents the number of groups:

  1. $n_g = 1$ implies that all query heads share a single key-value head. This is exactly the definition of MQA (Multi-Query Attention).
  2. As proposed in the original GQA paper (Ainslie et al.), GQA is an interpolation between MHA and MQA.
    • Limit 1 ($n_g = 1$): MQA
    • Limit 2 ($n_g = H$): MHA
    • Intermediate: GQA

Therefore, saying that the model becomes the "GQA model" when $n_g = 1$ is confusing: GQA usually refers to the general case or the intermediate setting ($1 < n_g < H$), whereas the $n_g = 1$ limit is widely recognized as MQA.
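
To make the two limits concrete, here is a minimal sketch of how query heads map to shared key-value heads for a given $n_g$. The function names `kv_head_for_query` and `variant_name` are hypothetical and not taken from the book, and the contiguous grouping of heads is an assumption made for illustration.

```python
def kv_head_for_query(h: int, num_heads: int, n_g: int) -> int:
    """Return the index of the key-value head used by query head h.

    Assumes query heads are split into n_g contiguous, equally sized groups,
    with one shared key-value head per group (an illustrative convention).
    """
    if num_heads % n_g != 0:
        raise ValueError("num_heads must be divisible by n_g")
    heads_per_group = num_heads // n_g
    return h // heads_per_group


def variant_name(num_heads: int, n_g: int) -> str:
    """Name the attention variant implied by the number of groups n_g."""
    if n_g == 1:
        return "MQA"  # every query head shares one key-value head
    if n_g == num_heads:
        return "MHA"  # every query head has its own key-value head
    return "GQA"      # intermediate case, 1 < n_g < num_heads


H = 8  # number of query heads, chosen only for this example
for n_g in (1, 2, 8):
    mapping = [kv_head_for_query(h, H, n_g) for h in range(H)]
    print(n_g, variant_name(H, n_g), mapping)
```

With $n_g = 1$ every query head maps to key-value head 0, which is the MQA setting; with $n_g = H$ each query head keeps its own key-value head, which is MHA.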

Suggested Fix
I suggest changing the sentence to:

"By contrast, when $n_g = 1$, it becomes the MQA model."

Thank you for the great resources.
