Conversation
Use getattr with default None instead of direct attribute access, which raises AttributeError on NodeStorage objects without an x attribute. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
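A minimal sketch of the pattern the commit message describes: direct attribute access on a storage object without `x` raises `AttributeError`, while `getattr` with a default returns `None`. The `NodeStorage` stand-in here is illustrative, not the actual PyG class.

```python
class NodeStorage:
    """Toy stand-in for torch_geometric.data.storage.NodeStorage."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


store = NodeStorage(num_nodes=10)  # no "x" attribute set

# Direct access raises AttributeError on objects without "x".
try:
    _ = store.x
    raised = False
except AttributeError:
    raised = True
assert raised

# getattr with a default is safe: returns None instead of raising.
features = getattr(store, "x", None)
assert features is None

store_with_x = NodeStorage(x=[1.0, 2.0])
assert getattr(store_with_x, "x", None) == [1.0, 2.0]
```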
Adds a Graph Transformer encoder that conforms to the same forward interface as HGT/SimpleHGN for use as a drop-in encoder in LinkPredictionGNN. Internally uses hetero_to_graph_transformer_input to convert HeteroData to sequences, processes through pre-norm transformer layers, and produces per-node embeddings via attention-weighted neighbor readout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gigl/transforms/graph_transformer.py
Outdated
anchor_node_ids: Optional tensor of local node indices within anchor_node_type
    to use as anchors. If None, uses first batch_size nodes. (default: None)
feature_dim: Output feature dimension. If None, inferred from data.
    If provided and different from input, features are projected.
I do not see the projection logic in the implementation; is it still needed?
gigl/transforms/graph_transformer.py
Outdated
return result


class HeteroToGraphTransformerInput:
nope, claude kept adding it back lol
gigl/transforms/graph_transformer.py
Outdated
)

pairwise_feature_sequences: Optional[Tensor] = None
if pairwise_pe_matrices:
nit: the None handling is different from anchor_feature_sequences. Maybe make _lookup_pairwise_relative_features also accept None csr_matrices?
gigl/transforms/graph_transformer.py
Outdated
    padding_value=padding_value,
)

anchor_feature_sequences = _lookup_anchor_relative_features(
nit: maybe anchor_relative_features? It reads as if anchor node features are read out separately from node_feature_sequences.
gigl/transforms/graph_transformer.py
Outdated
def _lookup_pairwise_relative_features(
    node_index_sequences: Tensor,
    valid_mask: Tensor,
    csr_matrices: list[Tensor],
nit: Optional[list[Tensor]] to be consistent with _lookup_anchor_relative_features?
) -> Tensor:
    """Gather node features into padded sequences using precomputed node indices."""
    batch_size, max_seq_len = node_index_sequences.shape
    feature_dim = node_features.size(-1)
Does this work for a graph with no node features?
I think we can error out earlier if there are no features. The GT doesn't work without features, but alternatives like a PE or a node ID embedding can serve as features: those can be projected and added to x.
    hidden_dim: int,
    feedforward_dim: int,
    dropout_rate: float = 0.0,
) -> None:
Let's make activations configurable so that we can support the XGLU family of activations (SwiGLU, GeGLU)?
yup! added XGLU family support as well
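For reference, a minimal sketch of a gated-linear-unit FFN of the kind being discussed: one input projection is split into a value half and a gate half, and the activation choice selects the variant (SwiGLU with SiLU, GeGLU with GELU). The class name and API here are illustrative, not the actual gigl implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GLUFeedForward(nn.Module):
    """XGLU-style FFN sketch: SwiGLU when act=F.silu, GeGLU when act=F.gelu."""

    def __init__(self, hidden_dim: int, ffn_dim: int, act=F.silu):
        super().__init__()
        self._act = act
        # One matmul produces both the value and the gate halves.
        self._in = nn.Linear(hidden_dim, 2 * ffn_dim, bias=False)
        self._out = nn.Linear(ffn_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self._in(x).chunk(2, dim=-1)
        return self._out(value * self._act(gate))


swiglu = GLUFeedForward(hidden_dim=16, ffn_dim=32)            # SwiGLU
geglu = GLUFeedForward(hidden_dim=16, ffn_dim=32, act=F.gelu)  # GeGLU
y = swiglu(torch.randn(4, 16))
assert y.shape == (4, 16)
```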
# to running statistics during training (which breaks autograd when
# model is called multiple times in the same forward-backward cycle)
self._norm_in = nn.LayerNorm(hidden_dim)
self._norm_out = nn.LayerNorm(hidden_dim)
I think it would be better to put the norms in a ResidualWrapper or inside the Transformer block itself, but not inside the FFN layers. Right now I think we are accidentally applying more than one norm, i.e. _ffn_norm(x) -> _norm_in(x) -> MLP -> _norm_out(x) -> residual add.
Also, _norm_out looks like the double-norm approach rather than standard pre-norm; we can test that, but it shouldn't be the default behavior.
yeah, there's already a LayerNorm in GraphTransformerEncoderLayer; will remove it here
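A sketch of the pattern suggested above, with illustrative names: the norm lives in a residual wrapper rather than inside the sublayer, so each sublayer is normed exactly once, in the standard pre-norm form `x + sublayer(LayerNorm(x))`.

```python
import torch
import torch.nn as nn


class PreNormResidual(nn.Module):
    """Pre-norm residual wrapper: normalize once, apply sublayer, add back."""

    def __init__(self, hidden_dim: int, sublayer: nn.Module):
        super().__init__()
        self._norm = nn.LayerNorm(hidden_dim)
        self._sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Exactly one norm per sublayer; the FFN itself stays norm-free.
        return x + self._sublayer(self._norm(x))


ffn = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))
block = PreNormResidual(16, ffn)
out = block(torch.randn(2, 16))
assert out.shape == (2, 16)
```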
)

# Transformer encoder layers
feedforward_dim = 2 * hid_dim
Let's make the ffn_dim / model_dim ratio a tunable parameter? The defaults can be 4 for regular activations and 8/3 for XGLU variants.
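A small sketch of that suggestion, with a hypothetical helper name: pick the default ratio from the activation (4x for regular activations, 8/3 for XGLU variants, matching the common convention of keeping parameter count comparable once the gate doubles the input projection), and round the result to an even width so the value/gate split stays clean.

```python
from typing import Optional

_XGLU_ACTIVATIONS = ("swiglu", "geglu")  # illustrative names


def feedforward_dim_for(
    hidden_dim: int,
    activation: str,
    ffn_ratio: Optional[float] = None,
) -> int:
    """Hypothetical helper: tunable ffn_dim / model_dim ratio with
    activation-dependent defaults (4 regular, 8/3 for XGLU)."""
    if ffn_ratio is None:
        ffn_ratio = 8 / 3 if activation in _XGLU_ACTIVATIONS else 4.0
    dim = int(hidden_dim * ffn_ratio)
    return dim + dim % 2  # round up to even for the value/gate split


assert feedforward_dim_for(512, "relu") == 2048
assert feedforward_dim_for(512, "swiglu") == 1366
assert feedforward_dim_for(512, "gelu", ffn_ratio=2.0) == 1024  # explicit override
```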
Scope of work done
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO