GT Encoder#537

Draft
yliu2-sc wants to merge 24 commits into main from gt_encoder

Conversation

Collaborator

@yliu2-sc yliu2-sc commented Mar 6, 2026

Scope of work done

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

yliu2-sc and others added 14 commits February 23, 2026 12:54
Use getattr with default None instead of direct attribute access, which
raises AttributeError on NodeStorage objects without an x attribute.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a Graph Transformer encoder that conforms to the same forward
interface as HGT/SimpleHGN for use as a drop-in encoder in
LinkPredictionGNN. Internally uses hetero_to_graph_transformer_input
to convert HeteroData to sequences, processes through pre-norm
transformer layers, and produces per-node embeddings via
attention-weighted neighbor readout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
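The attention-weighted neighbor readout mentioned in the commit message could look roughly like the following sketch: a learned query scores each neighbor position, padded positions are masked out before the softmax, and the weighted sum becomes the per-node embedding. Class and parameter names here are illustrative, not the PR's actual module.

```python
import torch
import torch.nn as nn


class AttentionReadout(nn.Module):
    """Pool a padded neighbor sequence into one embedding per anchor node."""

    def __init__(self, hidden_dim: int) -> None:
        super().__init__()
        # A single learned query scores each neighbor position.
        self._query = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, seq: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
        # seq: (batch, seq_len, hidden_dim); valid_mask: (batch, seq_len) bool
        scores = self._query(seq).squeeze(-1)
        # Padded positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(~valid_mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return torch.einsum("bs,bsd->bd", weights, seq)
```

Because masked positions receive zero attention weight, the readout is invariant to whatever values sit in the padding slots.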
@yliu2-sc yliu2-sc changed the base branch from main to yliu2/heterodata_to_seq March 6, 2026 03:15
@yliu2-sc yliu2-sc requested a review from zfan3-sc March 6, 2026 03:15
@yliu2-sc yliu2-sc changed the base branch from yliu2/heterodata_to_seq to yliu2/positional_encoding_transform March 17, 2026 18:36
@yliu2-sc yliu2-sc mentioned this pull request Mar 17, 2026
anchor_node_ids: Optional tensor of local node indices within anchor_node_type
to use as anchors. If None, uses first batch_size nodes. (default: None)
feature_dim: Output feature dimension. If None, inferred from data.
If provided and different from input, features are projected.
Collaborator

I do not see the projection logic in the implementation, is it still needed?

return result


class HeteroToGraphTransformerInput:
Collaborator

Is it still needed?

Collaborator Author

nope, claude kept adding it back lol

)

pairwise_feature_sequences: Optional[Tensor] = None
if pairwise_pe_matrices:
Collaborator

nit: None handling differs from anchor_feature_sequences. Maybe make _lookup_pairwise_relative_features also accept None csr_matrices

padding_value=padding_value,
)

anchor_feature_sequences = _lookup_anchor_relative_features(
Collaborator

nit: maybe anchor_relative_features? The current name reads as if anchor node features are read out separately from node_feature_sequences

def _lookup_pairwise_relative_features(
node_index_sequences: Tensor,
valid_mask: Tensor,
csr_matrices: list[Tensor],
Collaborator

nit: Optional[list[Tensor]] to be consistent with _lookup_anchor_relative_features?

) -> Tensor:
"""Gather node features into padded sequences using precomputed node indices."""
batch_size, max_seq_len = node_index_sequences.shape
feature_dim = node_features.size(-1)
Collaborator

Does this work for graphs without node features?

Collaborator Author

I think we can error out earlier if there are no features. GT doesn't work without features, but alternatives like positional encodings or node-ID embeddings can be projected and added to x.
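The fail-fast check discussed here could be a small validation helper run before any feature lookup. The function name is hypothetical; the getattr-with-default pattern matches the earlier commit's handling of NodeStorage objects without an x attribute.

```python
def require_node_features(data, node_types: list[str]) -> None:
    """Fail fast when any node type lacks an x tensor.

    data is any mapping from node type to a storage object that may or
    may not carry an `x` attribute (e.g. a PyG HeteroData).
    """
    # getattr with a default avoids AttributeError on storages without x.
    missing = [nt for nt in node_types if getattr(data[nt], "x", None) is None]
    if missing:
        raise ValueError(
            f"Graph Transformer requires node features; none found for {missing}. "
            "Project positional encodings or node-ID embeddings into x first."
        )
```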

hidden_dim: int,
feedforward_dim: int,
dropout_rate: float = 0.0,
) -> None:
Collaborator

Let's make activations configurable so that we can support the XGLU family (SwiGLU, GeGLU) of activations?

Collaborator Author

@yliu2-sc yliu2-sc Mar 20, 2026

yup! added XGLU family support as well
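A SwiGLU feed-forward block, one member of the XGLU family requested above, could be sketched as below; swapping F.silu for F.gelu gives GeGLU. Names and structure are illustrative, not the PR's actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUFeedForward(nn.Module):
    """XGLU-family FFN: down(silu(gate(x)) * up(x))."""

    def __init__(
        self, hidden_dim: int, feedforward_dim: int, dropout_rate: float = 0.0
    ) -> None:
        super().__init__()
        # Gated FFNs carry three weight matrices instead of two.
        self._gate = nn.Linear(hidden_dim, feedforward_dim, bias=False)
        self._up = nn.Linear(hidden_dim, feedforward_dim, bias=False)
        self._down = nn.Linear(feedforward_dim, hidden_dim, bias=False)
        self._dropout = nn.Dropout(dropout_rate)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self._down(self._dropout(F.silu(self._gate(x)) * self._up(x)))
```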

# to running statistics during training (which breaks autograd when
# model is called multiple times in the same forward-backward cycle)
self._norm_in = nn.LayerNorm(hidden_dim)
self._norm_out = nn.LayerNorm(hidden_dim)
Collaborator

I think it would be better to put the norms in a ResidualWrapper or inside the Transformer block itself, but not inside the FFN layers. Right now I think we are applying more than one norm by accident, i.e. _ffn_norm(x) -> _norm_in(x) -> MLP -> _norm_out(x) -> residual add.
Also _norm_out looks like the double-norm approach rather than standard pre-norm, which we can test but should not use as the default behavior.

Collaborator Author

@yliu2-sc yliu2-sc Mar 20, 2026

yeah, there's already a LayerNorm in GraphTransformerEncoderLayer; will remove it here
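The residual wrapper the reviewer describes could be as small as the sketch below: one LayerNorm per sublayer, owned by the wrapper rather than the FFN, so norms cannot be applied twice by accident. This is standard pre-norm wiring (x + sublayer(norm(x))), shown here as an illustration rather than the PR's implementation.

```python
import torch
import torch.nn as nn


class PreNormBlock(nn.Module):
    """Residual wrapper applying standard pre-norm: x + sublayer(norm(x))."""

    def __init__(self, hidden_dim: int, sublayer: nn.Module) -> None:
        super().__init__()
        # The wrapper owns the norm; the wrapped sublayer stays norm-free.
        self._norm = nn.LayerNorm(hidden_dim)
        self._sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self._sublayer(self._norm(x))
```

A layer would then stack two such blocks, one wrapping attention and one wrapping the FFN.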

)

# Transformer encoder layers
feedforward_dim = 2 * hid_dim
Collaborator

Let's make the ffn_dim / model_dim ratio a tunable parameter? The defaults can be 4 for regular activations and 8/3 for XGLU variants

Base automatically changed from yliu2/positional_encoding_transform to main March 20, 2026 01:16