Conversation
Use getattr with default None instead of direct attribute access, which raises AttributeError on NodeStorage objects without an x attribute. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
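A minimal sketch of the pattern the commit message describes: direct attribute access on a storage object without `x` raises `AttributeError`, while `getattr` with a default returns `None`. The `NodeStorage` stand-in here is illustrative, not the actual PyG class.

```python
class NodeStorage:
    """Toy stand-in for torch_geometric.data.storage.NodeStorage."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


store = NodeStorage(num_nodes=10)  # no "x" attribute set

# Direct access raises AttributeError on objects without "x".
try:
    _ = store.x
    raised = False
except AttributeError:
    raised = True
assert raised

# getattr with a default is safe: returns None instead of raising.
features = getattr(store, "x", None)
assert features is None

store_with_x = NodeStorage(x=[1.0, 2.0])
assert getattr(store_with_x, "x", None) == [1.0, 2.0]
```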
Adds a Graph Transformer encoder that conforms to the same forward interface as HGT/SimpleHGN for use as a drop-in encoder in LinkPredictionGNN. Internally uses hetero_to_graph_transformer_input to convert HeteroData to sequences, processes through pre-norm transformer layers, and produces per-node embeddings via attention-weighted neighbor readout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gigl/transforms/graph_transformer.py
Outdated
anchor_node_ids: Optional tensor of local node indices within anchor_node_type
    to use as anchors. If None, uses first batch_size nodes. (default: None)
feature_dim: Output feature dimension. If None, inferred from data.
    If provided and different from input, features are projected.
I do not see the projection logic in the implementation; is it still needed?
gigl/transforms/graph_transformer.py
Outdated
return result


class HeteroToGraphTransformerInput:
nope, claude kept adding it back lol
gigl/transforms/graph_transformer.py
Outdated
)

pairwise_feature_sequences: Optional[Tensor] = None
if pairwise_pe_matrices:
nit: the None handling is different from anchor_feature_sequences. Maybe make _lookup_pairwise_relative_features also accept None csr_matrices?
gigl/transforms/graph_transformer.py
Outdated
    padding_value=padding_value,
)

anchor_feature_sequences = _lookup_anchor_relative_features(
nit: maybe anchor_relative_features? It reads as if anchor node features are read out separately from node_feature_sequences.
gigl/transforms/graph_transformer.py
Outdated
def _lookup_pairwise_relative_features(
    node_index_sequences: Tensor,
    valid_mask: Tensor,
    csr_matrices: list[Tensor],
nit: Optional[list[Tensor]] to be consistent with _lookup_anchor_relative_features?
) -> Tensor:
    """Gather node features into padded sequences using precomputed node indices."""
    batch_size, max_seq_len = node_index_sequences.shape
    feature_dim = node_features.size(-1)
Does this work for a graph with no node features?
I think we can error out earlier if there are no features. The GT doesn't work without features, but alternatives like a PE or a node ID embedding can serve as features: those can be projected and added to x.
    hidden_dim: int,
    feedforward_dim: int,
    dropout_rate: float = 0.0,
) -> None:
Let's make activations configurable so that we can support the XGLU family of activations (SwiGLU, GeGLU)?
yup! added XGLU family support as well
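For reference, a minimal sketch of a gated-linear-unit FFN of the kind being discussed: one input projection is split into a value half and a gate half, and the activation choice selects the variant (SwiGLU with SiLU, GeGLU with GELU). The class name and API here are illustrative, not the actual gigl implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GLUFeedForward(nn.Module):
    """XGLU-style FFN sketch: SwiGLU when act=F.silu, GeGLU when act=F.gelu."""

    def __init__(self, hidden_dim: int, ffn_dim: int, act=F.silu):
        super().__init__()
        self._act = act
        # One matmul produces both the value and the gate halves.
        self._in = nn.Linear(hidden_dim, 2 * ffn_dim, bias=False)
        self._out = nn.Linear(ffn_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self._in(x).chunk(2, dim=-1)
        return self._out(value * self._act(gate))


swiglu = GLUFeedForward(hidden_dim=16, ffn_dim=32)            # SwiGLU
geglu = GLUFeedForward(hidden_dim=16, ffn_dim=32, act=F.gelu)  # GeGLU
y = swiglu(torch.randn(4, 16))
assert y.shape == (4, 16)
```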
# to running statistics during training (which breaks autograd when
# model is called multiple times in the same forward-backward cycle)
self._norm_in = nn.LayerNorm(hidden_dim)
self._norm_out = nn.LayerNorm(hidden_dim)
I think it would be better to put the norms in a ResidualWrapper or inside the Transformer block itself, but not inside the FFN layers. Right now I think we are accidentally applying more than one norm, i.e. _ffn_norm(x) -> _norm_in(x) -> MLP -> _norm_out(x) -> residual add.
Also, _norm_out looks like the double-norm approach rather than standard pre-norm; we can test that, but it shouldn't be the default behavior.
yeah, there's already a LayerNorm in GraphTransformerEncoderLayer; will remove it here
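A sketch of the pattern suggested above, with illustrative names: the norm lives in a residual wrapper rather than inside the sublayer, so each sublayer is normed exactly once, in the standard pre-norm form `x + sublayer(LayerNorm(x))`.

```python
import torch
import torch.nn as nn


class PreNormResidual(nn.Module):
    """Pre-norm residual wrapper: normalize once, apply sublayer, add back."""

    def __init__(self, hidden_dim: int, sublayer: nn.Module):
        super().__init__()
        self._norm = nn.LayerNorm(hidden_dim)
        self._sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Exactly one norm per sublayer; the FFN itself stays norm-free.
        return x + self._sublayer(self._norm(x))


ffn = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))
block = PreNormResidual(16, ffn)
out = block(torch.randn(2, 16))
assert out.shape == (2, 16)
```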
)

# Transformer encoder layers
feedforward_dim = 2 * hid_dim
Let's make the ffn_dim / model_dim ratio a tunable parameter? The defaults can be 4 for regular activations and 8/3 for XGLU variants.
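A small sketch of that suggestion, with a hypothetical helper name: pick the default ratio from the activation (4x for regular activations, 8/3 for XGLU variants, matching the common convention of keeping parameter count comparable once the gate doubles the input projection), and round the result to an even width so the value/gate split stays clean.

```python
from typing import Optional

_XGLU_ACTIVATIONS = ("swiglu", "geglu")  # illustrative names


def feedforward_dim_for(
    hidden_dim: int,
    activation: str,
    ffn_ratio: Optional[float] = None,
) -> int:
    """Hypothetical helper: tunable ffn_dim / model_dim ratio with
    activation-dependent defaults (4 regular, 8/3 for XGLU)."""
    if ffn_ratio is None:
        ffn_ratio = 8 / 3 if activation in _XGLU_ACTIVATIONS else 4.0
    dim = int(hidden_dim * ffn_ratio)
    return dim + dim % 2  # round up to even for the value/gate split


assert feedforward_dim_for(512, "relu") == 2048
assert feedforward_dim_for(512, "swiglu") == 1366
assert feedforward_dim_for(512, "gelu", ffn_ratio=2.0) == 1024  # explicit override
```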
Scope of work done
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO