NVIDIA / TransformerEngine Public

Pull requests: NVIDIA/TransformerEngine

Labels 69 Milestones 0

128 Open 1,896 Closed

Pull requests list

Add MXFP8 attention

#2719 opened Mar 1, 2026 by cyanguwa • Draft

13 tasks

pass params_dtype to qk_norm creation

#2718 opened Feb 28, 2026 by pstjohn

Hongbinl/offload activation cuda graph mxfp8 offload fix

#2716 opened Feb 27, 2026 by lhb8125 • Draft

13 tasks

[PyTorch] Remove is_first_microbatch setting after cudagraph warmup

#2715 opened Feb 27, 2026 by buptzyb

13 tasks

[JAX] CGEMM with Shardy

#2714 opened Feb 27, 2026 by phu0ngng

8 of 13 tasks

Add DCP compatibility for FSDP2-TP sharding in TransformerEngine.

#2713 opened Feb 26, 2026 by cspades • Draft

13 tasks

Enable dequantization from MXFP8 tensor with only columnwise data

#2712 opened Feb 26, 2026 by ptrendx

13 tasks

[JAX] Support calling MOE router kernels from JAX side

#2711 opened Feb 26, 2026 by tdophung

1 of 13 tasks

[Common][PyTorch] Add z_loss_weight and log_sum_exp output to parallel_cross_entropy

#2707 opened Feb 26, 2026 by bassoy • Draft

8 tasks done

[Draft] Newton-Schulz via cuSOLVERMp

#2706 opened Feb 25, 2026 by vcherepanov-nv

6 of 13 tasks

[All] Added better error messages

#2705 opened Feb 25, 2026 by ptrendx

Fix Flash Attention 3 API compatibility for window size parameters (label: 2.14.0)

#2704 opened Feb 25, 2026 by jhvmhg

3 of 13 tasks

[JAX] Deprecate GSPMD: remove infer_sharding_from_operands and GSPMD tests

#2702 opened Feb 24, 2026 by phu0ngng

8 of 13 tasks

[Draft][PyTorch] torch.compile support for TE Linear

#2701 opened Feb 24, 2026 by pggPL • Draft

13 tasks

Add fused_adam, quantized_model_init, and fsdp2 example

#2698 opened Feb 22, 2026 by pstjohn

[PyTorch] Zero-initialize learnable softmax_offset in DotProductAttention

#2694 opened Feb 20, 2026 by fjosw

7 of 13 tasks

Enable sm120 support for fused attn if cuDNN is 9.18.1+

#2693 opened Feb 20, 2026 by KshitijLakhani • Draft

13 tasks

[JAX] Fix get_seqlens_and_offsets() to accept vmapped seg ids and non vmapped seg offsets (label: 2.14.0)

#2692 opened Feb 19, 2026 by KshitijLakhani

7 of 13 tasks

NVFP4 primary weight support

#2691 opened Feb 19, 2026 by WanZzzzzz

10 of 13 tasks

[PyTorch] Error out if constructing LayerNormLinear with row tensor parallelism (label: bug)

#2688 opened Feb 17, 2026 by timmoon10

6 of 13 tasks

[PyTorch] torch.compile support for permutation functions

#2686 opened Feb 17, 2026 by pggPL

9 of 13 tasks

[JAX] Integrate BF16 Grouped GEMM with on-device group sizes

#2680 opened Feb 13, 2026 by jberchtold-nvidia

8 of 13 tasks

[PyTorch][Fused Attn] Add support for cuDNN to return Softmax Stats always and Max when return_max_logit=True

#2677 opened Feb 12, 2026 by sudhakarsingh27

7 of 13 tasks

[PyTorch] Add dtype information to QuantizedTensorStorage class

#2676 opened Feb 12, 2026 by ptrendx

1 of 13 tasks

[Common] MOE Split dBias cpu_overhead (labels: enhancement, MoE)

#2674 opened Feb 11, 2026 by Oleg-Goncharov

8 of 13 tasks
