
[ET-VK][matmul] Re-implement fp32/fp16 matmul and linear with tiled compute and blocked weight packing #18183

Merged
SS-JIA merged 1 commit into main from gh/SS-JIA/487/orig on Mar 14, 2026

Conversation

@pytorchbot (Collaborator)

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #18171 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/487/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/487/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/487/orig
Differential Revision: D96488384
@diff-train-skip-merge

[ET-VK][matmul] Re-implement fp32/fp16 matmul and linear with tiled compute and blocked weight packing

Pull Request resolved: #18171

Replace all existing matmul/linear operator implementations with new ones built
from the ground up using a tiled compute approach. Delete all legacy
implementations (MatMulLegacy.cpp, LinearLegacy.cpp, addmm_optimized.glsl,
addmm_naive_*.glsl).

New matmul (mm/bmm/addmm):
- Single matmul.glsl shader handles mm, bmm, and addmm using FPInputTile,
  FPWeightTile, FPOutTile infrastructure from SDPA
- Adaptive tile size selection (TILE_M=4/2/1) based on GPU occupancy
- When mat2 is a constant tensor, automatically routes through the linear
  path for blocked weight packing
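
The tiled compute scheme described above can be sketched in plain Python. This is an illustrative model, not the actual GLSL shader: the TILE_M name follows the PR, while `matmul_tiled` and the fixed TILE_N/TILE_K values are assumptions made for the sketch. Each "invocation" computes one TILE_M x TILE_N output tile in registers (the FPOutTile analogue), accumulating partial products while walking K in TILE_K-wide blocks:

```python
TILE_M, TILE_N, TILE_K = 4, 4, 4  # illustrative; the PR adapts TILE_M (4/2/1) to occupancy

def matmul_tiled(A, B):
    """Compute A @ B (A: MxK, B: KxN) with per-tile accumulation."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for m0 in range(0, M, TILE_M):          # one shader "invocation" per output tile
        for n0 in range(0, N, TILE_N):
            # accumulator registers for this tile (FPOutTile analogue)
            acc = [[0.0] * TILE_N for _ in range(TILE_M)]
            for k0 in range(0, K, TILE_K):  # walk the reduction dim in blocks
                for m in range(min(TILE_M, M - m0)):
                    for k in range(min(TILE_K, K - k0)):
                        a = A[m0 + m][k0 + k]          # FPInputTile-style load
                        for n in range(min(TILE_N, N - n0)):
                            acc[m][n] += a * B[k0 + k][n0 + n]
            for m in range(min(TILE_M, M - m0)):       # write the tile back out
                for n in range(min(TILE_N, N - n0)):
                    C[m0 + m][n0 + n] = acc[m][n]
    return C
```

The point of the tiling is arithmetic intensity: each loaded input element is reused across TILE_N output columns before being discarded, which is what makes smaller TILE_M values a sensible fallback when occupancy is the constraint.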

New linear:
- Custom 4OC×4IC blocked weight prepacking via pack_fp_linear_weight.glsl
  for optimal cache line utilization during tiled matmul
- Supports both transposed [N,K] and non-transposed [K,N] weights with
  batch dimension support
- Separate texture2d weight storage with automatic buffer fallback for
  large dimensions
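
A minimal sketch of what a 4OC x 4IC blocked weight layout might look like, assuming a [OC, IC] weight matrix packed block-by-block with zero padding at the edges. The `pack_blocked` name and traversal order are illustrative assumptions, not the actual pack_fp_linear_weight.glsl logic:

```python
BLOCK_OC, BLOCK_IC = 4, 4  # 4 output channels x 4 input channels per block

def pack_blocked(W):
    """Pack a [OC, IC] weight matrix into contiguous 4x4 blocks.

    For each (oc_block, ic_block) pair, the 4x4 sub-tile is stored
    contiguously in the output, zero-padded where OC or IC is not a
    multiple of the block size.
    """
    OC, IC = len(W), len(W[0])
    oc_blocks = (OC + BLOCK_OC - 1) // BLOCK_OC
    ic_blocks = (IC + BLOCK_IC - 1) // BLOCK_IC
    packed = []
    for ob in range(oc_blocks):
        for ib in range(ic_blocks):
            for oc in range(BLOCK_OC):
                for ic in range(BLOCK_IC):
                    o, i = ob * BLOCK_OC + oc, ib * BLOCK_IC + ic
                    packed.append(W[o][i] if o < OC and i < IC else 0.0)
    return packed
```

The payoff of a layout like this is that one contiguous fetch delivers the weights for a whole 4-wide output tile across a 4-deep slice of the reduction dimension, so the tiled kernel touches one cache line per inner step instead of striding across rows.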

Performance on Adreno 750 (fp16, vs legacy):
- Linear [4096,1024]x[256,1024]: 1.33x faster (texture)
- Linear [4096,64]x[128,64]: 2.67x faster (texture)
- BMM [1,4096,256]x[1,256,1024]: 1.63x faster (texture)

ghstack-source-id: 352051371
@exported-using-ghexport

Differential Revision: [D96488384](https://our.internmc.facebook.com/intern/diff/D96488384/)
@pytorchbot requested a review from SS-JIA as a code owner March 14, 2026 08:49
@pytorch-bot bot commented Mar 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18183

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit ef2d901 with merge base cc27e6b:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Mar 14, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@SS-JIA merged commit 89b938a into main Mar 14, 2026
227 of 230 checks passed
@SS-JIA deleted the gh/SS-JIA/487/orig branch March 14, 2026 14:19