Add SobelFilter to SVE microbenchmark #5040

Open

ylpoonlg wants to merge 3 commits into dotnet:main from ylpoonlg:github-sobelfilter

Contributor

ylpoonlg commented Nov 13, 2025

Performance Results

Run on Neoverse-V2

Method	Size	Mean	Error	StdDev	Median	Min	Max	Allocated
Scalar	9	106.89 ns	0.027 ns	0.023 ns	106.89 ns	106.86 ns	106.94 ns	-
Vector128SobelFilter	9	47.21 ns	0.007 ns	0.006 ns	47.21 ns	47.20 ns	47.22 ns	-
SveSobelFilter	9	28.89 ns	0.134 ns	0.125 ns	28.81 ns	28.77 ns	29.07 ns	-
Scalar	64	7,125.25 ns	2.823 ns	2.503 ns	7,124.19 ns	7,122.83 ns	7,130.43 ns	-
Vector128SobelFilter	64	1,255.35 ns	0.913 ns	0.854 ns	1,255.21 ns	1,254.21 ns	1,256.87 ns	-
SveSobelFilter	64	1,339.70 ns	0.280 ns	0.219 ns	1,339.65 ns	1,339.41 ns	1,340.14 ns	-
Scalar	527	507,148.17 ns	324.567 ns	287.720 ns	507,050.40 ns	506,733.19 ns	507,636.12 ns	-
Vector128SobelFilter	527	73,998.18 ns	166.540 ns	147.634 ns	73,971.34 ns	73,795.42 ns	74,256.96 ns	-
SveSobelFilter	527	100,770.21 ns	1,836.759 ns	1,628.239 ns	100,470.79 ns	96,333.15 ns	103,294.79 ns	-

cc @dotnet/arm64-contrib @SwapnilGaikwad @LoopedBard3

ylpoonlg added 3 commits

November 13, 2025 14:27


Add SobelFilter to SVE microbenchmark (2af6711)
Merge branch 'main' into github-sobelfilter (2a23319)
Add SVE Category (98c2926)

LoopedBard3 approved these changes

Member

LoopedBard3 left a comment

Looks good, thanks for the benchmarks!

LoopedBard3 requested a review from Copilot

March 17, 2026 21:58

Copilot started reviewing on behalf of LoopedBard3

March 17, 2026 21:59

Copilot AI reviewed

Contributor

Copilot AI left a comment

Pull request overview

Adds a new SVE-focused microbenchmark for a 2D Sobel filter convolution, enabling comparison of scalar vs AdvSimd (Vector128) vs SVE implementations on Arm64/SVE-capable systems.

Changes:

Introduces SobelFilter microbenchmark with Scalar, Vector128SobelFilter, and SveSobelFilter benchmarks.
Adds setup/verification scaffolding and Sobel kernel initialization for repeatable runs.

src/benchmarks/micro/sve/SobelFilter.cs

Comment on lines +196 to +200

+                              Vector<float> resVec;
+                              // Load coefficients of the filter into vectors.
+                              Vector<float> kxVec = Sve.LoadVector((Vector<float>)Sve.CreateWhileLessThanMask32Bit(0, 3), kx);
+                              Vector<float> kyVec = Sve.LoadVector((Vector<float>)Sve.CreateWhileLessThanMask32Bit(0, 3), ky);
+                              for (int j = 0; j < img_size; j++)

src/benchmarks/micro/sve/SobelFilter.cs

Comment on lines +206 to +213

+                                  for (int i = 0; i < out_size; i += cntw)
+                                  {
+                                      Vector<float> pRow = (Vector<float>)Sve.CreateWhileLessThanMask32Bit(i, out_size);
+                                      // Load input elements from the next 3 columns.
+                                      Vector<float> col0 = Sve.LoadVector(pRow, in_ptr + i);
+                                      Vector<float> col1 = Sve.LoadVector(pRow, in_ptr + i + 1);
+                                      Vector<float> col2 = Sve.LoadVector(pRow, in_ptr + i + 2);

src/benchmarks/micro/sve/SobelFilter.cs

Comment on lines +229 to +236

+                                  for (int i = 0; i < out_size; i += cntw)
+                                  {
+                                      Vector<float> pRow = (Vector<float>)Sve.CreateWhileLessThanMask32Bit(i, out_size);
+                                      // Load input elements from the next 3 rows.
+                                      Vector<float> row0 = Sve.LoadVector(pRow, in_ptr + i);
+                                      Vector<float> row1 = Sve.LoadVector(pRow, in_ptr + i + out_size);
+                                      Vector<float> row2 = Sve.LoadVector(pRow, in_ptr + i + 2 * out_size);
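
The two SVE loops quoted above step by `cntw` and rebuild a predicate with `Sve.CreateWhileLessThanMask32Bit(i, out_size)` on every iteration, so the final partial vector needs no separate scalar tail. The sketch below is a plain scalar emulation of that pattern for one row of the horizontal pass, not the PR's code. The class name, the helper names, the `lanes` parameter (standing in for `cntw`), and the kernel values are all hypothetical. The excerpt does not show how results are stored, so skipping inactive lanes on the store side is also an assumption.

```csharp
using System;

public static class PredicatedLoopSketch
{
    // Emulates a "while less than" predicate: lane l is active when start + l < limit.
    public static bool[] WhileLessThan(int start, int limit, int lanes)
    {
        var mask = new bool[lanes];
        for (int l = 0; l < lanes; l++)
        {
            mask[l] = start + l < limit;
        }
        return mask;
    }

    // Counts the active lanes in an emulated predicate.
    public static int CountActive(bool[] mask)
    {
        int count = 0;
        foreach (bool lane in mask)
        {
            if (lane) count++;
        }
        return count;
    }

    // One row of the horizontal pass, producing `lanes` outputs per iteration.
    // `lanes` is a hypothetical stand-in for the hardware vector length in floats.
    public static float[] HorizontalRow(float[] row, float[] kx, int lanes)
    {
        int outSize = row.Length - 2;
        var result = new float[outSize];
        for (int i = 0; i < outSize; i += lanes)
        {
            bool[] active = WhileLessThan(i, outSize, lanes);
            for (int l = 0; l < lanes; l++)
            {
                // Assumption: inactive lanes are skipped entirely in this emulation.
                if (!active[l]) continue;
                int x = i + l;
                result[x] = kx[0] * row[x] + kx[1] * row[x + 1] + kx[2] * row[x + 2];
            }
        }
        return result;
    }
}
```

With a 9-element row (so `outSize` is 7) and `lanes` set to 4, the loop runs twice: the first predicate has four active lanes and the second has three. The highest input index touched is `6 + 2 = 8`, which is still inside the row.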

src/benchmarks/micro/sve/SobelFilter.cs

Comment on lines +69 to +100

+                          int img_size = Size;
+                          // The output image size is 2 pixels smaller in each direction.
+                          int out_size = img_size - 2;
+                          fixed (float* input = _source, temp = _temp, output = _result)
+                          fixed (float* kx = _kx, ky = _ky)
+                          {
+                              // Convolve the horizontal component first.
+                              // The result is saved to the temp array.
+                              for (int j = 0; j < img_size; j++)
+                              {
+                                  for (int i = 0; i < out_size; i++)
+                                  {
+                                      float res = 0.0F;
+                                      for (int k = 0; k < 3; k++)
+                                      {
+                                          res += kx[k] * input[j * img_size + i + k];
+                                      }
+                                      temp[j * out_size + i] = res;
+                                  }
+                              }
+                              // Then convolve the vertical component.
+                              // Using the temp array as input.
+                              for (int j = 0; j < out_size; j++)
+                              {
+                                  for (int i = 0; i < out_size; i++)
+                                  {
+                                      float res = 0.0F;
+                                      for (int k = 0; k < 3; k++)
+                                      {
+                                          res += ky[k] * temp[(j + k) * out_size + i];
+                                      }
+                                      output[j * out_size + i] = res;
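
The scalar excerpt above is cut off before the loops close, so here is a minimal self-contained sketch of the same two-pass structure for experimenting outside the benchmark harness. It mirrors the indexing in the quoted diff. The class name, method names, and the `HorizontalRamp` helper are hypothetical, and the kernel values in the usage note are an assumption, since the contents of `_kx` and `_ky` do not appear in this excerpt.

```csharp
using System;

public static class SobelSketch
{
    // Applies a horizontal 3-tap pass with kx, then a vertical 3-tap pass with ky.
    // Returns a (size - 2) x (size - 2) image in row-major order.
    public static float[] Apply(float[] input, int size, float[] kx, float[] ky)
    {
        int outSize = size - 2;
        // The horizontal pass keeps all `size` rows but only `outSize` columns.
        var temp = new float[size * outSize];
        var output = new float[outSize * outSize];

        for (int j = 0; j < size; j++)
        {
            for (int i = 0; i < outSize; i++)
            {
                float res = 0.0f;
                for (int k = 0; k < 3; k++)
                {
                    res += kx[k] * input[j * size + i + k];
                }
                temp[j * outSize + i] = res;
            }
        }

        // The vertical pass reads three consecutive rows of the temp array.
        for (int j = 0; j < outSize; j++)
        {
            for (int i = 0; i < outSize; i++)
            {
                float res = 0.0f;
                for (int k = 0; k < 3; k++)
                {
                    res += ky[k] * temp[(j + k) * outSize + i];
                }
                output[j * outSize + i] = res;
            }
        }
        return output;
    }

    // Hypothetical helper: a size x size image where each pixel equals its column index.
    public static float[] HorizontalRamp(int size)
    {
        var img = new float[size * size];
        for (int j = 0; j < size; j++)
        {
            for (int i = 0; i < size; i++)
            {
                img[j * size + i] = i;
            }
        }
        return img;
    }
}
```

As a check, take `SobelSketch.Apply(SobelSketch.HorizontalRamp(4), 4, kx, ky)` with the assumed kernels `kx = {-1, 0, 1}` and `ky = {1, 2, 1}`. The horizontal pass gives `(i + 2) - i = 2` at every position, and the vertical pass scales that by `1 + 2 + 1 = 4`, so all four output pixels are 8.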

src/benchmarks/micro/sve/SobelFilter.cs

+                          fixed (float* kx = _kx, ky = _ky)
+                          {
+                              // Convolve the horizontal component first.
+                              // The result is saved to the temp array.

LoopedBard3 mentioned this pull request

Add OddEvenSort to SVE microbenchmark #5046

Open
