
Conversation

Contributor

@theirix theirix commented Jan 29, 2026

Which issue does this PR close?

Rationale for this change

Similar to issue #19749 and the optimisation of left in #19980, it's worth doing the same for right.

What changes are included in this PR?

  • Improve efficiency of the function by making fewer memory allocations and going directly to bytes, based on char boundaries

  • Provide a specialisation for StringView with buffer zero-copy

  • Use arrow_array::builder::make_view for low-level view manipulation (we still need to know the magic constant 12, the maximum inline length in the view layout)

  • Benchmark - up to 90% performance improvement

right size=1024/string_array positive n/1024
                        time:   [24.286 µs 24.658 µs 25.087 µs]
                        change: [−86.881% −86.662% −86.424%] (p = 0.00 < 0.05)
                        Performance has improved.
right size=1024/string_array negative n/1024
                        time:   [29.996 µs 30.737 µs 31.511 µs]
                        change: [−89.442% −89.229% −89.003%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

right size=4096/string_array positive n/4096
                        time:   [105.58 µs 109.39 µs 113.51 µs]
                        change: [−86.119% −85.788% −85.497%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
right size=4096/string_array negative n/4096
                        time:   [136.48 µs 138.34 µs 140.36 µs]
                        change: [−88.007% −87.848% −87.692%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

right size=1024/string_view_array positive n/1024
                        time:   [25.054 µs 25.500 µs 26.033 µs]
                        change: [−82.569% −82.285% −81.891%] (p = 0.00 < 0.05)
                        Performance has improved.
right size=1024/string_view_array negative n/1024
                        time:   [41.281 µs 42.730 µs 44.432 µs]
                        change: [−73.832% −73.288% −72.716%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

right size=4096/string_view_array positive n/4096
                        time:   [129.38 µs 133.69 µs 137.61 µs]
                        change: [−79.497% −78.998% −78.581%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
right size=4096/string_view_array negative n/4096
                        time:   [218.16 µs 229.41 µs 243.30 µs]
                        change: [−65.405% −63.622% −61.515%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe
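
To illustrate the first bullet in the change list, here is a minimal sketch, using only the standard library, of the difference between building the result char by char and slicing the original string at a char boundary. Both function names are hypothetical and are not taken from the PR; the actual implementation works on Arrow arrays rather than on a single `&str`.

```rust
/// Allocating approach (hypothetical): walks the chars twice and builds a new `String`.
fn right_allocating(s: &str, n: usize) -> String {
    let total = s.chars().count();
    s.chars().skip(total.saturating_sub(n)).collect()
}

/// Byte-oriented approach (hypothetical): finds the byte offset where the
/// last `n` chars start and returns a borrowed slice, with no allocation.
fn right_borrowed(s: &str, n: usize) -> &str {
    if n == 0 {
        return "";
    }
    match s.char_indices().nth_back(n - 1) {
        // `start` comes from `char_indices`, so it is always a char boundary
        // and slicing cannot panic.
        Some((start, _)) => &s[start..],
        // Fewer than `n` chars: the whole string qualifies.
        None => s,
    }
}
```

The borrowed variant only scans from the end of the string until it has seen `n` chars, which is one plausible reason the byte-based version benchmarks faster.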

Are these changes tested?

  • Existing unit tests for right

  • Added more unit tests

  • Added bench similar to left.rs

  • Existing SLTs pass

Are there any user-facing changes?

No

github-actions bot added the functions (Changes to functions implementation) label Jan 29, 2026
@theirix theirix marked this pull request as ready for review January 29, 2026 23:17
Contributor Author

theirix commented Jan 29, 2026

cc @Jefffrey


/// Calculate the byte length of the substring of last `n` chars from string `string`
/// (or all but first `|n|` chars if n is negative)
fn right_byte_length(string: &str, n: i64) -> usize {
Contributor

I haven't looked too closely, but I feel we can deduplicate right + left implementation code as the main difference is this byte length function? In that it flips which side it looks from?

Contributor Author

Unfortunately, it's quite different.

  • left_byte_length and right_byte_length are almost symmetric with flipping the sign of the n argument, except for the 0 case.

  • The lookup in the byte array starts from a different place: from the left for left, from the middle for right.

  • The string view is built differently (length adjustment vs. offset).

So, having the generic implementation wouldn't be that helpful - plenty of ifs all around.
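
The "almost symmetric, except for the 0 case" point can be sketched with plain `&str` inputs. This is an illustration only: `left_end` and `right_start` are hypothetical names, and the PR's `left_byte_length` and `right_byte_length` may use a different return convention.

```rust
use std::cmp::Ordering;

/// Converts `|n|` to a char count, saturating if it does not fit in `usize`.
fn abs_count(n: i64) -> usize {
    usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX)
}

/// Byte offset where char number `k` (0-based) starts, or `s.len()` if absent.
fn nth_start(s: &str, k: usize) -> usize {
    s.char_indices().nth(k).map_or(s.len(), |(i, _)| i)
}

/// Byte offset where the last `k` chars start (`k >= 1`), or 0 if absent.
fn nth_back_start(s: &str, k: usize) -> usize {
    s.char_indices().nth_back(k - 1).map_or(0, |(i, _)| i)
}

/// Hypothetical: exclusive end offset of `left(s, n)`, i.e. `&s[..end]`.
fn left_end(s: &str, n: i64) -> usize {
    match n.cmp(&0) {
        Ordering::Equal => 0,
        Ordering::Greater => nth_start(s, abs_count(n)),
        Ordering::Less => nth_back_start(s, abs_count(n)),
    }
}

/// Hypothetical: start offset of `right(s, n)`, i.e. `&s[start..]`.
fn right_start(s: &str, n: i64) -> usize {
    match n.cmp(&0) {
        Ordering::Equal => s.len(),
        Ordering::Greater => nth_back_start(s, abs_count(n)),
        Ordering::Less => nth_start(s, abs_count(n)),
    }
}
```

In this sketch `left_end(s, n) == right_start(s, -n)` for every non-zero `n`, while for `n == 0` the two return `0` and `s.len()` respectively, since both functions must yield an empty string.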

Contributor

I do think we could deduplicate somewhat, and perhaps having this function return a Range instead of just a usize might make it more feasible; but can explore this in a followup 👍
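
One way the `Range` idea might look, as a rough sketch rather than a proposal for the actual follow-up: a single function returns the byte range to keep, and the side is a parameter. `Side`, `substring_range`, and the two offset helpers are hypothetical names introduced here for illustration.

```rust
use std::ops::Range;

/// Which end of the string the substring is anchored to (hypothetical).
#[derive(Clone, Copy)]
enum Side {
    Left,
    Right,
}

/// Byte offset where char number `k` (0-based) starts, or `s.len()` if absent.
fn offset_from_front(s: &str, k: usize) -> usize {
    s.char_indices().nth(k).map_or(s.len(), |(i, _)| i)
}

/// Byte offset where the last `k` chars start; `s.len()` for `k == 0`,
/// 0 if the string has fewer than `k` chars.
fn offset_from_back(s: &str, k: usize) -> usize {
    if k == 0 {
        return s.len();
    }
    s.char_indices().nth_back(k - 1).map_or(0, |(i, _)| i)
}

/// Byte range of `left(s, n)` or `right(s, n)`; negative `n` drops `|n|`
/// chars from the opposite end.
fn substring_range(s: &str, n: i64, side: Side) -> Range<usize> {
    let k = usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX);
    match (side, n >= 0) {
        (Side::Left, true) => 0..offset_from_front(s, k),
        (Side::Left, false) => 0..offset_from_back(s, k),
        (Side::Right, true) => offset_from_back(s, k)..s.len(),
        (Side::Right, false) => offset_from_front(s, k)..s.len(),
    }
}
```

With a range, the `n == 0` case needs no special branch: both sides produce an empty range, just at different ends of the string.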

},
(Some(string), Some(n)) => {
let byte_length = right_byte_length(string, n);
// println!(
Contributor

Commented code accidentally added here

args: args.clone(),
arg_fields: arg_fields.clone(),
number_rows: size,
return_field: Field::new("f", DataType::Utf8, true)
Member

Shouldn't this use Utf8View when is_string_view == true ?

Ordering::Equal => string.len(),
Ordering::Greater => string
.char_indices()
.nth_back(n.unsigned_abs() as usize - 1)
Member

nit: This may truncate on 32-bit machines

Contributor Author

For string arrays, we support 64-bit offsets on 32-bit platforms, but an individual string's length is still limited to 32 bits there. I added saturation, so left would return the whole string anyway.

For string views, Arrow provides 32-bit views only. Since we cannot construct a large string view as input, it won't be a problem.
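
The difference between a truncating cast and a saturating conversion can be shown without a 32-bit machine by using `u32` as a stand-in for a 32-bit `usize`. This is a sketch of the general technique, not the code from the PR; all three function names are hypothetical.

```rust
/// Truncating conversion: keeps only the low 32 bits of `|n|`,
/// which is what `as` does when the target type is narrower.
fn truncating_u32(n: i64) -> u32 {
    n.unsigned_abs() as u32
}

/// Saturating conversion: clamps to `u32::MAX` when `|n|` does not fit.
fn saturating_u32(n: i64) -> u32 {
    u32::try_from(n.unsigned_abs()).unwrap_or(u32::MAX)
}

/// The same saturating idea with `usize` as the target, which behaves
/// correctly on both 32-bit and 64-bit platforms.
fn saturating_usize(n: i64) -> usize {
    usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX)
}
```

For `n = 2^32 + 3`, the truncating version yields 3, so a function like right would return only three chars, while the saturating version yields the maximum count and the whole string is returned.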

Contributor Author

theirix commented Jan 31, 2026

What do you think of reworking both left and right benches to a single file like for trim.rs?

Contributor

@Jefffrey Jefffrey left a comment

The use of make_view() here made me realize we could use it for left as well 🤔

For example replace this section:

// Input string comes from StringViewArray, so it should fit in 32-bit length
let new_length: u32 = left_byte_length(string, n) as u32;
let byte_view = ByteView::from(view);
// Construct a new view
shrink_string_view_array_view(string, new_length, byte_view)

With this:

    let new_length = left_byte_length(string, n);
    let bytes = &string.as_bytes()[..new_length];
    let byte_view = ByteView::from(view);
    make_view(bytes, byte_view.buffer_index, byte_view.offset)

Though we can explore this in a followup
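
To make the "length adjustment vs. offset" distinction between left and right concrete, here is a simplified model of an out-of-line string view as a window into a shared buffer. It is only a sketch: `ViewModel`, `resolve`, `left_view`, and `right_view` are hypothetical, the inline case for short strings is ignored, and the real code operates on Arrow's `ByteView`.

```rust
/// Simplified model of an out-of-line view: a window into a shared buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ViewModel {
    buffer_index: u32,
    offset: u32,
    len: u32,
}

/// Looks up the bytes a view refers to, without copying them.
fn resolve<'a>(buffers: &'a [Vec<u8>], v: ViewModel) -> &'a [u8] {
    let start = v.offset as usize;
    &buffers[v.buffer_index as usize][start..start + v.len as usize]
}

/// left: the window starts at the same offset and only gets shorter.
fn left_view(v: ViewModel, new_len: u32) -> ViewModel {
    ViewModel { len: new_len, ..v }
}

/// right: the window start moves forward by the number of skipped bytes,
/// and the length shrinks by the same amount.
fn right_view(v: ViewModel, skipped_bytes: u32) -> ViewModel {
    ViewModel {
        offset: v.offset + skipped_bytes,
        len: v.len - skipped_bytes,
        ..v
    }
}
```

In both cases the data buffer is untouched; only the 16-byte view changes, which is what makes the StringView path zero-copy.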

> What do you think of reworking both left and right benches to a single file like for trim.rs?

I think that would be a good idea.

Comment on lines +199 to +210
if result_bytes.len() > 12 {
let byte_view = ByteView::from(view);
// Reuse buffer, but adjust offset and length
make_view(
result_bytes,
byte_view.buffer_index,
byte_view.offset + new_offset as u32,
)
} else {
// inline value does not need block id or offset
make_view(result_bytes, 0, 0)
}
Contributor

Suggested change
- if result_bytes.len() > 12 {
-     let byte_view = ByteView::from(view);
-     // Reuse buffer, but adjust offset and length
-     make_view(
-         result_bytes,
-         byte_view.buffer_index,
-         byte_view.offset + new_offset as u32,
-     )
- } else {
-     // inline value does not need block id or offset
-     make_view(result_bytes, 0, 0)
- }
+ let byte_view = ByteView::from(view);
+ make_view(
+     result_bytes,
+     byte_view.buffer_index,
+     byte_view.offset + new_offset as u32,
+ )

We could probably avoid this outside if check since make_view already checks this for us
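
Why the outer check is redundant can be seen from the Arrow view layout: a view is 16 bytes, starting with a 4-byte length; values of at most 12 bytes are stored inline after the length, and only longer values carry a 4-byte prefix, a buffer index, and an offset. The function below is a standard-library model of that layout written for illustration. `sketch_make_view` is a hypothetical name and this is not arrow-rs's implementation.

```rust
/// Longest value that fits inline in a 16-byte view (16 - 4 length bytes).
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical model of building a view as a little-endian `u128`.
fn sketch_make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    // Bytes 0..4: length of the value.
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= MAX_INLINE_LEN {
        // Short value: stored inline, zero-padded; buffer index and offset
        // are ignored entirely.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long value: 4-byte prefix, then buffer index, then offset.
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}
```

In this model, passing a non-zero buffer index or offset for a short value produces exactly the same view as passing zeros, so the caller does not need its own length check.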




Development

Successfully merging this pull request may close these issues.

perf: optimise right for byte access and StringView

3 participants