[AURON #2045] Fix IndexOutOfBoundsException in UDAF #2046
Flyangz wants to merge 1 commit into apache:master from
Conversation
Pull request overview
This pull request fixes an IndexOutOfBoundsException that occurs when passing sliced Arrow arrays from Rust to Java via FFI in Spark UDAFs. The issue arose because the Java side expects Arrow arrays with 0-based offsets, but Rust was passing sliced arrays that retained their original buffer with non-zero offsets.
Changes:
- Added logic to detect "clean" arrays (arrays with 0-based offsets) and pass them through unchanged
- For sliced arrays, creates new zero-based arrays by reconstructing offsets, value buffers, and null buffers
- Added necessary imports for array reconstruction (ArrayData, BooleanBufferBuilder, Buffer)
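To make the "zero-based reconstruction" step concrete, here is a minimal sketch of the offset-rebasing idea in plain Rust (no arrow dependency; `rebase_offsets` is an illustrative helper, not the PR's actual code). A sliced Arrow `BinaryArray` keeps its parent's offsets buffer, so its first value offset is generally non-zero; rebasing subtracts that first offset from every entry and narrows the values buffer to the matching window:

```rust
/// Rebase a slice of i32 value offsets so the first entry becomes 0.
/// Returns the rebased offsets plus the (start, len) window into the
/// old values buffer that the rebased offsets now refer to.
fn rebase_offsets(offsets: &[i32]) -> (Vec<i32>, (usize, usize)) {
    let first = offsets[0];
    let last = *offsets.last().unwrap();
    // Shift every offset so the sliced view starts at 0.
    let rebased = offsets.iter().map(|o| o - first).collect();
    (rebased, (first as usize, (last - first) as usize))
}

fn main() {
    // Offsets of a sliced array: its values live in bytes 3..9 of the
    // parent's values buffer.
    let (rebased, window) = rebase_offsets(&[3, 5, 9]);
    println!("{rebased:?} {window:?}"); // [0, 2, 6] (3, 6)
}
```

After rebasing, the Java side can interpret the array with a plain 0-based offset, which is what the FFI import expects.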
```rust
let new_null_buf = if binary_array.null_count() == 0 {
    None
} else {
    let mut buffer_builder = BooleanBufferBuilder::new(len);
    for i in 0..len {
        buffer_builder.append(binary_array.is_valid(i));
    }
    Some(buffer_builder.finish().into_inner())
};
```
The null buffer reconstruction iterates through all elements calling is_valid, which can be inefficient for large arrays. Consider using the existing null buffer from array_data.nulls() and slicing it appropriately, similar to how the value buffer is sliced. If slicing the null buffer directly is not straightforward due to bit alignment, the current approach is acceptable but could be optimized.
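The "bit alignment" caveat is worth unpacking: Arrow packs validity bits LSB-first, so a slice starting at logical index `offset` generally begins mid-byte and cannot be carved out with a plain byte-range copy. A small dependency-free sketch (the `is_valid` helper is illustrative, not arrow's API) shows the bit arithmetic involved:

```rust
/// Read validity bit `offset + i` out of an LSB-first packed bitmap.
fn is_valid(bitmap: &[u8], offset: usize, i: usize) -> bool {
    let bit = offset + i;
    (bitmap[bit / 8] >> (bit % 8)) & 1 == 1
}

fn main() {
    // One byte, 0b1011_0101: validity pattern 1,0,1,0,1,1,0,1 (LSB first).
    let bitmap = [0b1011_0101u8];
    // View a slice starting at logical offset 3, length 4: bits 3..7.
    // The slice starts mid-byte, which is why a byte copy can't express it.
    let slice: Vec<bool> = (0..4).map(|i| is_valid(&bitmap, 3, i)).collect();
    println!("{slice:?}"); // [false, true, true, false]
}
```

This is why slicing a validity bitmap to a new 0-based buffer either requires bit-shifting every byte or, as the PR does, rebuilding the bitmap bit by bit.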
Suggested change:

```diff
-let new_null_buf = if binary_array.null_count() == 0 {
-    None
-} else {
-    let mut buffer_builder = BooleanBufferBuilder::new(len);
-    for i in 0..len {
-        buffer_builder.append(binary_array.is_valid(i));
-    }
-    Some(buffer_builder.finish().into_inner())
-};
+let new_null_buf = array_data.nulls().cloned();
```
```rust
let ffi_imported_rows = FFI_ArrowArray::new(&array_to_export.to_data());
let rows = jni_call!(SparkUDAFWrapperContext(self.jcontext.as_obj())
    .importRows(&ffi_imported_rows as *const FFI_ArrowArray as i64) -> JObject)?;
self.obj = jni_new_global_ref!(rows.as_obj())?;
```
The assertion that validated the number of imported rows matches the expected count has been removed. Consider adding back a validation check after the importRows call to ensure the Java side processed the correct number of rows. You can use array.len() (or binary_array.len() since it's still in scope) to validate against self.num_records() after line 454.
Suggested change:

```diff
 self.obj = jni_new_global_ref!(rows.as_obj())?;
+let expected_rows = array_to_export.len();
+let imported_rows = self.num_records();
+assert_eq!(
+    imported_rows,
+    expected_rows,
+    "SparkUDAFBufferRowsColumn::unfreeze_from_arrays: Java imported {imported_rows} rows, expected {expected_rows}"
+);
```
```rust
    data_builder = data_builder.null_bit_buffer(Some(buf));
}

let data = unsafe { data_builder.build_unchecked() };
```
Using build_unchecked bypasses ArrayData validation, which could lead to undefined behavior if the constructed data is invalid. Consider using the safe builder API pattern instead. You can replace unsafe { data_builder.build_unchecked() } with data_builder.build()? to ensure the ArrayData is valid before proceeding.
Suggested change:

```diff
-let data = unsafe { data_builder.build_unchecked() };
+let data = data_builder.build()?;
```
```diff
-let ffi_imported_rows = FFI_ArrowArray::new(&arrays[0].to_data());
+let array = &arrays[0];
+let binary_array = downcast_any!(array, BinaryArray)?;
```
The condition checks both binary_array.offset() == 0 and binary_array.value_offsets()[0] == 0, but these checks serve different purposes. The offset() check verifies the array itself is not sliced at the array level, while value_offsets()[0] == 0 checks if the values buffer starts at offset 0. Consider adding a comment explaining this dual check to clarify why both conditions are necessary for determining if the array is "clean".
| // "Clean" means neither the logical array nor its values buffer are sliced: | |
| // - offset() == 0 ensures the array itself has no logical offset (not sliced at array level) | |
| // - value_offsets()[0] == 0 ensures the values buffer also starts at 0 (not sliced in the buffer) |
Which issue does this PR close?
Closes #2045
What changes are included in this PR?
Create new zero-based offsets for the exported array.
Are there any user-facing changes?
No
How was this patch tested?