Skip to content

[Variant] clean up variant_get tests#9518

Open
sdf-jkl wants to merge 7 commits intoapache:mainfrom
sdf-jkl:variant_get-clean-up
Open

[Variant] clean up variant_get tests#9518
sdf-jkl wants to merge 7 commits intoapache:mainfrom
sdf-jkl:variant_get-clean-up

Conversation

@sdf-jkl
Copy link
Contributor

@sdf-jkl sdf-jkl commented Mar 5, 2026

Which issue does this PR close?

Rationale for this change

check issue

What changes are included in this PR?

  • Use variant_shred in test macros
  • Use VariantArray::from_parts instead of using StructArrayBuilder

Are these changes tested?

yes, changes pass same tests

Are there any user-facing changes?

no

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Mar 5, 2026
@sdf-jkl sdf-jkl marked this pull request as draft March 5, 2026 22:31
@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Mar 5, 2026

The test error was caused by:

thread 'variant_get::test::get_variant_perfectly_shredded_time_as_time32_second' (3372) panicked at parquet-variant-compute/src/type_conversion.rs:97:11:
attempt to multiply with overflow

the line here:

impl_primitive_from_variant!(datatypes::Time64MicrosecondType, as_time_utc, |v| {
    Some((v.num_seconds_from_midnight() * 1_000_000 + v.nanosecond() / 1_000) as i64)
});

Multiplication happens before casting v.num_seconds_from_midnight() or v.nanosecond() to i64 and could cause an overflow.

Fixed here: 718efbf

@sdf-jkl sdf-jkl marked this pull request as ready for review March 5, 2026 22:40
@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Mar 5, 2026

@codephage2020 @scovich @klion26 please check when you are available. Thanks!

Copy link
Contributor

@codephage2020 codephage2020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One small suggestion.

Comment on lines 3182 to 3186
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps we could also use from_parts here?

Suggested change
let a_field_shredded =
ShreddedVariantFieldArray::from_parts(None, Some(Arc::new(a_field_typed_value)), None);

(several more below, if you choose to change this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but it looked like using VariantArray::from_parts where only one field is used doesn't save much space or make it more readable.

I am open to it tho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized you were suggesting using ShreddedVariantFieldArray::from_parts, I'll use that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I used it here a56a368

Copy link
Member

@klion26 klion26 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement. Overall looks great to me. I left some minor comments to be considered.

Another question is whether we need to support shred to LargeUtf8/LargeBinary, I can only find this comment, it seems is legal to support (it may not need to be included in the current pr even if we want to support it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also improve this, "n/a" is also string.

@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Mar 10, 2026

Thanks for the improvement. Overall looks great to me. I left some minor comments to be considered.

Another question is whether we need to support shred to LargeUtf8/LargeBinary, I can only find this comment, it seems is legal to support (it may not need to be included in the current pr even if we want to support it)

You're right. I found the mention in shred_variant.rs. They are not supported here yet, however they are supported by VariantArray::try_new and canonicalize_and_verify_data_type as mentioned by @friendlymatthew in #9515:

#[test]
fn test_invalid_shredded_types_rejected() {
let input = VariantArray::from_iter([Variant::from(42)]);
let invalid_types = vec![
DataType::UInt8,
DataType::Float16,
DataType::Decimal256(38, 10),
DataType::Date64,
DataType::Time32(TimeUnit::Second),
DataType::Time64(TimeUnit::Nanosecond),
DataType::Timestamp(TimeUnit::Millisecond, None),
DataType::LargeBinary,
DataType::LargeUtf8,
DataType::FixedSizeBinary(17),
DataType::Union(
UnionFields::from_fields(vec![
Field::new("int_field", DataType::Int32, false),
Field::new("str_field", DataType::Utf8, true),
]),

I'd add support for these types for shred_variant before moving forward.

@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Mar 10, 2026

I filed a ticket for the LargeBinary|LargeUtf8 shred_variant support - #9525

@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Mar 10, 2026

Moved some tests to be close to each other, might split to a separate PR to make this easier to review. cc1869f

Mostly code movement and removed some comments.

arrow::array::BooleanArray::from(vec![Some(true), None, None, Some(false)])
});

partially_shredded_variant_array_gen!(partially_shredded_utf8_variant_array, || {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this will generate the data that(the second is invalid) typed_value contains hello, n/a, and world(Variant::from("n/a") in the macro will also be shredded into typed_value) , there is no item located in value, maybe we can improve this case to satisfy that there are some items in typed_value and some in value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] variant_get tests clean up

3 participants