feat(arrow-avro): accept default value of null for Avro union with null type in any branch position #9487
Draft
mzabaluev wants to merge 3 commits into apache:main
Test the Avro 1.12 spec behavior of resolving default values in the specific case where the default value for a field added in the reader schema is null and null is the second branch in the field's union type.
Created as a draft for the time being because the changes are not yet feature-gated as planned per #8703.
Avro 1.12, new rules.
Which issue does this PR close?
Rationale for this change
The Avro specification version 1.12 extends acceptance of default values for unions: a default may match any schema branch in the union rather than only the first.
This change implements the new behavior in the specific case of the default value being null, which is important for some real-world cases of Iceberg schema evolution. Spark converts nullable fields in its SQL schema to Avro field types with the null variant listed last. When a column is added to an Iceberg table backed by Avro files, the default value of its field in the reader schema shall be specified as null.
What changes are included in this PR?
Change the validation of a null default value for union and nullable types so that null is accepted in any branch position (for unions treated as Arrow unions) and in either nullability order (for unions treated as nullable types).
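As a rough illustration of the rule being relaxed, the sketch below contrasts the first-branch check with the any-branch check for a null default. The `Schema` and `DefaultRule` types and the `accepts_null_default` function are hypothetical and exist only for this example; they are not the arrow-avro types, and the actual validation in the crate may be structured differently.

```rust
/// Toy model of an Avro schema (hypothetical, not the arrow-avro types).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Null,
    Int,
    String,
    Union(Vec<Schema>),
}

/// Which default-value rule to apply (hypothetical switch for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DefaultRule {
    /// Avro 1.11: the default must match the first union branch.
    FirstBranch,
    /// Avro 1.12: the default may match any union branch.
    AnyBranch,
}

/// Returns true if a `null` default is acceptable for `schema` under `rule`.
fn accepts_null_default(schema: &Schema, rule: DefaultRule) -> bool {
    match schema {
        // A plain null type always accepts a null default.
        Schema::Null => true,
        Schema::Union(branches) => match rule {
            // Only a union whose first branch is null qualifies.
            DefaultRule::FirstBranch => branches.first() == Some(&Schema::Null),
            // A null branch anywhere in the union qualifies.
            DefaultRule::AnyBranch => branches.contains(&Schema::Null),
        },
        // Non-null, non-union types never accept a null default.
        _ => false,
    }
}
```

Under this sketch, a Spark-style `["int", "null"]` field is rejected by `DefaultRule::FirstBranch` and accepted by `DefaultRule::AnyBranch`, while `["null", "int"]` is accepted by both.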
Are these changes tested?
Added a column in test_schema_resolution_defaults_all_supported_types to exercise the ["int", "null"] type with the default of null.
Are there any user-facing changes?
This is a behavioral change where more schema resolution cases become accepted than were permitted by the Avro 1.11 spec.