Skip to content

Conversation

@sdf-jkl
Copy link
Contributor

@sdf-jkl sdf-jkl commented Jan 28, 2026

Which issue does this PR close?

Rationale for this change

Check issue

What changes are included in this PR?

Match arm to support preimage for InList expressions in expr_simplifier.rs

Are these changes tested?

Yes, added two tests for IN LIST and NOT IN LIST support.

Are there any user-facing changes?

No

@github-actions github-actions bot added the optimizer Optimizer rules label Jan 28, 2026
Copy link
Contributor

@masonh22 masonh22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, one small comment

@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Jan 29, 2026

Thanks for your review @masonh22.

I am thinking about NULL support inside InList expressions for this simplification. I don't think InList supports distinct behavior like is distinct from or is not distinct from, so we should probably avoid it.

@masonh22
Copy link
Contributor

I am thinking about NULL support inside InList expressions for this simplification. I don't think InList supports distinct behavior like is distinct from or is not distinct from, so we should probably avoid it.

Hmmm, I see that you are already checking whether the list contains null and not applying the rule in that case. I'm not really sure what the semantics are for null values in InList. I see that ShortenInListSimplifier, which rewrites InList expressions as a == x || a == y ... (or a != x && a != y ... if it's negated), doesn't check whether any of the values are null. So from that perspective, I think it would be fine to pass a null literal value into get_preimage().

Does that make sense? Is that what you were thinking about?

@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Jan 30, 2026

If we pass a Null literal in get_preimage as is, it will not return PreimageResult::Range and the whole simplification will be aborted.
I was thinking about checking if any(is_null) and storing it. When true, we would add expr.is_null() or expr.is_not_null() to the resulting expression. It will be based on the negate boolean(false => is_null(), true => is_not_null()).

When looping through the expressions in the list we would check if expr.is_null(), and if it is continue, else get_preimage(expr).

But I don't think this is relevant because if you want distinct behavior, you should chain is_(not_)distinct_from expressions instead of using InList.

@masonh22
Copy link
Contributor

Oh right I see what you mean. I think you could skip over nulls while looping over the expressions but I don't think you need to add expr.is_null() to the resulting expression. Null values in the InList array should have no impact on what it evaluates to if it doesn't use the is distinct from semantics.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sdf-jkl please help me to understand how inlist preimage is different from datafusion/optimizer/src/simplify_expressions/inlist_simplifier.rs

And if it is, maybe the file above would be the better place for InList ?

@sdf-jkl
Copy link
Contributor Author

sdf-jkl commented Feb 1, 2026

@comphead The issue with existing inlist_simplifier.rs is this guard:

        }) = expr
            && !list.is_empty()
            && (
                // For lists with only 1 value we allow more complex expressions to be simplified
                // e.g SUBSTR(c1, 2, 3) IN ('1') -> SUBSTR(c1, 2, 3) = '1'
                // for more than one we avoid repeating this potentially expensive
                // expressions
                list.len() == 1
                    || list.len() <= THRESHOLD_INLINE_INLIST
                        && expr.try_as_col().is_some() // <--- this point here
            )

When list.len() is greater than 1, we only perform this simplification if lhs is a Column expression. Expressions where the column wrapped in a function are not handled.

I was thinking about adding support inside inlist_simplifier.rs, but there is a precedent for handling InList support inside expr_simplifier.rs for unwrap_cast. If move unwrap_cast and preimage InList support here, we'd need another guard to check which function is being passed. Currently, the only way is by matching function name, which isn't neat.

There is an issue for that:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules

Projects

None yet

Development

Successfully merging this pull request may close these issues.

InList support for pre-image udf

3 participants