-
Notifications
You must be signed in to change notification settings - Fork 1.5k
predicate pruning: support cast and try_cast for more types #15764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@appletreeisyellow would you mind reviewing 🙏🏻 ? |
@adriangb Yes, I'm happy to review. I'll have some time to review it this afternoon or evening |
if from_type == to_type { | ||
return Ok(()); | ||
} | ||
// TODO: add an `is_string()` method to DataType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -1544,7 +1572,10 @@ fn build_predicate_expression( | |||
Ok(builder) => builder, | |||
// allow partial failure in predicate expression generation | |||
// this can still produce a useful predicate when multiple conditions are joined using AND | |||
Err(_) => return unhandled_hook.handle(expr), | |||
Err(e) => { | |||
dbg!(format!("Error building pruning expression: {e}")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend changing this to debug!
(dbug logging rather than printing to stdout on failure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adriangb -- this seems like an improvement to me.
However, I am thinking a better solution might be to avoid checking cast behavior entirely in the pruning code and assume any casting has already been done (which I think is true). Maybe we can file a follow on ticket to explore that idea.
This would likely both fix @etsidel's report on #15742 (comment) and would likely be simpler code
@@ -1215,8 +1215,33 @@ fn is_compare_op(op: Operator) -> bool { | |||
// For example, casts from string to numbers is not correct. | |||
// Because the "13" is less than "3" with UTF8 comparison order. | |||
fn verify_support_type_for_prune(from_type: &DataType, to_type: &DataType) -> Result<()> { | |||
// TODO: support other data type for prunable cast or try cast | |||
if matches!( | |||
// Dictionary casts are always supported as long as the value types are supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why we we even have this code -- I think it may predate unwrapping casts in the logical planning phase so by the time this code sees physical expressions it shouldn't have to add casts at all 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the big issue is when comparing string to numeric (both sides will be coerced to string IIUC). Rather than a whitelist here, maybe instead disallow casts we know will fail, so this could maybe just return Err
if the LHS is numeric or temporal and the RHS is string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @etseidl, I'll implement that instead. I think we do need to keep the dict part in here.
Tangential but do any of you know where we have functionality to simplify/adapt a PhysicalExpr given a schema eg to remove unnecessary casts or add cases where necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Yes, Dictionary
type can be easily missed in the verification. Thank you for adding the unit tests!
} | ||
// TODO: add an `is_string()` method to DataType | ||
let supported_string_cast = (matches!( | ||
// String -> String casts are suppoted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// String -> String casts are suppoted | |
// String -> String casts are supported |
@etseidl I just pushed a change that does what I think you suggested and only denies a cast between a string and a non string type but otherwise assumes that the general casting rules have already been sorted out. @appletreeisyellow I pushed several more tests for the dict cases. Would love to see this merged since it ended up blocking some internal work 🙏🏻 |
Thanks @adriangb. I'm testing now... |
Yep, fixes my issue. Thanks again! |
Given approvals and relatively simple change, could a committer merge this please? |
Random ping to @xudong963 🙏🏻 |
Thanks everyone! |
…5764) * predicate pruning: support dictionaries * more types * clippy * add tests * add tests * simplify to dicts * revert most changes * just check for strings, more tests * more tests * remove unecessary now confusing clause
Adds support for dictionaries and some more primitive types to pruning predicate.
I encountered this when feeding in a stats schema with dictionary encoded columns.
I see no reason we can't support this.