[SQL] Improve code generation for functions that use Regex and Timezone#6043
[SQL] Improve code generation for functions that use Regex and Timezone#6043mihaibudiu merged 4 commits intofeldera:mainfrom
Conversation
50479f5 to
274be52
Compare
| if (i != 1) | ||
| this.ensureString(ops, i); | ||
| } | ||
| ops.set(1, this.makeRegex(ops.get(1))); |
There was a problem hiding this comment.
This change is not just a codegen cleanup; it also changes the non-constant regexp_replace/RLIKE path to go through ParsedRegex, which now turns an invalid pattern into a runtime failure instead of the previous fallback behavior. The tests only pin down the constant-invalid case plus one dynamic happy path. Please add coverage for invalid regex values coming from a table/expression as well, otherwise the main behavior change here is still unproven.
| validateArgCount(node, operationName, ops.size(), 3); | ||
| ensureString(ops, 0); | ||
| ensureString(ops, 1); | ||
| ops.set(0, this.makeTimezone(ops.get(0))); |
There was a problem hiding this comment.
Same issue for timezones: after routing all CONVERT_TIMEZONE arguments through parse_timezone*, both constant and non-constant values take the new Zone path. The test suite covers an invalid constant timezone and a valid dynamic one, but it does not cover an invalid timezone coming from a column/expression. Since this PR changes runtime semantics in that path too, please add a non-constant invalid-timezone test before merging.
| match Regex::new(re.str()) { | ||
| Ok(r) => ParsedRegex::Regex(r), | ||
| Err(e) => { | ||
| tracing::warn!( |
There was a problem hiding this comment.
| tracing::warn!( | |
| warn!( |
| re, | ||
| e.to_string() | ||
| ); | ||
| ParsedRegex::Invalid(re.str().to_string(), e.to_string()) |
There was a problem hiding this comment.
re is already SqlString, can't it just stay like that in Invalid?
| Iana(Tz), | ||
| Offset(FixedOffset), | ||
| // Timezone string | ||
| Invalid(String), |
There was a problem hiding this comment.
wouldn't it be better to make those things Result<Zone, String> rather than encoding the failure case inside of Zone (and ParsedRegex)
There was a problem hiding this comment.
Could, but then the code will have Option<Result<>> as arguments, and I am not sure whether the macros using ? will all work fine.
| } | ||
|
|
||
| // Create a schema that retrieves data from HSQLDB | ||
| DataSource mockDataSource = JdbcSchema.dataSource(jdbcUrl, "org.hsqldb.jdbcDriver", "", ""); |
There was a problem hiding this comment.
can you get rid of hsqldb as a dependency now?
There was a problem hiding this comment.
I think we still have a few tests that use it. Could remove it, but will it save anything?
a787789 to
0318743
Compare
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
…gex values Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Fixes #6039
Fixes #6035
There are a few other small commits.
The most important change is to change the generated code from:
to
The compiler already lifts constant expressions into lazy statics, so for the case when timezones are constant strings, this will parse each timezone only once. A similar adaptation has been made for functions that use
Regex; these were hand-optimized previously for the case of constant regex patterns, but this way of doing things is simpler.Breaking changes
Regex functions (
RLIKEandREGEXP_REPLACE) will now crash if the regex is not a legal Rust regex, matching the behavior of Postgres. Previously they were just ignoring malformed regexps and behaving as if there were no matches.