Skip to content

[SQL] Improve code generation for functions that use Regex and Timezone#6043

Merged
mihaibudiu merged 4 commits intofeldera:mainfrom
mihaibudiu:issue6039
Apr 17, 2026
Merged

[SQL] Improve code generation for functions that use Regex and Timezone#6043
mihaibudiu merged 4 commits intofeldera:mainfrom
mihaibudiu:issue6039

Conversation

@mihaibudiu
Copy link
Copy Markdown
Contributor

Fixes #6039
Fixes #6035

There are a few other small commits.

The most important change is to change the generated code from:

fn convert_timezone(tz: SqlString, ...) -> Option<Timestamp> {
  let t = parse_timezone(tz); 
  ...
}

to

fn parse_timezone(tz: SqlString) -> Zone {
   ...
}

fn convert_timezone(t: Zone...) -> Option<Timestamp> {
  ...
}

The compiler already lifts constant expressions into lazy statics, so for the case when timezones are constant strings, this will parse each timezone only once. A similar adaptation has been made for functions that use Regex; these were hand-optimized previously for the case of constant regex patterns, but this way of doing things is simpler.

Breaking changes

Regex functions (RLIKE and REGEXP_REPLACE) will now crash if the regex is not a legal Rust regex, matching the behavior of Postgres. Previously they were just ignoring malformed regexps and behaving as if there were no matches.

@mihaibudiu mihaibudiu force-pushed the issue6039 branch 2 times, most recently from 50479f5 to 274be52 Compare April 15, 2026 01:17
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two blockers.

if (i != 1)
this.ensureString(ops, i);
}
ops.set(1, this.makeRegex(ops.get(1)));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not just a codegen cleanup; it also changes the non-constant regexp_replace/RLIKE path to go through ParsedRegex, which now turns an invalid pattern into a runtime failure instead of the previous fallback behavior. The tests only pin down the constant-invalid case plus one dynamic happy path. Please add coverage for invalid regex values coming from a table/expression as well, otherwise the main behavior change here is still unproven.

validateArgCount(node, operationName, ops.size(), 3);
ensureString(ops, 0);
ensureString(ops, 1);
ops.set(0, this.makeTimezone(ops.get(0)));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue for timezones: after routing all CONVERT_TIMEZONE arguments through parse_timezone*, both constant and non-constant values take the new Zone path. The test suite covers an invalid constant timezone and a valid dynamic one, but it does not cover an invalid timezone coming from a column/expression. Since this PR changes runtime semantics in that path too, please add a non-constant invalid-timezone test before merging.

Comment thread crates/sqllib/src/string.rs Outdated
match Regex::new(re.str()) {
Ok(r) => ParsedRegex::Regex(r),
Err(e) => {
tracing::warn!(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tracing::warn!(
warn!(

Comment thread crates/sqllib/src/string.rs Outdated
re,
e.to_string()
);
ParsedRegex::Invalid(re.str().to_string(), e.to_string())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re is already SqlString, can't it just stay like that in Invalid?

Iana(Tz),
Offset(FixedOffset),
// Timezone string
Invalid(String),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be better to make those things Result<Zone, String> rather than encoding the failure case inside of Zone (and ParsedRegex)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could, but then the code will have Option<Result<>> as arguments, and I am not sure whether the macros using ? will all work fine.

}

// Create a schema that retrieves data from HSQLDB
DataSource mockDataSource = JdbcSchema.dataSource(jdbcUrl, "org.hsqldb.jdbcDriver", "", "");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you get rid of hsqldb as a dependency now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still have a few tests that use it. Could remove it, but will it save anything?

@mihaibudiu mihaibudiu force-pushed the issue6039 branch 2 times, most recently from a787789 to 0318743 Compare April 16, 2026 23:38
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
…gex values

Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu mihaibudiu enabled auto-merge April 16, 2026 23:38
@mihaibudiu mihaibudiu added this pull request to the merge queue Apr 16, 2026
Merged via the queue into feldera:main with commit 2643035 Apr 17, 2026
1 check passed
@mihaibudiu mihaibudiu deleted the issue6039 branch April 17, 2026 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SQL] Improve evaluation of parsing timezones and regex [SQL/DOCS] Clarification is need: TIMESTAMP - TIMESTAMP is rejected by compiler

3 participants