chore: support addition between a timestamp and a timedelta#1369
chore: support addition between a timestamp and a timedelta#1369
Conversation
|
|
||
| scalar_expr = bigframes_vendored.ibis.literal(literal) | ||
| if ibis_dtype: | ||
| if isinstance(literal, datetime.timedelta): |
There was a problem hiding this comment.
Should we allow pandas, numpy, and pyarrow scalars here, too?
| if isinstance(literal, datetime.timedelta): | |
| if isinstance(literal, (datetime.timedelta, numpy.timedelta64, pandas.Timedelta, pyarrow.DurationScalar)): |
There was a problem hiding this comment.
Sure! Pandas timedelta is a subclass of datetime.timedelta, but I can make it more explicit.
Added support for numpy.
I eventually decided not to implement pyarrow duration because:
- It causes more unrelated failures than I expected in Python 3.9 and 3.10 envs for some reasons.
- Pandas does not support adding pyarrow duration literals to series
Maybe we can add support for pyarrow in the future, but at this moment it may not be worth it
| def timedelta_to_micros(td: typing.Union[pd.Timedelta, datetime.timedelta]) -> int: | ||
| if isinstance(td, pd.Timedelta): | ||
| # td.value returns total nanoseconds. | ||
| return td.value // 1000 |
There was a problem hiding this comment.
Looks like pandas.Timedelta has units: https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html
Please make this robust to various units.
There was a problem hiding this comment.
that is for the constructor arg, but as per the docs, "The .value attribute is always in ns."
There was a problem hiding this comment.
Right. Pandas always converts the input to values in nanoseconds, no matter what unit we use in the constructor.
| return DATE_DTYPE | ||
| if issubclass(type, datetime.time): | ||
| return TIME_DTYPE | ||
| if issubclass(type, datetime.timedelta): |
There was a problem hiding this comment.
Likewise, do we need numpy, pandas, and pyarrow object detection here too?
There was a problem hiding this comment.
Done. Left the PyArrow out for this one.
146c690 to
c680948
Compare
c9a5021 to
b5b69cc
Compare
| if isinstance( | ||
| literal, | ||
| (datetime.timedelta, pd.Timedelta, numpy.timedelta64), | ||
| ): | ||
| # numpy timedelta is compatible with Ibis, so we process them separately. | ||
| return bigframes_vendored.ibis.literal( | ||
| utils.timedelta_to_micros(literal), ibis_dtype |
There was a problem hiding this comment.
Why don't we just replace literal expressions (as well as column defs in leaves) in the tree when we replace all the ops?
There was a problem hiding this comment.
Good call. I moved the conversion to the rewrite module
There was a problem hiding this comment.
I am going to rename the module rewrite.operators to something like rewrite.timedelta_expression to better reflect its behavior, perhaps in another PR
| @scalar_op_compiler.register_binary_op(ops.timestamp_add_op) | ||
| def timestamp_add_op_impl(x: ibis_types.TimestampValue, y: ibis_types.IntegerValue): | ||
| return x + y.to_interval("us") |
There was a problem hiding this comment.
What does this syntax look like from this? Is it a clean TIMESTAMP_ADD(timestamp_col, INTERVAL duration MICROSECOND)?
There was a problem hiding this comment.
Yes https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions#timestamp_add
Though this has less to do with SQL syntax than the ease of wiring code. Ibis does support interval + timestamp and timestamp + interval, but that means I also need to check which side of the input is integer here.
If I can standardize the order of types in the rewrite module, then I don't need to check ibis type here :)
No description provided.