feat: add support for pandas series & data frames as inputs for ml models. by sycai · Pull Request #1088 · googleapis/python-bigquery-dataframes

sycai · 2024-10-15T22:06:29Z

No description provided.

TrevorBergeron · 2024-10-15T22:13:02Z



-def convert_to_dataframe(*input: ArrayType) -> Generator[bpd.DataFrame, None, None]:
+def convert_to_dataframe(


Do you think we can merge the logic in this file into the core/convert module logic? Ideally we don't have pandas->bigframes logic in two places.

Good point! Though I think it's not very straightforward to do so. This is mainly because this function returns a Generator, while the one in the core package returns a single value. We will need some extra effort to make everything consistent (function names, parameter types, return types, etc).

Considering that this PR is already nontrivial, I think we can focus on delivering the feature for now. I will migrate the conversion logic in a separate PR. Does that sound good to you?

b/373716095 for reference

Can split up the work of course, as long as each step stands alone as an improvement. Not sure how much the generator aspect matters - but unifying the two approaches I think will be a good exercise and result in some improvements to both.
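To make the generator question above concrete, here is a minimal sketch of how a generator-returning converter can be a thin wrapper around a single-value converter, so the conversion logic lives in one place. Everything in it is an assumption for illustration: `convert_one`, `convert_many`, and the dict-of-lists `Table` are hypothetical stand-ins, not the functions in the ml or core packages.

```python
from typing import Any, Dict, Generator, List

# Hypothetical stand-in for a dataframe: column name -> list of values.
Table = Dict[str, List[Any]]


def convert_one(value: Any) -> Table:
    """Hypothetical single-value converter (plays the role of the core-package function)."""
    if isinstance(value, dict):
        # Already table-like: copy each column into a plain list.
        return {name: list(column) for name, column in value.items()}
    if isinstance(value, (list, tuple)):
        # A bare sequence becomes a one-column table.
        return {"values": list(value)}
    raise TypeError(f"Unsupported input type: {type(value).__name__}")


def convert_many(*inputs: Any) -> Generator[Table, None, None]:
    """Generator flavour (plays the role of the ml-package function)."""
    for value in inputs:
        # All real work is delegated, so there is only one conversion path.
        yield convert_one(value)


# Callers can still unpack several inputs at once, as with a generator API.
X, y = convert_many({"a": [1, 2]}, (3, 4))
```

With this shape, the generator only adds iteration; any new input type needs handling in `convert_one` alone.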

TrevorBergeron · 2024-10-15T22:13:40Z

+        X: Union[bpd.DataFrame, bpd.Series, pd.DataFrame, pd.Series],
+        y: Optional[Union[bpd.DataFrame, bpd.Series, pd.DataFrame, pd.Series]] = None,


Do you think we can define a single annotation representing this set of types and use it everywhere? This will make it easier to accommodate additional types in the future.

My main concern is that type aliases may not be expanded/resolved when generating docs, and thus confuse our end users.

It seems that sklearn doesn't try to define a set of types in its docs: https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit ?

Updated all signatures to use the type alias, since Sphinx is able to resolve it.
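For readers following this thread, a minimal sketch of the single-alias idea. The alias names `ArrayType` and `InputArrayType` appear in the diff, but their definitions here are assumptions, and the four classes are hypothetical stand-ins for `bpd.DataFrame`, `bpd.Series`, `pd.DataFrame`, and `pd.Series` so the snippet runs without either library installed.

```python
from typing import Optional, Union, get_args


# Hypothetical stand-ins for the real bigframes and pandas classes.
class RemoteFrame: ...
class RemoteSeries: ...
class LocalFrame: ...
class LocalSeries: ...


# Assumed definitions: one alias per set of accepted types.
ArrayType = Union[RemoteFrame, RemoteSeries]
InputArrayType = Union[RemoteFrame, RemoteSeries, LocalFrame, LocalSeries]


def fit(X: InputArrayType, y: Optional[InputArrayType] = None) -> str:
    """Toy fit() that only validates its inputs against the alias."""
    accepted = get_args(InputArrayType)  # tuple of the member classes
    for arg in (X, y):
        if arg is not None and not isinstance(arg, accepted):
            raise TypeError(f"Unsupported input type: {type(arg).__name__}")
    return type(X).__name__


result = fit(LocalFrame(), LocalSeries())
```

Accepting another input type later would then mean extending `InputArrayType` in one place rather than editing every signature.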

TrevorBergeron · 2024-10-15T22:14:58Z



-def _convert_to_dataframe(frame: ArrayType) -> bpd.DataFrame:
+def _convert_to_dataframe(frame: InputArrayType) -> bpd.DataFrame:


Should we also handle array-like data like numpy arrays or even plain python list/tuples?

Yeah, the effort should be trivial once we consolidate the conversion functions from the ml package into the core package. I can handle this in another CL, as proposed above. Let me know your thoughts.

TrevorBergeron · 2024-10-16T18:14:22Z

        return frame
+    if isinstance(frame, pd.DataFrame):
+        # Recursively call this method to re-use the length-checking logic
+        return _convert_to_series(bpd.read_pandas(frame))


We might not always want the default session, e.g. if the other argument is a bigframes object with a non-default session. The core version uses the session from the co-argument.

Good point! I decided to use the session provided by the bqml_model whenever possible. The global session acts as a default.
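The session fallback described in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `Session`, `RemoteFrame`, `get_global_session`, and `to_remote` are hypothetical stand-ins for the bigframes session, a bigframes DataFrame, the global-session accessor, and the converter.

```python
from typing import Any, Optional


class Session:
    """Hypothetical stand-in for a bigframes session."""

    def __init__(self, name: str) -> None:
        self.name = name


_GLOBAL_SESSION = Session("global")


def get_global_session() -> Session:
    """Hypothetical accessor for the default (global) session."""
    return _GLOBAL_SESSION


class RemoteFrame:
    """Hypothetical stand-in for a bigframes DataFrame bound to a session."""

    def __init__(self, data: Any, session: Session) -> None:
        self.data = data
        self.session = session


def to_remote(frame: Any, session: Optional[Session] = None) -> RemoteFrame:
    """Upload local data with the given session, else fall back to the global one."""
    if isinstance(frame, RemoteFrame):
        # Already remote: keep whatever session it was created with.
        return frame
    chosen = session if session is not None else get_global_session()
    return RemoteFrame(frame, chosen)


model_session = Session("model")
with_model = to_remote({"a": [1, 2]}, session=model_session)
with_default = to_remote({"a": [1, 2]})
```

The explicit `is not None` check avoids relying on the truthiness of a session object.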

TrevorBergeron · 2024-10-23T00:43:30Z


        Args:
-            X (bigframes.dataframe.DataFrame or bigframes.series.Series):
+            X (bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series):


I worry that fully enumerating the accepted types will be too much once we extend the list further.

I agree with you. The Google style prefers type hints over documenting types in docstrings, which makes more sense. Here I'm just keeping the style consistent.

sycai requested review from a team and chelsea-lin October 15, 2024 22:06

product-auto-label bot added the size: l Pull request size is large. label Oct 15, 2024

blunderbuss-gcf bot assigned chelsea-lin Oct 15, 2024

product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label Oct 15, 2024

sycai force-pushed the b362723869 branch from 94eb2fd to c498059 Compare October 15, 2024 22:09

TrevorBergeron reviewed Oct 15, 2024

sycai requested a review from TrevorBergeron October 15, 2024 22:48

sycai added 4 commits October 16, 2024 02:45

support pandas dataframes and series as model inputs.

9469b43

polish code and add tests

f9a7c09

clean up code

48166c9

fix type hints

65c7516

sycai force-pushed the b362723869 branch from c498059 to 65c7516 Compare October 16, 2024 02:45

sycai and others added 3 commits October 16, 2024 02:47

fix lint

ee906c0

fix a bug that was introduced in the last commit

87773a3

Merge branch 'main' into b362723869

d3634a1

TrevorBergeron reviewed Oct 16, 2024

sycai and others added 3 commits October 16, 2024 19:05

use type alias in type hints and update docs

a4599b9

add session parameter in the converters

d91cbc8

🦉 Updates from OwlBot post-processor

89caee6

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

sycai requested a review from TrevorBergeron October 16, 2024 21:36

sycai and others added 3 commits October 16, 2024 21:46

fix default parameter issue

6f0ec17

fix type error

5642fed

Merge branch 'main' into b362723869

a9f231f

TrevorBergeron reviewed Oct 23, 2024

TrevorBergeron approved these changes Oct 23, 2024

sycai added 2 commits October 22, 2024 18:54

Merge branch 'main' into b362723869

9b42dc7

Merge branch 'main' into b362723869

1b4aa4d

sycai enabled auto-merge (squash) October 23, 2024 17:30

sycai merged commit 30c8883 into main Oct 23, 2024

sycai deleted the b362723869 branch October 23, 2024 18:25

release-please bot mentioned this pull request Oct 23, 2024

chore(main): release 1.23.0 #1075

Merged




3 participants