fix: model.fit metric not collected issue.#1085
Conversation
| # fit the model, synchronously | ||
| _, job = session._start_query_ml_ddl(sql) | ||
| if session._metrics is not None: | ||
| session._metrics.count_job_stats(job) |
There was a problem hiding this comment.
Should we not do this inside _start_query_ml_ddl? That way all ML DDLs will be accounted in the session metrics
There was a problem hiding this comment.
Updated, and added an assertion in test for register.
|
|
||
| start_execution_count = df._block._expr.session._metrics.execution_count | ||
|
|
||
| model.fit(X_train, y_train) |
There was a problem hiding this comment.
I think we have the opportunity to make this work work all ML APIs - fit, score, predict, transform, .... We should write such tests for all APIs by writing the metrics collection logic in more central place in the code.
There was a problem hiding this comment.
score, predict works, seems internally is calling other functions.
There was a problem hiding this comment.
Good to know, thanks for checking! We could just pick one or two of the existing tests that does fit, score, predict, register and transform and assert everywhere that we are counting stats for those operations.
| job_config.destination_encryption_configuration = None | ||
|
|
||
| return bf_io_bigquery.start_query_with_client(self.bqclient, sql, job_config) | ||
| results_iterator, query_job = bf_io_bigquery.start_query_with_client( |
There was a problem hiding this comment.
I see start_query_with_client already has the stats update
python-bigquery-dataframes/bigframes/session/_io/bigquery/__init__.py
Lines 245 to 246 in fd06d31
so feels like we could be adding double counting
There was a problem hiding this comment.
It's not double counting, but yes here we can send the metric into the start_query_with_client, thanks for point that out.
| model.fit(X_train, y_train) | ||
|
|
||
| end_execution_count = df._block._expr.session._metrics.execution_count | ||
| assert end_execution_count - start_execution_count == 2 |
There was a problem hiding this comment.
Worth adding a comment why fit does 2 queries
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕