This repository was archived by the owner on Apr 1, 2026. It is now read-only.
feat: add support for creating a Matrix Factorization model #1330
Merged
153 commits
6783a0a
docs: update title of pypi notebook example to reflect use of the PyP…
tswast 1d39560
feat: add support for creating a Matrix Factorization model
rey-esp e19c262
feat: add support for creating a Matrix Factorization model
rey-esp 1bef4a2
feat: add support for creating a Matrix Factorization model
rey-esp d157cd7
Merge branch 'main' into b338873783-matrix-factorization
rey-esp e336bde
Update bigframes/ml/decomposition.py
rey-esp d5f713a
Update bigframes/ml/decomposition.py
rey-esp 5e3e443
Update bigframes/ml/decomposition.py
rey-esp 34a60bc
Merge branch 'main' into b338873783-matrix-factorization
rey-esp c116e8a
rating_col
rey-esp dedef39
(nearly) complete class
rey-esp e5165a9
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 05eb854
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 2787178
removem print()
rey-esp 8c66e07
removem print()
rey-esp 086b4dd
adding recommend
rey-esp 8ed3ccd
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 1b4eef9
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 7c371ac
remove hyper parameter runing references
rey-esp 7498c8c
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 55ef06a
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 29805b5
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8de384a
swap predict in _mf for recommend
rey-esp 647532b
recommend -> predict
rey-esp b340c4f
update predict doc string
rey-esp 580de41
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 29ee357
Merge branch 'main' into b338873783-matrix-factorization
rey-esp bac2ece
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 3f22c23
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp 213f11d
Merge branch 'main' into b338873783-matrix-factorization
rey-esp aaf0d1f
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 4c90c1d
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 792bd64
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp ed279be
Merge branch 'main' into b338873783-matrix-factorization
rey-esp ba5beb3
preparing test files
rey-esp 86fb956
Merge branch 'main' into b338873783-matrix-factorization
rey-esp a29bbcf
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8577833
add test data
rey-esp a92007c
Merge branch 'main' into b338873783-matrix-factorization
rey-esp a808429
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 4b7b4db
new error: to_gbq column names need to be changed?
rey-esp 8d55eac
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 9195658
Merge branch 'main' into b338873783-matrix-factorization
rey-esp faa4d6b
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 76a9934
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp bef7808
Delete demo.ipynb
rey-esp f18104d
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 9b39a99
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp 0dd033d
passing system test
rey-esp 60faed1
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 1f85b75
preparing to add unit tests
rey-esp 7efc63d
Merge branch 'main' into b338873783-matrix-factorization
rey-esp a457639
2 out of 3 (so far) passing unit tests
rey-esp 89790ac
Merge branch 'main' into b338873783-matrix-factorization
rey-esp a057a8f
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 512332e
attempted mocking
rey-esp 741e749
Merge branch 'main' into b338873783-matrix-factorization
rey-esp f902131
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 310257d
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp 408e807
fix tests
rey-esp 19e423b
new test file for model creation unit tests
rey-esp 2c107df
Merge branch 'main' into b338873783-matrix-factorization
rey-esp c7c8eea
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 5f1a19a
add unit tests for num_factors, user_col, and item_col
rey-esp 68e308b
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp 33f3069
Update tests/unit/ml/test_matrix_factorization.py
rey-esp 1ff6aaa
Update tests/unit/ml/test_matrix_factorization.py
rey-esp 446712b
Merge branch 'main' into b338873783-matrix-factorization
rey-esp c84dd7e
uncomment one test
rey-esp 3473037
uncomment test
rey-esp b3809e5
uncomment test
rey-esp 7e8a5b6
uncomment test
rey-esp eba88d9
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8599d88
nearly all tests
rey-esp 8ab8818
tests complete and passing
rey-esp b4d3578
seeing if test causes kokoro failure
rey-esp a63cb90
uncomment test-kokoro still failing
rey-esp 3695f80
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 336bffd
Merge branch 'tswast-patch-1' into b338873783-matrix-factorization
rey-esp bb6130a
Merge branch 'main' into b338873783-matrix-factorization
rey-esp e69438d
remove comment
rey-esp 05da834
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 087953f
fix test
rey-esp bfe9140
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8d3599e
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 248a3b1
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 157daea
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8912663
test kokoro
rey-esp 35a8c18
test_decomposition.py failing and now feedback_type attr does not exist
rey-esp ac182be
Merge branch 'main' into b338873783-matrix-factorization
rey-esp ff58ff5
passing tests
rey-esp f0a6ba2
Update bigframes/ml/decomposition.py
rey-esp aaad5f5
Merge branch 'main' into b338873783-matrix-factorization
rey-esp b586c5c
Update tests/system/large/ml/test_decomposition.py
rey-esp 04ddd5e
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8e875ae
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 565138a
doc attempt - _mf.py example
rey-esp b39661f
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp c0ef08f
feedback_type case ignore
rey-esp 4b53b04
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 342cbd1
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8812f33
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 24b8e0c
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 664de04
Update _mf.py - remove global_explain()
rey-esp 63e8e9c
fit
rey-esp 3e52cd4
pull?
rey-esp c2e9a5f
W
rey-esp 28c4602
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 1240eeb
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 46f1ea6
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 193b9c8
fix docs (maybe)
rey-esp 5a547f8
Update test_matrix_factorization.py with updated error messages
rey-esp 23d8fc8
ilnt
rey-esp ed99ad7
Update test_matrix_factorization.py - add 'f'
rey-esp e305950
improve errors and update tests
rey-esp 411fe1a
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 4273a99
Merge branch 'main' into b338873783-matrix-factorization
rey-esp b9f6a52
Merge branch 'main' into b338873783-matrix-factorization
rey-esp b92ed1f
Merge branch 'main' into b338873783-matrix-factorization
rey-esp aaf34eb
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 46601c4
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 0823db2
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 32917e5
Update tests/system/large/ml/test_decomposition.py
rey-esp e485d3b
Update bigframes/ml/decomposition.py - num_factors error messsage
rey-esp 6a27083
Update bigframes/ml/decomposition.py - user_col error message
rey-esp 6e2d902
Update bigframes/ml/decomposition.py - rating_col error message
rey-esp b65c637
Update bigframes/ml/decomposition.py - l2_reg error msg
rey-esp 93ac0fa
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 74ebe27
fix tests to match updated error messages
rey-esp b2ebcf7
Merge branch 'b338873783-matrix-factorization' of github.com:googleap…
rey-esp 3f40763
Update third_party/bigframes_vendored/sklearn/decomposition/_mf.py - …
rey-esp 2cbc2e3
Update third_party/bigframes_vendored/sklearn/decomposition/_mf.py - …
rey-esp 0a5aefb
Update third_party/bigframes_vendored/sklearn/decomposition/_mf.py - …
rey-esp d484f77
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 366e0ab
Update third_party/bigframes_vendored/sklearn/decomposition/_mf.py
tswast 1eaa708
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 56ee623
remove errors and tests
rey-esp c942418
Update bigframes/ml/decomposition.py
rey-esp e0ef53e
Update bigframes/ml/decomposition.py
rey-esp 5018182
Update bigframes/ml/decomposition.py
rey-esp c088a76
Merge branch 'main' into b338873783-matrix-factorization
rey-esp f9397f1
passing system test
rey-esp b439120
E AssertionError: expected call not found.
rey-esp ffe0f33
Merge branch 'main' into b338873783-matrix-factorization
rey-esp b2698ef
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 69c8fba
Merge branch 'main' into b338873783-matrix-factorization
rey-esp 8a614c5
same # of elements in each
rey-esp 9d71c86
Merge branch 'main' into b338873783-matrix-factorization
rey-esp c2b4795
attempt
rey-esp cd20ffc
Merge branch 'main' into b338873783-matrix-factorization
rey-esp cf6e5be
doc fix
rey-esp da230b4
doc fix
rey-esp 8927072
Merge branch 'main' into b338873783-matrix-factorization
rey-esp
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,6 +19,7 @@ | |
|
|
||
| from typing import List, Literal, Optional, Union | ||
|
|
||
| import bigframes_vendored.sklearn.decomposition._mf | ||
| import bigframes_vendored.sklearn.decomposition._pca | ||
| from google.cloud import bigquery | ||
|
|
||
|
|
@@ -27,7 +28,15 @@ | |
| import bigframes.pandas as bpd | ||
| import bigframes.session | ||
|
|
||
| _BQML_PARAMS_MAPPING = {"svd_solver": "pcaSolver"} | ||
| _BQML_PARAMS_MAPPING = { | ||
| "svd_solver": "pcaSolver", | ||
| "feedback_type": "feedbackType", | ||
| "num_factors": "numFactors", | ||
| "user_col": "userColumn", | ||
| "item_col": "itemColumn", | ||
| "_input_label_columns": "inputLabelColumns", | ||
| "l2_reg": "l2Regularization", | ||
| } | ||
|
|
||
|
|
||
| @log_adapter.class_logger | ||
|
|
@@ -197,3 +206,159 @@ def score( | |
|
|
||
| # TODO(b/291973741): X param is ignored. Update BQML supports input in ML.EVALUATE. | ||
| return self._bqml_model.evaluate() | ||
|
|
||
|
|
||
| @log_adapter.class_logger | ||
| class MatrixFactorization( | ||
| base.UnsupervisedTrainablePredictor, | ||
| bigframes_vendored.sklearn.decomposition._mf.MatrixFactorization, | ||
| ): | ||
| __doc__ = bigframes_vendored.sklearn.decomposition._mf.MatrixFactorization.__doc__ | ||
|
|
||
| def __init__( | ||
| self, | ||
| *, | ||
| feedback_type: Literal["explicit", "implicit"] = "explicit", | ||
| num_factors: int, | ||
|
|
||
| user_col: str, | ||
| item_col: str, | ||
| rating_col: str = "rating", | ||
|
Comment on lines +223 to +225
Contributor
@GarrettWu @shuoweil I see in #1282 you ended up passing in "id_col" as a separate argument to Note: MatrixFactorization differs somewhat from that application in that normally in scikit-learn one would have a "sparse matrix" data type (e.g. https://docs.scipy.org/doc/scipy/reference/sparse.html) where rows/cols/values would all be bundled up in one object, similar to how we are using the bigframes DataFrame for this purpose. |
||
| # TODO: Add support for hyperparameter tuning. | ||
| l2_reg: float = 1.0, | ||
| ): | ||
|
|
||
| feedback_type = feedback_type.lower() # type: ignore | ||
| if feedback_type not in ("explicit", "implicit"): | ||
| raise ValueError("Expected feedback_type to be `explicit` or `implicit`.") | ||
|
|
||
| self.feedback_type = feedback_type | ||
|
|
||
| if not isinstance(num_factors, int): | ||
| raise TypeError( | ||
| f"Expected num_factors to be an int, but got {type(num_factors)}." | ||
| ) | ||
|
|
||
| if num_factors <= 0: | ||
| raise ValueError( | ||
| f"Expected num_factors to be a positive integer, but got {num_factors}." | ||
| ) | ||
|
|
||
| self.num_factors = num_factors | ||
|
|
||
| if not isinstance(user_col, str): | ||
| raise TypeError(f"Expected user_col to be a str, but got {type(user_col)}.") | ||
|
|
||
| self.user_col = user_col | ||
|
|
||
| if not isinstance(item_col, str): | ||
| raise TypeError(f"Expected item_col to be a str, but got {type(item_col)}.") | ||
|
|
||
| self.item_col = item_col | ||
|
|
||
| if not isinstance(rating_col, str): | ||
| raise TypeError( | ||
| f"Expected rating_col to be a str, but got {type(rating_col)}." | ||
| ) | ||
|
|
||
| self._input_label_columns = [rating_col] | ||
|
|
||
| if not isinstance(l2_reg, (float, int)): | ||
| raise TypeError( | ||
| f"Expected l2_reg to be a float or int, but got {type(l2_reg)}." | ||
| ) | ||
|
|
||
| self.l2_reg = l2_reg | ||
| self._bqml_model: Optional[core.BqmlModel] = None | ||
| self._bqml_model_factory = globals.bqml_model_factory() | ||
|
|
||
|
|
||
| @property | ||
| def rating_col(self) -> str: | ||
| """str: The rating column name. Defaults to 'rating'.""" | ||
| return self._input_label_columns[0] | ||
|
|
||
| @classmethod | ||
| def _from_bq( | ||
| cls, session: bigframes.session.Session, bq_model: bigquery.Model | ||
| ) -> MatrixFactorization: | ||
| assert bq_model.model_type == "MATRIX_FACTORIZATION" | ||
|
|
||
| kwargs = utils.retrieve_params_from_bq_model( | ||
| cls, bq_model, _BQML_PARAMS_MAPPING | ||
| ) | ||
|
|
||
| model = cls(**kwargs) | ||
| model._bqml_model = core.BqmlModel(session, bq_model) | ||
| return model | ||
|
|
||
| @property | ||
| def _bqml_options(self) -> dict: | ||
| """The model options as they will be set for BQML""" | ||
| options: dict = { | ||
| "model_type": "matrix_factorization", | ||
| "feedback_type": self.feedback_type, | ||
| "user_col": self.user_col, | ||
| "item_col": self.item_col, | ||
| "rating_col": self.rating_col, | ||
| "l2_reg": self.l2_reg, | ||
| } | ||
|
|
||
| if self.num_factors is not None: | ||
| options["num_factors"] = self.num_factors | ||
|
|
||
| return options | ||
|
|
||
| def _fit( | ||
| self, | ||
| X: utils.ArrayType, | ||
| y=None, | ||
| transforms: Optional[List[str]] = None, | ||
| ) -> MatrixFactorization: | ||
| if y is not None: | ||
| raise ValueError( | ||
| "Label column not supported for Matrix Factorization model but y was not `None`" | ||
| ) | ||
|
|
||
| (X,) = utils.batch_convert_to_dataframe(X) | ||
|
|
||
| self._bqml_model = self._bqml_model_factory.create_model( | ||
| X_train=X, | ||
| transforms=transforms, | ||
| options=self._bqml_options, | ||
| ) | ||
| return self | ||
|
|
||
| def predict(self, X: utils.ArrayType) -> bpd.DataFrame: | ||
| if not self._bqml_model: | ||
| raise RuntimeError("A model must be fitted before predict") | ||
|
|
||
| (X,) = utils.batch_convert_to_dataframe(X, session=self._bqml_model.session) | ||
|
|
||
| return self._bqml_model.recommend(X) | ||
|
|
||
| def to_gbq(self, model_name: str, replace: bool = False) -> MatrixFactorization: | ||
| """Save the model to BigQuery. | ||
|
|
||
| Args: | ||
| model_name (str): | ||
| The name of the model. | ||
| replace (bool, default False): | ||
| Whether to replace the model if it already exists. Defaults to False. | ||
|
|
||
| Returns: | ||
| MatrixFactorization: Saved model.""" | ||
| if not self._bqml_model: | ||
| raise RuntimeError("A model must be fitted before it can be saved") | ||
|
|
||
| new_model = self._bqml_model.copy(model_name, replace) | ||
| return new_model.session.read_gbq_model(model_name) | ||
|
|
||
| def score( | ||
| self, | ||
| X=None, | ||
| y=None, | ||
| ) -> bpd.DataFrame: | ||
| if not self._bqml_model: | ||
| raise RuntimeError("A model must be fitted before score") | ||
|
|
||
| # TODO(b/291973741): X param is ignored. Update once BQML supports input in ML.EVALUATE. | ||
| return self._bqml_model.evaluate() | ||
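The `_BQML_PARAMS_MAPPING` additions in this file pair each Python-side attribute with a camelCase BQML training-option name, and `_from_bq` hands that mapping to `utils.retrieve_params_from_bq_model`. The helper's body is not part of this diff, so the following is only a rough sketch of the idea. The function `params_from_training_options` and the `example_options` values are hypothetical stand-ins, not the library's implementation or real model metadata.

```python
from typing import Any, Dict

# Copied from the diff: Python-side name -> BQML training-option name.
BQML_PARAMS_MAPPING = {
    "svd_solver": "pcaSolver",
    "feedback_type": "feedbackType",
    "num_factors": "numFactors",
    "user_col": "userColumn",
    "item_col": "itemColumn",
    "_input_label_columns": "inputLabelColumns",
    "l2_reg": "l2Regularization",
}


def params_from_training_options(
    training_options: Dict[str, Any], mapping: Dict[str, str]
) -> Dict[str, Any]:
    """Hypothetical helper: translate BQML option names back to Python names.

    Options missing from `training_options` are skipped rather than defaulted.
    """
    return {
        py_name: training_options[bqml_name]
        for py_name, bqml_name in mapping.items()
        if bqml_name in training_options
    }


# Illustrative values only; not taken from a real model.
example_options = {
    "feedbackType": "EXPLICIT",
    "numFactors": 6,
    "userColumn": "user_id",
    "itemColumn": "item_id",
    "inputLabelColumns": ["rating"],
    "l2Regularization": 9.83,
}

params = params_from_training_options(example_options, BQML_PARAMS_MAPPING)
print(params)
```

In this sketch the recovered `_input_label_columns` key is not itself a constructor argument (the constructor takes `rating_col`). How the real helper reconciles that before `cls(**kwargs)` is not visible in this diff.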
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| {"user_id": 1, "item_id": 2, "rating": 4.0} | ||
| {"user_id": 1, "item_id": 5, "rating": 3.0} | ||
| {"user_id": 2, "item_id": 1, "rating": 5.0} | ||
| {"user_id": 2, "item_id": 3, "rating": 2.0} | ||
| {"user_id": 3, "item_id": 4, "rating": 4.5} | ||
| {"user_id": 3, "item_id": 7, "rating": 3.5} | ||
| {"user_id": 4, "item_id": 2, "rating": 1.0} | ||
| {"user_id": 4, "item_id": 8, "rating": 5.0} | ||
| {"user_id": 5, "item_id": 3, "rating": 4.0} | ||
| {"user_id": 5, "item_id": 9, "rating": 2.5} | ||
| {"user_id": 6, "item_id": 1, "rating": 3.0} | ||
| {"user_id": 6, "item_id": 6, "rating": 4.5} | ||
| {"user_id": 7, "item_id": 5, "rating": 5.0} | ||
| {"user_id": 7, "item_id": 10, "rating": 1.5} | ||
| {"user_id": 8, "item_id": 4, "rating": 2.0} | ||
| {"user_id": 8, "item_id": 7, "rating": 4.0} | ||
| {"user_id": 9, "item_id": 2, "rating": 3.5} | ||
| {"user_id": 9, "item_id": 9, "rating": 5.0} | ||
| {"user_id": 10, "item_id": 3, "rating": 4.5} | ||
| {"user_id": 10, "item_id": 8, "rating": 2.5} |
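For readers unfamiliar with what `num_factors` and `l2_reg` control, here is a toy, pure-Python sketch of explicit-feedback matrix factorization trained on the 20 ratings above. It uses plain stochastic gradient descent, chosen for brevity; it is not the algorithm BigQuery ML runs. The learning rate, epoch count, seed, and the helper names `dot`, `squared_error`, and `train` are all arbitrary assumptions made for illustration.

```python
import random

# The 20 (user_id, item_id, rating) rows from the test data file above.
RATINGS = [
    (1, 2, 4.0), (1, 5, 3.0), (2, 1, 5.0), (2, 3, 2.0), (3, 4, 4.5),
    (3, 7, 3.5), (4, 2, 1.0), (4, 8, 5.0), (5, 3, 4.0), (5, 9, 2.5),
    (6, 1, 3.0), (6, 6, 4.5), (7, 5, 5.0), (7, 10, 1.5), (8, 4, 2.0),
    (8, 7, 4.0), (9, 2, 3.5), (9, 9, 5.0), (10, 3, 4.5), (10, 8, 2.5),
]


def dot(a, b):
    """Inner product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_error(ratings, user_f, item_f):
    """Sum of squared differences between observed and predicted ratings."""
    return sum((r - dot(user_f[u], item_f[i])) ** 2 for u, i, r in ratings)


def train(ratings, num_factors=2, l2_reg=0.1, lr=0.01, epochs=300, seed=0):
    """Fit user and item factor vectors with SGD (illustrative only)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    user_f = {u: [rng.uniform(0.0, 1.0) for _ in range(num_factors)] for u in users}
    item_f = {i: [rng.uniform(0.0, 1.0) for _ in range(num_factors)] for i in items}
    initial = squared_error(ratings, user_f, item_f)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(user_f[u], item_f[i])
            for k in range(num_factors):
                pu, qi = user_f[u][k], item_f[i][k]
                # Gradient step on squared error plus an L2 penalty.
                user_f[u][k] += lr * (err * qi - l2_reg * pu)
                item_f[i][k] += lr * (err * pu - l2_reg * qi)
    return user_f, item_f, initial, squared_error(ratings, user_f, item_f)


user_f, item_f, initial_sse, final_sse = train(RATINGS)
print(f"squared error: {initial_sse:.2f} -> {final_sse:.2f}")
# Predicted rating for a (user, item) pair that was never observed.
print(round(dot(user_f[1], item_f[3]), 2))
```

Each user and item gets a vector of `num_factors` numbers, and a predicted rating is the dot product of the two. A larger `l2_reg` pulls those vectors toward zero, which trades training fit for smoother predictions on unseen pairs.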
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| [ | ||
| { | ||
| "mode": "NULLABLE", | ||
| "name": "user_id", | ||
| "type": "STRING" | ||
| }, | ||
| { | ||
| "mode": "NULLABLE", | ||
| "name": "item_id", | ||
| "type": "INT64" | ||
| }, | ||
| { | ||
| "mode": "NULLABLE", | ||
| "name": "rating", | ||
| "type": "FLOAT" | ||
| } | ||
| ] |
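One detail worth checking when pairing the schema above with the JSONL rows: `user_id` is declared `STRING`, while the sample rows store it as a JSON number. The sketch below compares parsed values against the declared types using only the standard library. The type table `PYTHON_TYPES` and the helper `mismatches` are illustrative names, and whether the loader used by the tests coerces such values is not something this diff shows.

```python
import json

# Schema as it appears in the diff above.
SCHEMA = json.loads("""
[
  {"mode": "NULLABLE", "name": "user_id", "type": "STRING"},
  {"mode": "NULLABLE", "name": "item_id", "type": "INT64"},
  {"mode": "NULLABLE", "name": "rating", "type": "FLOAT"}
]
""")

# First two rows of the JSONL test data, verbatim.
ROWS = [
    json.loads('{"user_id": 1, "item_id": 2, "rating": 4.0}'),
    json.loads('{"user_id": 1, "item_id": 5, "rating": 3.0}'),
]

# Assumed mapping from schema type names to the Python types json.loads yields.
PYTHON_TYPES = {"STRING": str, "INT64": int, "FLOAT": float}


def mismatches(rows, schema):
    """Return (row_index, column, declared_type, actual_type_name) tuples."""
    found = []
    for index, row in enumerate(rows):
        for field in schema:
            value = row.get(field["name"])
            if value is None:
                continue  # NULLABLE columns may be absent or null.
            expected = PYTHON_TYPES[field["type"]]
            if not isinstance(value, expected):
                found.append(
                    (index, field["name"], field["type"], type(value).__name__)
                )
    return found


for problem in mismatches(ROWS, SCHEMA):
    print(problem)
```

A check like this only flags the difference between the declared and parsed types; it does not imply the test data fails to load.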