From 600f56f61773fc16ecbaa634dbc013402d4b17c5 Mon Sep 17 00:00:00 2001
From: Francisco Javier Arceo
Date: Tue, 5 Nov 2024 22:43:57 -0500
Subject: [PATCH] feat: Adding docs outlining native Python transformations on singletons

Signed-off-by: Francisco Javier Arceo
---
 docs/reference/beta-on-demand-feature-view.md | 176 +++++++++++-------
 1 file changed, 105 insertions(+), 71 deletions(-)

diff --git a/docs/reference/beta-on-demand-feature-view.md b/docs/reference/beta-on-demand-feature-view.md
index 55fe534446e..11bacb4871c 100644
--- a/docs/reference/beta-on-demand-feature-view.md
+++ b/docs/reference/beta-on-demand-feature-view.md
@@ -1,122 +1,147 @@
-# \[Beta] On demand feature view
+# [Beta] On Demand Feature Views
-**Warning**: This is an experimental feature. To our knowledge, this is stable, but there are still rough edges in the experience. Contributions are welcome!
+**Warning**: This is an experimental feature. While it is stable to our knowledge, there may still be rough edges in the experience. Contributions are welcome!
 ## Overview
-On Demand Feature Views (ODFVs) allow data scientists to use existing features and request-time data (features only
-available at request time) to transform and create new features. Users define Python transformation logic which is
-executed during both historical retrieval and online retrieval. Additionally, ODFVs provide flexibility in
-applying transformations either during data ingestion (at write time) or during feature retrieval (at read time),
-controlled via the `write_to_online_store` parameter.
+On Demand Feature Views (ODFVs) allow data scientists to use existing features and request-time data to transform and
+create new features. Users define transformation logic that is executed during both historical and online retrieval.
+Additionally, ODFVs provide flexibility in applying transformations either during data ingestion (at write time) or
+during feature retrieval (at read time), controlled via the `write_to_online_store` parameter.
 By setting `write_to_online_store=True`, transformations are applied during data ingestion, and the transformed features are stored in the online store. This can improve online feature retrieval performance by reducing computation during reads. Conversely, if `write_to_online_store=False` (the default if omitted), transformations are applied during feature retrieval.
-### Why use on demand feature views?
+### Why Use On Demand Feature Views?
-This enables data scientists to easily impact the online feature retrieval path. For example, a data scientist could
+ODFVs enable data scientists to easily impact the online feature retrieval path. For example, a data scientist could:
-1. Call `get_historical_features` to generate a training dataframe
-2. Iterate in notebook on feature engineering in Pandas/Python
-3. Copy transformation logic into ODFVs and commit to a development branch of the feature repository
-4. Verify with `get_historical_features` (on a small dataset) that the transformation gives expected output over historical data
+1. Call `get_historical_features` to generate a training dataset.
+2. Iterate in a notebook on feature engineering using Pandas or native Python.
+3. Copy transformation logic into ODFVs and commit to a development branch of the feature repository.
+4. Verify with `get_historical_features` (on a small dataset) that the transformation gives the expected output over historical data.
 5. Decide whether to apply the transformation on writes or on reads by setting the `write_to_online_store` parameter accordingly.
-6. Verify with `get_online_features` on dev branch that the transformation correctly outputs online features
-7. Submit a pull request to the staging / prod branches which impact production traffic
+6. Verify with `get_online_features` on the development branch that the transformation correctly outputs online features.
+7. Submit a pull request to the staging or production branches, impacting production traffic.
-## CLI
+## Transformation Modes
-There are new CLI commands:
+When defining an ODFV, you can specify the transformation mode using the `mode` parameter. Feast supports the following modes:
-* `feast on-demand-feature-views list` lists all registered on demand feature view after `feast apply` is run
-* `feast on-demand-feature-views describe [NAME]` describes the definition of an on demand feature view
+- **Pandas Mode (`mode="pandas"`)**: The transformation function takes a Pandas DataFrame as input and returns a Pandas DataFrame as output. This mode is useful for batch transformations over multiple rows.
+- **Native Python Mode (`mode="python"`)**: The transformation function uses native Python and can operate on inputs as lists of values or as single dictionaries representing a singleton (single row).
-## Example
+### Singleton Transformations in Native Python Mode
+
+Native Python mode supports transformations on singleton dictionaries by setting `singleton=True`. This allows you to
+write transformation functions that operate on a single row at a time, making the code more intuitive and aligning with
+how data scientists typically think about data transformations.
+## Example
 See [https://github.com/feast-dev/on-demand-feature-views-demo](https://github.com/feast-dev/on-demand-feature-views-demo) for an example on how to use on demand feature views.
-### **Registering transformations**
-On Demand Transformations support transformations using Pandas and native Python. Note, Native Python is much faster
-but not yet tested for offline retrieval.
+## Registering Transformations
-When defining an ODFV, you can control when the transformation is applied using the write_to_online_store parameter:
+When defining an ODFV, you can control when the transformation is applied using the `write_to_online_store` parameter:
 - `write_to_online_store=True`: The transformation is applied during data ingestion (on write), and the transformed features are stored in the online store.
-- `write_to_online_store=False` (default when omitted): The transformation is applied during feature retrieval (on read).
+- `write_to_online_store=False` (default): The transformation is applied during feature retrieval (on read).
-We register `RequestSource` inputs and the transform in `on_demand_feature_view`:
+### Examples
-## Example of an On Demand Transformation on Read
+#### Example 1: On Demand Transformation on Read Using Pandas Mode
 ```python
-from feast import Field, RequestSource
+from feast import Field, RequestSource, on_demand_feature_view
 from feast.types import Float64, Int64
-from typing import Any, Dict
 import pandas as pd
-# Define a request data source which encodes features / information only
-# available at request time (e.g. part of the user initiated HTTP request)
+# Define a request data source for request-time features
 input_request = RequestSource(
     name="vals_to_add",
     schema=[
-        Field(name='val_to_add', dtype=Int64),
-        Field(name='val_to_add_2', dtype=Int64)
-    ]
+        Field(name="val_to_add", dtype=Int64),
+        Field(name="val_to_add_2", dtype=Int64),
+    ],
 )
-# Use the input data and feature view features to create new features Pandas mode
+# Use input data and feature view features to create new features in Pandas mode
 @on_demand_feature_view(
-    sources=[
-        driver_hourly_stats_view,
-        input_request
-    ],
-    schema=[
-        Field(name='conv_rate_plus_val1', dtype=Float64),
-        Field(name='conv_rate_plus_val2', dtype=Float64)
-    ],
-    mode="pandas",
+    sources=[driver_hourly_stats_view, input_request],
+    schema=[
+        Field(name="conv_rate_plus_val1", dtype=Float64),
+        Field(name="conv_rate_plus_val2", dtype=Float64),
+    ],
+    mode="pandas",
 )
 def transformed_conv_rate(features_df: pd.DataFrame) -> pd.DataFrame:
     df = pd.DataFrame()
-    df['conv_rate_plus_val1'] = (features_df['conv_rate'] + features_df['val_to_add'])
-    df['conv_rate_plus_val2'] = (features_df['conv_rate'] + features_df['val_to_add_2'])
+    df["conv_rate_plus_val1"] = features_df["conv_rate"] + features_df["val_to_add"]
+    df["conv_rate_plus_val2"] = features_df["conv_rate"] + features_df["val_to_add_2"]
     return df
+```
+
+#### Example 2: On Demand Transformation on Read Using Native Python Mode (List Inputs)
+
+```python
+from feast import Field, on_demand_feature_view
+from feast.types import Float64
+from typing import Any, Dict
-# Use the input data and feature view features to create new features Python mode
+# Use input data and feature view features to create new features in Native Python mode
 @on_demand_feature_view(
-    sources=[
-        driver_hourly_stats_view,
-        input_request
-    ],
+    sources=[driver_hourly_stats_view, input_request],
     schema=[
-        Field(name='conv_rate_plus_val1_python', dtype=Float64),
-        Field(name='conv_rate_plus_val2_python', dtype=Float64),
+        Field(name="conv_rate_plus_val1_python", dtype=Float64),
+        Field(name="conv_rate_plus_val2_python", dtype=Float64),
     ],
     mode="python",
 )
 def transformed_conv_rate_python(inputs: Dict[str, Any]) -> Dict[str, Any]:
-    output: Dict[str, Any] = {
+    output = {
         "conv_rate_plus_val1_python": [
             conv_rate + val_to_add
-            for conv_rate, val_to_add in zip(
-                inputs["conv_rate"], inputs["val_to_add"]
-            )
+            for conv_rate, val_to_add in zip(inputs["conv_rate"], inputs["val_to_add"])
         ],
         "conv_rate_plus_val2_python": [
             conv_rate + val_to_add
             for conv_rate, val_to_add in zip(
                 inputs["conv_rate"], inputs["val_to_add_2"]
             )
-        ]
+        ],
+    }
+    return output
+```
+
+#### **New** Example 3: On Demand Transformation on Read Using Native Python Mode (Singleton Input)
+
+```python
+from feast import Field, on_demand_feature_view
+from feast.types import Float64
+from typing import Any, Dict
+
+# Use input data and feature view features to create new features in Native Python mode with singleton input
+@on_demand_feature_view(
+    sources=[driver_hourly_stats_view, input_request],
+    schema=[
+        Field(name="conv_rate_plus_acc_singleton", dtype=Float64),
+    ],
+    mode="python",
+    singleton=True,
+)
+def transformed_conv_rate_singleton(inputs: Dict[str, Any]) -> Dict[str, Any]:
+    output = {
+        "conv_rate_plus_acc_singleton": inputs["conv_rate"] + inputs["acc_rate"]
     }
     return output
 ```
-## Example of an On Demand Transformation on Write
+In this example, `inputs` is a dictionary representing a single row, and the transformation function returns a dictionary of transformed features for that single row. This approach is more intuitive and aligns with how data scientists typically process single data records.
+
+#### Example 4: On Demand Transformation on Write Using Pandas Mode
 ```python
 from feast import Field, on_demand_feature_view
@@ -126,22 +151,22 @@ import pandas as pd
 # Existing Feature View
 driver_hourly_stats_view = ...
-# Define an ODFV without RequestSource
+# Define an ODFV applying transformation during write time
 @on_demand_feature_view(
     sources=[driver_hourly_stats_view],
     schema=[
-        Field(name='conv_rate_adjusted', dtype=Float64),
+        Field(name="conv_rate_adjusted", dtype=Float64),
     ],
     mode="pandas",
     write_to_online_store=True,  # Apply transformation during write time
 )
 def transformed_conv_rate(features_df: pd.DataFrame) -> pd.DataFrame:
     df = pd.DataFrame()
-    df['conv_rate_adjusted'] = features_df['conv_rate'] * 1.1  # Adjust conv_rate by 10%
+    df["conv_rate_adjusted"] = features_df["conv_rate"] * 1.1  # Adjust conv_rate by 10%
     return df
 ```
-Then to ingest the data with the new feature view make sure to include all of the input features required for the
-transformations:
+
+To ingest data with the new feature view, include all input features required for the transformations:
 ```python
 from feast import FeatureStore
@@ -160,17 +185,17 @@ data = pd.DataFrame({
 # Ingest data to the online store
 store.push("driver_hourly_stats_view", data)
-```
+```
-### **Feature retrieval**
+### Feature Retrieval
 {% hint style="info" %}
-The on demand feature view's name is the function name (i.e. `transformed_conv_rate`).
+**Note**: The name of the on demand feature view is the function name (e.g., `transformed_conv_rate`).
 {% endhint %}
-
 #### Offline Features
-And then to retrieve historical, we can call this in a feature service or reference individual features:
+
+Retrieve historical features by referencing individual features or using a feature service:
 ```python
 training_df = store.get_historical_features(
@@ -181,14 +206,14 @@ training_df = store.get_historical_features(
         "driver_hourly_stats:avg_daily_trips",
         "transformed_conv_rate:conv_rate_plus_val1",
         "transformed_conv_rate:conv_rate_plus_val2",
+        "transformed_conv_rate_singleton:conv_rate_plus_acc_singleton",
     ],
 ).to_df()
-
 ```
 #### Online Features
-And then to retrieve online, we can call this in a feature service or reference individual features:
+Retrieve online features by referencing individual features or using a feature service:
 ```python
 entity_rows = [
@@ -206,6 +231,15 @@ online_response = store.get_online_features(
         "driver_hourly_stats:acc_rate",
         "transformed_conv_rate_python:conv_rate_plus_val1_python",
         "transformed_conv_rate_python:conv_rate_plus_val2_python",
+        "transformed_conv_rate_singleton:conv_rate_plus_acc_singleton",
     ],
 ).to_dict()
 ```
+
+## CLI Commands
+There are new CLI commands to manage on demand feature views:
+
+- `feast on-demand-feature-views list`: Lists all registered on demand feature views after `feast apply` is run.
+- `feast on-demand-feature-views describe [NAME]`: Describes the definition of an on demand feature view.
+
+
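
The difference between the list-input and singleton-input native Python transformations documented in this patch (Examples 2 and 3) can be sketched in plain Python, without Feast. This is only an illustration under stated assumptions: `add_rates_singleton`, `add_rates_lists`, and `apply_row_wise` are hypothetical names introduced here, not Feast APIs, and the sketch does not claim to show how Feast dispatches `singleton=True` transformations internally.

```python
from typing import Any, Callable, Dict, List


def add_rates_singleton(row: Dict[str, Any]) -> Dict[str, Any]:
    """Singleton style: one row (scalar values) in, one row out."""
    return {"conv_rate_plus_acc_singleton": row["conv_rate"] + row["acc_rate"]}


def add_rates_lists(inputs: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """List style: each key maps to a list holding one value per row."""
    return {
        "conv_rate_plus_acc_singleton": [
            conv_rate + acc_rate
            for conv_rate, acc_rate in zip(inputs["conv_rate"], inputs["acc_rate"])
        ]
    }


def apply_row_wise(
    transform: Callable[[Dict[str, Any]], Dict[str, Any]],
    inputs: Dict[str, List[Any]],
) -> Dict[str, List[Any]]:
    """Hypothetical helper: run a singleton transform over list-style inputs."""
    keys = list(inputs)
    n_rows = len(inputs[keys[0]]) if keys else 0
    result: Dict[str, List[Any]] = {}
    for i in range(n_rows):
        # Build a single-row dictionary, transform it, and collect the outputs.
        row = {key: inputs[key][i] for key in keys}
        for name, value in transform(row).items():
            result.setdefault(name, []).append(value)
    return result


batch = {"conv_rate": [0.5, 0.25], "acc_rate": [0.25, 0.5]}
# Both styles produce the same per-row values for this batch.
assert apply_row_wise(add_rates_singleton, batch) == add_rates_lists(batch)
```

The singleton function contains no `zip` or list handling, which is the readability benefit the patch describes. The list-style function processes the whole batch in one call.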