Feature Request
Supposed I used bigframes and processed my data like this:
res_df = df.assign(pred=input_remote_function.apply(generate_text))
And got this output dataframe res_df:
| uri |
pred |
| gs://dataproc-metastore-public-binaries/ads_ba... |
{"interpretation": " The primary message of th... |
| gs://dataproc-metastore-public-binaries/ads_ba... |
{"interpretation": " The primary message of th... |
| gs://dataproc-metastore-public-binaries/ads_ba... |
{"interpretation": " The primary message of th... |
As you can see, the pred column is a JSON string.
Describe the solution you'd like
I would like to use BigQuery JSON Functions to explode this column into a column for each JSON attribute.
For example, I could write a custom BigQuery SQL code to run for this dataframe, like this:
res_df = df.sqldf("SELECT *, JSON_EXTRACT(pred, '$.interpretation') AS interpretation FROM df")
Getting as a result the res_df:
| uri |
pred |
interpretation |
| gs://datap... |
{"interpretation": " The primary message of th... |
The primary messag... |
| gs://datap... |
{"interpretation": " The primary message of th... |
The primary messag... |
| gs://datap... |
{"interpretation": " The primary message of th... |
The primary messag... |
Describe alternatives you've considered
As an alternative I need to use remote functions, like this one, to explode the column:
@bpd.remote_function(
[str],
str,
bigquery_connection=CONNECTION_ID
)
def extract_json_attribute(input_content: str) -> str:
pred = input_content.split("|||")[0]
json_attribute = input_content.split("|||")[1]
import json
return json.loads(pred)[json_attribute]
input_remote_function_int = res_df['pred'] + '|||interpretation'
df = res_df.assign(interpretation=input_remote_function_int.apply(extract_json_attribute))
Which I cannot pass multiple parameter because of #646
Additional context
The pandasql allow us to query pandas DataFrames using SQL syntax.
Bigframes supporting, pushing down the query to BQ would be helpful.
https://pypi.org/project/pandasql/
Feature Request
Supposed I used bigframes and processed my data like this:
res_df = df.assign(pred=input_remote_function.apply(generate_text))And got this output dataframe res_df:
As you can see, the pred column is a JSON string.
Describe the solution you'd like
I would like to use BigQuery JSON Functions to explode this column into a column for each JSON attribute.
For example, I could write a custom BigQuery SQL code to run for this dataframe, like this:
res_df = df.sqldf("SELECT *, JSON_EXTRACT(pred, '$.interpretation') AS interpretation FROM df")Getting as a result the res_df:
Describe alternatives you've considered
As an alternative I need to use remote functions, like this one, to explode the column:
Which I cannot pass multiple parameter because of #646
Additional context
The pandasql allow us to query pandas DataFrames using SQL syntax.
Bigframes supporting, pushing down the query to BQ would be helpful.
https://pypi.org/project/pandasql/