Advanced metric transformations
- Cuped: Class for data CUPED transformation.
- MultiCuped: Class for data Multi CUPED transformation.
- MLVarianceReducer: Machine Learning approach for variance reduction.
- class ambrosia.preprocessing.Cuped(verbose=True)
Class for data CUPED transformation.
https://towardsdatascience.com/how-to-double-a-b-testing-speed-with-cuped-f80460825a90
Y_hat = Y - theta * X, where theta := cov(X, Y) / Var(X); a fitted bias value keeps the mean of Y_hat equal to the mean of Y.
It is important that the mean of the covariate metric does not change over time.
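Minimising Var(Y - theta * X) = Var(Y) - 2 * theta * cov(X, Y) + theta^2 * Var(X) over theta gives theta = cov(X, Y) / Var(X), and the remaining variance is Var(Y) * (1 - rho^2), where rho is the correlation between X and Y. The dependency-free sketch below illustrates this under one assumption: the bias is taken as theta * mean(X), a natural choice that keeps the mean of Y unchanged (the class's exact bookkeeping may differ). The helpers `cuped_fit` and `cuped_transform` are hypothetical and not part of ambrosia.

```python
from statistics import fmean


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    return covariance(xs, xs)


def cuped_fit(x, y):
    """Return (theta, bias) for Y_hat = Y - theta * X + bias (hypothetical helper)."""
    theta = covariance(x, y) / variance(x)  # cov(X, Y) / Var(X)
    bias = theta * fmean(x)  # assumed bias: keeps mean(Y_hat) equal to mean(Y)
    return theta, bias


def cuped_transform(x, y, theta, bias):
    return [yi - theta * xi + bias for xi, yi in zip(x, y)]


# Toy data: pre-experiment covariate x and a correlated target y.
x = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0, 8.0, 13.0]
y = [21.0, 25.5, 18.0, 31.0, 22.5, 28.0, 17.5, 26.0]

theta, bias = cuped_fit(x, y)
y_hat = cuped_transform(x, y, theta, bias)
rho = covariance(x, y) / (variance(x) * variance(y)) ** 0.5

print(f"theta = {theta:.3f}")
print(f"variance reduction = {1 - variance(y_hat) / variance(y):.3f}, rho^2 = {rho ** 2:.3f}")
```

The printed variance reduction and rho^2 coincide, which is why a covariate strongly correlated with the target gives the largest gain.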
- Parameters:
- verbose : bool, default: True
If True, prints information about the variance reduction to sys.stdout.
- Attributes:
- params : Dict
Parameters of the instance, updated after calling the fit() method. They include the target column name, the covariate column name, the name of the transformed column, the linear coefficient of the CUPED transformation, and the bias value that preserves the mean.
- verbose : bool
Verbose info flag.
- fitted : bool
Whether the instance has been fitted.
Methods
get_params_dict()
Returns dictionary with params if fit() method has been previously called.
load_params_dict(params)
Load params from a dictionary.
store_params(store_path)
Store params to json file if fit() method has been previously called.
load_params(load_path)
Load params from a json file.
fit(dataframe, target_column, covariate_column, transformed_name=None)
Fit the model using a specific covariate column.
transform(dataframe, inplace=False)
Transform the target column after the instance has been fitted.
fit_transform(dataframe, target_column, covariate_column, transformed_name=None, inplace=False)
Combination of fit() and transform() methods.
Examples
Suppose we have a dataframe with user information that contains two columns: a “target” column and a column with the “income” metric. Let us assume that the average of the “income” values does not change over time. Then we can use a CUPED transformation based on the “income” data to reduce the variance of the “target” column.
>>> cuped_transformer = Cuped(verbose=True)
>>> cuped_transformer.fit_transform(
...     dataframe=dataframe,
...     target_column='target',
...     covariate_column='income',
...     transformed_name='cuped_target',
...     inplace=True,
... )
A new column “cuped_target” now appears in the dataframe; we can use it to design our experiment and estimate the variance reduction. To reuse the CUPED transformation in a future experiment, let us store the parameters:
>>> cuped_transformer.store_params('cuped_transform_params.json')
Now we conduct an experiment and want to transform our data to reduce its variation:
>>> cuped_transformation = Cuped()
>>> cuped_transformation.load_params('cuped_transform_params.json')
>>> cuped_transformation.transform(
...     dataframe=exp_results,
...     inplace=True,
... )
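The store/load workflow above rests on one idea: theta and the bias are estimated once on historical data and then reused unchanged on the experiment data, rather than re-estimated on it. The sketch below mimics that round trip with the standard json module. The dictionary keys, the parameter values, and the helper `apply_cuped` are hypothetical stand-ins for a previous fit and do not describe ambrosia's actual json layout.

```python
import json
import os
import tempfile

# Hypothetical parameters, standing in for the result of an earlier fit().
params = {
    "target_column": "target",
    "covariate_column": "income",
    "transformed_name": "cuped_target",
    "theta": 0.5,
    "bias": 5.0,
}


def apply_cuped(rows, params):
    """Return copies of the rows with the transformed column added (hypothetical helper)."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row[params["transformed_name"]] = (
            row[params["target_column"]]
            - params["theta"] * row[params["covariate_column"]]
            + params["bias"]
        )
        out.append(new_row)
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cuped_transform_params.json")
    with open(path, "w") as f:
        json.dump(params, f)  # analogous to store_params()
    with open(path) as f:
        loaded = json.load(f)  # analogous to load_params()

exp_results = [{"target": 30.0, "income": 20.0}, {"target": 40.0, "income": 40.0}]
transformed = apply_cuped(exp_results, loaded)
print([row["cuped_target"] for row in transformed])
```

The helper returns copies and leaves the input rows untouched, mirroring the inplace=False behaviour described for transform().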
- fit(dataframe, target_column, covariate_column, transformed_name=None)
Calculate the CUPED parameters for the target column using the given covariate column and data.
- Parameters:
- dataframe : pd.DataFrame
Table with data for the calculation of CUPED parameters.
- target_column : ColumnNameType
Column from the dataframe to which the CUPED transformation will be applied.
- covariate_column : ColumnNameType
Column which will be used as the covariate in the CUPED transformation.
- transformed_name : ColumnNamesType, optional
Name of the new transformed target column; if not defined, it is generated automatically.
- transform(dataframe, inplace=False)
Apply the CUPED transformation to the target column, either in place or on a copy of the dataframe.
- Parameters:
- dataframe : pd.DataFrame
Table with data for the CUPED transformation.
- inplace : bool, default: False
If True, the method returns None and adds a new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.
- fit_transform(dataframe, target_column, covariate_column, transformed_name=None, inplace=False)
Combination of fit() and transform() methods.
- Parameters:
- dataframe : pd.DataFrame
Table with data for fitting and applying the CUPED transformation.
- target_column : ColumnNameType
Column from the dataframe to which the CUPED transformation will be applied.
- covariate_column : ColumnNameType
Column which will be used as the covariate.
- transformed_name : ColumnNamesType, optional
Name of the new transformed target column; if not defined, it is generated automatically.
- inplace : bool, default: False
If True, the method returns None and adds a new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.
- store_params(store_path)
- Parameters:
- store_path : Path
Path where the parameters will be stored in json format.
- load_params(load_path)
- Parameters:
- load_path : Path
Path to the json file with parameters.
- class ambrosia.preprocessing.MultiCuped(verbose=True)
Class for data Multi CUPED transformation.
Y_hat = Y - X theta (matrix multiplication), where theta := argmin Var(Y - X theta); a fitted bias value keeps the mean of Y_hat equal to the mean of Y.
It is important that the means of the covariate metrics do not change over time.
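For several covariates, the minimiser of Var(Y - X theta) is theta = Cov(X, X)^-1 Cov(X, Y): the covariance matrix of the covariates solved against their covariances with the target, which equals the ordinary least-squares slopes of a regression with an intercept. The dependency-free sketch below solves that system with a small Gaussian elimination and, as in the single-covariate case, assumes a bias of theta · mean(X). All helper names are hypothetical, and the toy target is built as an exact linear function of the covariates so that the recovered theta can be checked.

```python
from statistics import fmean


def cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    ma, mb = fmean(a), fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def solve(matrix, rhs):
    """Solve matrix @ v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (aug[r][n] - sum(aug[r][c] * v[c] for c in range(r + 1, n))) / aug[r][r]
    return v


def multi_cuped_fit(covariates, y):
    """theta = Cov(X, X)^-1 Cov(X, Y); covariates is a list of columns (hypothetical helper)."""
    sigma_xx = [[cov(a, b) for b in covariates] for a in covariates]
    sigma_xy = [cov(a, y) for a in covariates]
    theta = solve(sigma_xx, sigma_xy)
    bias = sum(t * fmean(col) for t, col in zip(theta, covariates))  # assumed bias
    return theta, bias


def multi_cuped_transform(covariates, y, theta, bias):
    return [
        yi - sum(t * col[i] for t, col in zip(theta, covariates)) + bias
        for i, yi in enumerate(y)
    ]


income = [3.0, 5.0, 4.0, 6.0, 2.0, 7.0]
age = [30.0, 25.0, 40.0, 35.0, 28.0, 45.0]
# Target built as an exact linear function of the covariates.
target = [2.0 * i - 0.5 * a + 10.0 for i, a in zip(income, age)]

theta, bias = multi_cuped_fit([income, age], target)
cuped_target = multi_cuped_transform([income, age], target, theta, bias)
print("theta:", theta)
print("transformed target:", cuped_target)
```

Because the target here is fully explained by the covariates, every transformed value collapses to the mean of the target; with real data only the explained part of the variance is removed.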
- Parameters:
- verbose : bool, default: True
If True, prints information about the variance reduction to sys.stdout.
- Attributes:
- params : Dict
Parameters of the instance, updated after calling the fit() method. They include the target column name, the covariate column names, the name of the transformed column, the linear coefficients of the Multi CUPED transformation, and the bias value that preserves the mean.
- verbose : bool
Verbose info flag.
- fitted : bool
Whether the instance has been fitted.
Methods
get_params_dict()
Returns dictionary with params if fit() method has been previously called.
load_params_dict(params)
Load params from a dictionary.
store_params(store_path)
Store params to json file if fit() method has been previously called.
load_params(load_path)
Load params from a json file.
fit(dataframe, target_column, covariate_columns, transformed_name=None)
Fit the model using the covariate columns.
transform(dataframe, inplace=False)
Transform the target column after the instance has been fitted.
fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)
Combination of fit() and transform() methods.
Examples
Suppose we have a dataframe with user information that contains a “target” column and the columns “income” and “age”. Let us assume that the averages of the “income” and “age” values do not change over time. Then we can use a Multi CUPED transformation based on the “income” and “age” data to reduce the variance of the “target” column.
>>> cuped_transformer = MultiCuped(verbose=True)
>>> cuped_transformer.fit_transform(
...     dataframe=dataframe,
...     target_column='target',
...     covariate_columns=['income', 'age'],
...     transformed_name='cuped_target',
...     inplace=True,
... )
A new column “cuped_target” now appears in the dataframe; we can use it to design our experiment and estimate the variance reduction. To reuse the Multi CUPED transformation in a future experiment, let us store the parameters:
>>> cuped_transformer.store_params('cuped_transform_params.json')
Now we conduct an experiment and want to transform our data to reduce its variation:
>>> cuped_transformation = MultiCuped()
>>> cuped_transformation.load_params('cuped_transform_params.json')
>>> cuped_transformation.transform(
...     exp_results,
...     inplace=True,
... )
- fit(dataframe, target_column, covariate_columns, transformed_name=None)
Calculate the Multi CUPED parameters for the target column using the selected covariate columns.
- Parameters:
- dataframe : pd.DataFrame
Table with data for the calculation of Multi CUPED parameters.
- target_column : ColumnNameType
Column from the dataframe to which the Multi CUPED transformation will be applied.
- covariate_columns : ColumnNamesType
Columns which will be used as the covariates in the Multi CUPED transformation.
- transformed_name : ColumnNamesType, optional
Name of the new transformed target column; if not defined, it is generated automatically.
- transform(dataframe, inplace=False)
Apply the Multi CUPED transformation to the target column, either in place or on a copy of the dataframe.
- Parameters:
- dataframe : pd.DataFrame
Table with data for the Multi CUPED transformation.
- inplace : bool, default: False
If True, the method returns None and adds a new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.
- fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)
Combination of fit() and transform() methods.
- Parameters:
- dataframe : pd.DataFrame
Table with data for fitting and applying the Multi CUPED transformation.
- target_column : ColumnNameType
Column from the dataframe to which the Multi CUPED transformation will be applied.
- covariate_columns : ColumnNamesType
Columns which will be used as the covariates.
- transformed_name : ColumnNamesType, optional
Name of the new transformed target column; if not defined, it is generated automatically.
- inplace : bool, default: False
If True, the method returns None and adds a new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.
- store_params(store_path)
- Parameters:
- store_path : Path
Path where the parameters will be stored in json format.
- load_params(load_path)
- Parameters:
- load_path : Path
Path to the json file with parameters.
- class ambrosia.preprocessing.MLVarianceReducer(model='boosting', model_params=None, scores=None, verbose=True)
Machine Learning approach for variance reduction.
Having built a model M, we can apply the transformation: Y_hat = Y - M(X) + MEAN(M(X)).
It is important that the mean of M(X) does not change over time. You can choose gradient boosting, Ridge regression, or your own model class, for example sklearn.ensemble.RandomForestRegressor, and pass the model parameters to the constructor to assemble the model.
- Parameters:
- model : str or model type, default: "boosting"
Model which will be used for the transformations.
- model_params : Dict, optional
Dictionary with parameters which will be passed to the constructor to assemble the model.
- scores : Dict[str, Callable], optional
Scores which will be used.
- verbose : bool, default: True
If True, prints information about the variance reduction to sys.stdout.
- Attributes:
- model : model type
Model which will be used for the transformations.
- params : Dict
Parameters of the instance, updated after calling the fit() method. They include the target column name, the covariate column names, the name of the transformed column, and an additional train bias equal to mean(M(X)).
- scores : Dict[str, Callable]
Scores which will be used.
- verbose : bool
Verbose info flag.
- fitted : bool
Fit status flag.
Methods
get_params_dict()
Returns dict with instance fitted parameters.
load_params_dict()
Load parameters from the dict.
store_params(config_store_path, model_store_path)
Store fitted params in a json file and the model in a pickle file.
load_params(config_load_path, model_load_path)
Load params from a json file and a pickled model.
fit(dataframe, target_column, covariate_columns, transformed_name=None)
Fit the model using training data.
transform(dataframe, inplace=False)
Transform the target column of a dataframe.
fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)
Combination of fit() and transform() methods.
Examples
We have a data table with a ‘target’ column and the columns ‘feature_1’, ‘feature_2’, ‘feature_3’. Let us assume that the means of all these metrics do not change over time (a feature such as age, for example). We want to reduce variance using the predictions of an ML model, so we can use this class:
>>> transformer = MLVarianceReducer()  # By default, a CatBoost model is chosen
>>> features = ['feature_1', 'feature_2', 'feature_3']
>>> transformer.fit_transform(dataframe, 'target', features, transformed_name='new_target', inplace=True)
>>> transformer.store_params('path_ml_params.json', 'path_ml_model')
Now to transform the experimental data we use the following commands:
>>> transformer = MLVarianceReducer()
>>> transformer.load_params('path_ml_params.json', 'path_ml_model')
>>> transformer.transform(exp_data, inplace=True)
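The transformation Y_hat = Y - M(X) + MEAN(M(X)) works with any regressor. The sketch below swaps in a toy `GroupMeanModel` (it predicts the training mean of the target for each value of a categorical feature) instead of CatBoost, and stores MEAN(M(X)) computed on the training data as the bias, in line with the params attribute described above. All names and data are hypothetical; this is not ambrosia's implementation.

```python
from collections import defaultdict
from statistics import fmean, pvariance


class GroupMeanModel:
    """Toy stand-in for an ML regressor: predicts the mean target per feature value."""

    def fit(self, features, target):
        groups = defaultdict(list)
        for f, t in zip(features, target):
            groups[f].append(t)
        self.means_ = {f: fmean(ts) for f, ts in groups.items()}
        self.fallback_ = fmean(target)  # used for feature values unseen in training
        return self

    def predict(self, features):
        return [self.means_.get(f, self.fallback_) for f in features]


def ml_fit(model, features, target):
    """Fit the model and store mean(M(X)) on the training data as the bias."""
    model.fit(features, target)
    bias = fmean(model.predict(features))
    return model, bias


def ml_transform(model, bias, features, target):
    """Y_hat = Y - M(X) + bias, where bias is the training mean of M(X)."""
    return [y - m + bias for y, m in zip(target, model.predict(features))]


# Pre-experiment data: a categorical feature that the target depends on.
train_x = ["ios", "android", "web", "ios", "android", "web", "ios", "web"]
train_y = [12.0, 7.0, 3.0, 14.0, 9.0, 5.0, 13.0, 4.0]
model, bias = ml_fit(GroupMeanModel(), train_x, train_y)

# Experiment data transformed with the model and bias fitted above.
exp_x = ["ios", "web", "android", "ios"]
exp_y = [15.0, 6.0, 8.0, 11.0]
exp_y_hat = ml_transform(model, bias, exp_x, exp_y)

print("variance before:", pvariance(exp_y), "after:", pvariance(exp_y_hat))
print("mean before:", fmean(exp_y), "after:", fmean(exp_y_hat))
```

On this toy sample the variance drops from 11.5 to 2.75, but the transformed mean (8.875) differs from the raw mean (10.0) because the mean of M(X) on the sample (9.5) is not the training value 8.375. This is the drift that the requirement on the mean of M(X) guards against.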
- store_params(config_store_path, model_store_path)
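Store the instance parameters as a json file and the model in a separate file; available only for the CatBoost model.
For other models, you can access the model via instance.model and store it yourself.
- Parameters:
- config_store_path : Path
Path where the instance parameters will be stored in json format.
- model_store_path : Path
Path where the model will be stored.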
- load_params(config_load_path, model_load_path)
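Load the instance parameters from a json file and the model from its file; works only for the CatBoost model.
- Parameters:
- config_load_path : Path
Path to the json file with the instance parameters.
- model_load_path : Path
Path to the stored model file.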
- fit(dataframe, target_column, covariate_columns, transformed_name=None)
Fit the model used for the transformation.
- Parameters:
- dataframe : pd.DataFrame
Table with data for model fitting.
- target_column : ColumnNameType
Column from the dataframe to which the transformation will be applied.
- covariate_columns : ColumnNamesType
Columns which will be used for the transformation.
- transformed_name : ColumnNamesType, optional
Name of the new transformed target column; if not defined, it is generated automatically.
- transform(dataframe, inplace=False)
Transform data using the fitted model.
- Parameters:
- dataframe : pd.DataFrame
Table with data for transformation.
- inplace : bool, default: False
If True, the method returns None and adds a new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.
- fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)
Sequentially combine the fit() and transform() methods.
- Parameters:
- dataframe : pd.DataFrame
Table with data for model fitting and further transformation.
- target_column : ColumnNameType
Column from the dataframe to which the transformation will be applied.
- covariate_columns : ColumnNamesType
Columns which will be used for the transformation.
- transformed_name : ColumnNamesType, optional
Name of the new transformed target column; if not defined, it is generated automatically.
- inplace : bool, default: False
If True, the method returns None and adds a new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.