Advanced metric transformations

`Cuped`	Class for data CUPED transformation.
`MultiCuped`	Class for data Multi CUPED transformation.
`MLVarianceReducer`	Machine Learning approach for variance reduction.

class ambrosia.preprocessing.Cuped(verbose=True)

Class for data CUPED transformation.

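Reference: https://towardsdatascience.com/how-to-double-a-b-testing-speed-with-cuped-f80460825a90

The transformation is

$$\hat{Y} = Y - \theta X, \qquad \theta = \frac{\operatorname{cov}(X, Y)}{\operatorname{Var}(X)},$$

where Y is the target metric and X is the covariate. A bias value, stored in params, is used to keep the mean of the transformed column equal to the mean of Y. It is important that the mean of the covariate X does not change over time.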
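The following NumPy sketch shows the CUPED computation on synthetic data, using the standard coefficient θ = cov(X, Y) / Var(X). The helper names cuped_fit and cuped_transform, the synthetic columns, and the choice bias = θ·mean(X) (one way to keep the mean of the target unchanged) are assumptions for illustration, not ambrosia's internals.

```python
import numpy as np


def cuped_fit(y: np.ndarray, x: np.ndarray):
    """Return (theta, bias) for the CUPED transform y - theta * x + bias."""
    # theta = cov(X, Y) / Var(X); both estimates use the N - 1 normalisation.
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # Assumption: the bias theta * mean(X) keeps mean(Y_hat) equal to mean(Y).
    bias = theta * x.mean()
    return theta, bias


def cuped_transform(y: np.ndarray, x: np.ndarray, theta: float, bias: float) -> np.ndarray:
    return y - theta * x + bias


rng = np.random.default_rng(0)
income = rng.normal(100.0, 20.0, size=10_000)               # covariate X
target = 0.5 * income + rng.normal(0.0, 5.0, size=10_000)   # metric Y, correlated with X

theta, bias = cuped_fit(target, income)
cuped_target = cuped_transform(target, income, theta, bias)

print(f"theta = {theta:.3f}")
print(f"variance: {target.var():.1f} -> {cuped_target.var():.1f}")
```

With the optimal θ, the variance of the transformed metric is Var(Y)(1 − ρ²), where ρ is the correlation between X and Y. In this synthetic example ρ² is about 0.8, so the variance drops to roughly a fifth of the original.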

Parameters:

verbose (bool, default: True): If True, print information about the variance reduction to sys.stdout.

Attributes:

params (Dict): Parameters of the instance, updated after calling fit(). They include the target column name, the covariate column name, the name of the transformed column, the linear coefficient of the CUPED transformation, and the bias value that keeps the mean unchanged.
verbose (bool): Verbose info flag.
fitted (bool): Whether the instance has been fitted.

Methods

get_params_dict()	Return a dictionary with the parameters if fit() has been called.
load_params_dict(params)	Load parameters from a dictionary.
store_params(store_path)	Store the parameters in a json file if fit() has been called.
load_params(load_path)	Load parameters from a json file.
fit(dataframe, target_column, covariate_column, transformed_name)	Fit the transformation using a specific covariate column.
transform(dataframe, inplace)	Transform the target column after fitting.
fit_transform(dataframe, target_column, covariate_column, transformed_name, inplace)	Combination of fit() and transform() methods.

Examples

Suppose we have a dataframe with user information that contains two columns: a “target” column and a column with the “income” metric. Let us assume that the average of the “income” values does not change over time. Then we can use a CUPED transformation based on the “income” data to reduce the variance of the “target” column.

>>> from ambrosia.preprocessing import Cuped
>>> cuped_transformer = Cuped(verbose=True)
>>> cuped_transformer.fit_transform(
...     dataframe=dataframe,
...     target_column='target',
...     covariate_column='income',
...     transformed_name='cuped_target',
...     inplace=True,
... )

A new column “cuped_target” now appears in the dataframe. We can use it to design our experiment and estimate the variance reduction. To reuse the CUPED transformation in a future experiment, let us store the parameters:

>>> cuped_transformer.store_params('cuped_transform_params.json')

Once the experiment has been conducted, we load the stored parameters and transform the experimental data to reduce its variance:

>>> cuped_transformation = Cuped()
>>> cuped_transformation.load_params('cuped_transform_params.json')
>>> cuped_transformation.transform(
...     dataframe=exp_results,
...     inplace=True,
... )
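The fit-on-history, apply-to-experiment workflow can be sketched with NumPy and the standard json module. The sketch also shows why the covariate mean must stay stable: the stored bias is computed from the historical mean of X, so if the mean of X shifts by Δ, the mean of the transformed metric moves by −θΔ relative to the raw metric. The params layout, the file name, and the apply_cuped helper are hypothetical; they are not the file format that store_params() writes.

```python
import json
import os
import tempfile

import numpy as np

rng = np.random.default_rng(1)

# Historical (pre-experiment) data used for fitting.
hist_x = rng.normal(100.0, 20.0, size=5_000)
hist_y = 0.5 * hist_x + rng.normal(0.0, 5.0, size=5_000)

theta = float(np.cov(hist_x, hist_y)[0, 1] / np.var(hist_x, ddof=1))
# Hypothetical parameter layout, not the file format ambrosia writes.
params = {"theta": theta, "bias": theta * float(hist_x.mean())}

path = os.path.join(tempfile.mkdtemp(), "cuped_params.json")
with open(path, "w") as f:
    json.dump(params, f)

# Later, in the experiment: load the stored parameters and apply them.
with open(path) as f:
    loaded = json.load(f)


def apply_cuped(y, x, p):
    return y - p["theta"] * x + p["bias"]


# Experiment data whose covariate mean is unchanged ...
exp_x = rng.normal(100.0, 20.0, size=5_000)
exp_y = 0.5 * exp_x + rng.normal(0.0, 5.0, size=5_000)
stable = apply_cuped(exp_y, exp_x, loaded)

# ... versus data whose covariate mean drifted by +10.
drift_x = exp_x + 10.0
drift_y = 0.5 * drift_x + rng.normal(0.0, 5.0, size=5_000)
drifted = apply_cuped(drift_y, drift_x, loaded)

print(f"stable:  raw mean {exp_y.mean():.2f}, CUPED mean {stable.mean():.2f}")
print(f"drifted: raw mean {drift_y.mean():.2f}, CUPED mean {drifted.mean():.2f}")
```

In the stable case the two means nearly coincide; in the drifted case the transformed mean is lower by about θ·10 ≈ 5.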

fit(dataframe, target_column, covariate_column, transformed_name=None)

Calculate the CUPED parameters for the target column using the given covariate column and data.

Parameters:

dataframe (pd.DataFrame): Table with data for calculating the CUPED parameters.
target_column (ColumnNameType): Column of the dataframe to which the CUPED transformation will be applied.
covariate_column (ColumnNameType): Column used as the covariate in the CUPED transformation.
transformed_name (ColumnNamesType, optional): Name of the new transformed target column; if not specified, it is generated automatically.

transform(dataframe, inplace=False)

Apply the CUPED transformation to the target column, either in place or on a copy of the dataframe.

Parameters:

dataframe (pd.DataFrame): Table with data for the CUPED transformation.
inplace (bool, default: False): If True, add the new column to the original dataframe and return None. Otherwise, return a copy of the dataframe with the new column.

fit_transform(dataframe, target_column, covariate_column, transformed_name=None, inplace=False)

Combination of fit() and transform() methods.

Parameters:

dataframe (pd.DataFrame): Table with data for fitting and applying the CUPED transformation.
target_column (ColumnNameType): Column of the dataframe to which the CUPED transformation will be applied.
covariate_column (ColumnNameType): Column used as the covariate.
transformed_name (ColumnNamesType, optional): Name of the new transformed target column; if not specified, it is generated automatically.
inplace (bool, default: False): If True, add the new column to the original dataframe and return None. Otherwise, return a copy of the dataframe with the new column.

store_params(store_path)

Store the parameters in a json file if fit() has been called.

Parameters:

store_path (Path): Path where the parameters will be stored in json format.

load_params(load_path)

Load parameters from a json file.

Parameters:

load_path (Path): Path to the json file with parameters.

class ambrosia.preprocessing.MultiCuped(verbose=True)

Class for data Multi CUPED transformation.

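The transformation is

$$\hat{Y} = Y - X\theta, \qquad \theta = \arg\min_{\theta} \operatorname{Var}(Y - X\theta),$$

where X is the matrix of covariates and Xθ is a matrix-vector product. It is important that the means of the covariate metrics do not change over time.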
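A NumPy sketch of the Multi CUPED coefficients: minimising Var(Y − Xθ) is ordinary least squares of the centred Y on the centred covariates, so θ = Cov(X)⁻¹ Cov(X, Y), and single-covariate CUPED is the special case with one column. The function name, the synthetic data, and the bias = mean(X)·θ convention are assumptions for illustration, not ambrosia's internals.

```python
import numpy as np


def multi_cuped_fit(y: np.ndarray, X: np.ndarray):
    """Return (theta, bias) minimising Var(y - X @ theta)."""
    # Centering removes the intercept, so least squares on the centred data
    # gives the variance-minimising coefficients Cov(X)^-1 Cov(X, y).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    theta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    # Assumption: this bias keeps the mean of the transformed column equal to mean(y).
    bias = float(X.mean(axis=0) @ theta)
    return theta, bias


rng = np.random.default_rng(2)
n = 10_000
income = rng.normal(100.0, 20.0, n)
age = rng.normal(40.0, 10.0, n)
X = np.column_stack([income, age])
target = 0.5 * income + 0.8 * age + rng.normal(0.0, 5.0, n)

theta, bias = multi_cuped_fit(target, X)
cuped_target = target - X @ theta + bias

print("theta =", np.round(theta, 2))
print(f"variance: {target.var():.1f} -> {cuped_target.var():.1f}")
```

Using both covariates removes the variance explained by each of them, which a single-covariate CUPED on either column alone could not do.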

Parameters:

verbose (bool, default: True): If True, print information about the variance reduction to sys.stdout.

Attributes:

params (Dict): Parameters of the instance, updated after calling fit(). They include the target column name, the covariate column names, the name of the transformed column, the linear coefficients of the Multi CUPED transformation, and the bias value that keeps the mean unchanged.
verbose (bool): Verbose info flag.
fitted (bool): Whether the instance has been fitted.

Methods

get_params_dict()	Return a dictionary with the parameters if fit() has been called.
load_params_dict(params)	Load parameters from a dictionary.
store_params(store_path)	Store the parameters in a json file if fit() has been called.
load_params(load_path)	Load parameters from a json file.
fit(dataframe, target_column, covariate_columns, transformed_name)	Fit the transformation using the covariate columns.
transform(dataframe, inplace)	Transform the target column after fitting.
fit_transform(dataframe, target_column, covariate_columns, transformed_name, inplace)	Combination of fit() and transform() methods.

Examples

Suppose we have a dataframe with user information that contains a “target” column and the columns “income” and “age”. Let us assume that the averages of the “income” and “age” values do not change over time. Then we can use a Multi CUPED transformation based on the “income” and “age” data to reduce the variance of the “target” column.

>>> from ambrosia.preprocessing import MultiCuped
>>> cuped_transformer = MultiCuped(verbose=True)
>>> cuped_transformer.fit_transform(
...     dataframe=dataframe,
...     target_column='target',
...     covariate_columns=['income', 'age'],
...     transformed_name='cuped_target',
...     inplace=True,
... )

A new column “cuped_target” now appears in the dataframe. We can use it to design our experiment and estimate the variance reduction. To reuse the Multi CUPED transformation in a future experiment, let us store the parameters:

>>> cuped_transformer.store_params('cuped_transform_params.json')

Once the experiment has been conducted, we load the stored parameters and transform the experimental data to reduce its variance:

>>> cuped_transformation = MultiCuped()
>>> cuped_transformation.load_params('cuped_transform_params.json')
>>> cuped_transformation.transform(
...     exp_results,
...     inplace=True,
... )

fit(dataframe, target_column, covariate_columns, transformed_name=None)

Calculate the Multi CUPED parameters for the target column using the selected covariate columns.

Parameters:

dataframe (pd.DataFrame): Table with data for calculating the Multi CUPED parameters.
target_column (ColumnNameType): Column of the dataframe to which the Multi CUPED transformation will be applied.
covariate_columns (ColumnNamesType): Columns used as the covariates in the Multi CUPED transformation.
transformed_name (ColumnNamesType, optional): Name of the new transformed target column; if not specified, it is generated automatically.

transform(dataframe, inplace=False)

Apply the Multi CUPED transformation to the target column, either in place or on a copy of the dataframe.

Parameters:

dataframe (pd.DataFrame): Table with data for the Multi CUPED transformation.
inplace (bool, default: False): If True, add the new column to the original dataframe and return None. Otherwise, return a copy of the dataframe with the new column.

fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)

Combination of fit() and transform() methods.

Parameters:

dataframe (pd.DataFrame): Table with data for fitting and applying the Multi CUPED transformation.
target_column (ColumnNameType): Column of the dataframe to which the Multi CUPED transformation will be applied.
covariate_columns (ColumnNamesType): Columns used as the covariates.
transformed_name (ColumnNamesType, optional): Name of the new transformed target column; if not specified, it is generated automatically.
inplace (bool, default: False): If True, add the new column to the original dataframe and return None. Otherwise, return a copy of the dataframe with the new column.

store_params(store_path)

Store the parameters in a json file if fit() has been called.

Parameters:

store_path (Path): Path where the parameters will be stored in json format.

load_params(load_path)

Load parameters from a json file.

Parameters:

load_path (Path): Path to the json file with parameters.

class ambrosia.preprocessing.MLVarianceReducer(model='boosting', model_params=None, scores=None, verbose=True)

Machine Learning approach for variance reduction.

Having built a model M, we can apply the transformation

$$\hat{Y} = Y - M(X) + \overline{M(X)},$$

where the overline denotes the mean of the model predictions M(X) on the training data.

It is important that the mean of M(X) does not change over time. You can choose gradient boosting, Ridge regression, or your own model class (for example, sklearn.ensemble.RandomForestRegressor), and pass the model parameters to the constructor via model_params.
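A sketch of the transformation Ŷ = Y − M(X) + mean(M(X)), with a small hand-written ridge regression standing in for M (ambrosia's default is a CatBoost boosting model). RidgeModel, ml_transform, and the synthetic features are hypothetical; the point is that M and the bias mean(M(X)) are fitted once on training data and then reused on experiment data.

```python
import numpy as np


class RidgeModel:
    """Tiny ridge regression used here as a stand-in for the model M."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: np.ndarray, y: np.ndarray) -> "RidgeModel":
        self.x_mean_ = X.mean(axis=0)
        self.y_mean_ = y.mean()
        Xc = X - self.x_mean_
        # Closed-form ridge solution on centred data (intercept not penalised).
        A = Xc.T @ Xc + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, Xc.T @ (y - self.y_mean_))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (X - self.x_mean_) @ self.coef_ + self.y_mean_


def ml_transform(y, X, model, train_bias):
    # Y_hat = Y - M(X) + MEAN(M(X)), with the mean taken on the training data.
    return y - model.predict(X) + train_bias


rng = np.random.default_rng(3)
n = 10_000
weights = np.array([2.0, -1.0, 0.5])

# Training (pre-experiment) data: three features, e.g. feature_1..feature_3.
X_train = rng.normal(0.0, 1.0, size=(n, 3))
y_train = X_train @ weights + rng.normal(0.0, 1.0, n)

model = RidgeModel(alpha=1.0).fit(X_train, y_train)
train_bias = model.predict(X_train).mean()

# Experiment data transformed with the already fitted model and bias.
X_exp = rng.normal(0.0, 1.0, size=(n, 3))
y_exp = X_exp @ weights + rng.normal(0.0, 1.0, n)
y_exp_hat = ml_transform(y_exp, X_exp, model, train_bias)

print(f"variance: {y_exp.var():.2f} -> {y_exp_hat.var():.2f}")
```

Any regressor with fit and predict could replace RidgeModel here; a nonlinear model helps only when the target depends on the features nonlinearly.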

Parameters:

model (str or model type, default: "boosting"): Model used for the transformation.
model_params (Dict, optional): Dictionary of parameters passed to the model constructor.
scores (Dict[str, Callable], optional): Scores which will be used.
verbose (bool, default: True): If True, print information about the variance reduction to sys.stdout.

Attributes:

model (model type): Model used for the transformation.
params (Dict): Parameters of the instance, updated after calling fit(). They include the target column name, the covariate column names, the name of the transformed column, and the additional train bias equal to mean(M(X)).
scores (Dict[str, Callable]): Scores which will be used.
verbose (bool): Verbose info flag.
fitted (bool): Fit status flag.

Methods

get_params_dict()	Return a dict with the fitted parameters of the instance.
load_params_dict()	Load parameters from a dict.
store_params(config_store_path, model_store_path)	Store the fitted parameters in a json file and the model in a pickle file.
load_params(config_load_path, model_load_path)	Load parameters from a json file and the pickled model.
fit(dataframe, target_column, covariate_columns, transformed_name)	Fit the model using training data.
transform(dataframe, inplace)	Transform the target column of a dataframe.
fit_transform(dataframe, target_column, covariate_columns, transformed_name, inplace)	Combination of fit() and transform() methods.

Examples

Suppose we have a data table with a ‘target’ column and the columns ‘feature_1’, ‘feature_2’, ‘feature_3’. Let us assume that the means of all these metrics do not change over time (age, for example, could be such a feature). To reduce variance using the predictions of an ML model, we can use this class:

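>>> from ambrosia.preprocessing import MLVarianceReducer
>>> transformer = MLVarianceReducer()  # by default a CatBoost model is chosen
>>> features = ['feature_1', 'feature_2', 'feature_3']
>>> transformer.fit_transform(dataframe, 'target', features, transformed_name='new_target', inplace=True)
>>> transformer.store_params('path_ml_params.json', 'path_ml_model.pkl')  # json config, then model file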

To transform the experimental data, we use the following commands:

>>> transformer = MLVarianceReducer()
>>> transformer.load_params('path_ml_params.json', 'path_ml_model.pkl')
>>> transformer.transform(exp_data, inplace=True)

store_params(config_store_path, model_store_path)

Store the parameters as a json file and the fitted model in a separate file; available only for the CatBoost model.

For other models, you can access the fitted model via instance.model and store it yourself.

Parameters:

config_store_path (Path): Path where the parameters will be stored in json format.
model_store_path (Path): Path where the fitted model will be stored.

load_params(config_load_path, model_load_path)

Load the parameters from a json file and the model from a separate file; works only for the CatBoost model.

Parameters:

config_load_path (Path): Path to the json file with parameters.
model_load_path (Path): Path to the stored model file.

fit(dataframe, target_column, covariate_columns, transformed_name=None)

Fit the model used for the transformation.

Parameters:

dataframe (pd.DataFrame): Table with data for model fitting.
target_column (ColumnNameType): Column of the dataframe to which the transformation will be applied.
covariate_columns (ColumnNamesType): Columns used for the transformation.
transformed_name (ColumnNamesType, optional): Name of the new transformed target column; if not specified, it is generated automatically.

transform(dataframe, inplace=False)

Transform data using the fitted model.

Parameters:

dataframe (pd.DataFrame): Table with data for the transformation.
inplace (bool, default: False): If True, add the new column to the original dataframe and return None. Otherwise, return a copy of the dataframe with the new column.

fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)

Apply the fit() and transform() methods consecutively.

Parameters:

dataframe (pd.DataFrame): Table with data for model fitting and the subsequent transformation.
target_column (ColumnNameType): Column of the dataframe to which the transformation will be applied.
covariate_columns (ColumnNamesType): Columns used for the transformation.
transformed_name (ColumnNamesType, optional): Name of the new transformed target column; if not specified, it is generated automatically.
inplace (bool, default: False): If True, add the new column to the original dataframe and return None. Otherwise, return a copy of the dataframe with the new column.