Advanced metric transformations

Cuped

Class for data CUPED transformation.

MultiCuped

Class for data Multi CUPED transformation.

MLVarianceReducer

Machine Learning approach for variance reduction.

class ambrosia.preprocessing.Cuped(verbose=True)

Class for data CUPED transformation.

Reference: https://towardsdatascience.com/how-to-double-a-b-testing-speed-with-cuped-f80460825a90

Y_hat = Y - theta * X
theta := cov(X, Y) / Var(X)

It is important that the mean of the covariate metric does not change over time.
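The transformation can be illustrated with a minimal pure-Python sketch on synthetic data. It uses the standard CUPED coefficient cov(X, Y) / Var(X) and assumes that the "bias value for mean equality" is theta * mean(X). The helper names `cuped_fit` and `cuped_transform` are hypothetical and are not part of the library.

```python
import random
from statistics import mean, variance


def cuped_fit(y, x):
    """Return (theta, bias) for the CUPED transformation of y by covariate x."""
    x_mean, y_mean = mean(x), mean(y)
    # Sample covariance of X and Y; variance() is the sample variance of X.
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    # Assumption: adding theta * mean(X) back keeps mean(Y_hat) equal to mean(Y).
    bias = theta * x_mean
    return theta, bias


def cuped_transform(y, x, theta, bias):
    """Apply Y_hat = Y - theta * X + bias element-wise."""
    return [yi - theta * xi + bias for xi, yi in zip(x, y)]


random.seed(0)
# Synthetic data: the target is correlated with a pre-experiment covariate.
income = [random.gauss(100.0, 20.0) for _ in range(1000)]
target = [0.5 * xi + random.gauss(0.0, 5.0) for xi in income]

theta, bias = cuped_fit(target, income)
cuped_target = cuped_transform(target, income, theta, bias)

# Share of the variance removed by the transformation.
variance_reduction = 1.0 - variance(cuped_target) / variance(target)
print(f"theta={theta:.3f}, variance reduction={variance_reduction:.1%}")
```

The stronger the correlation between the covariate and the target, the larger the share of variance that is removed.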

Parameters:
verbose : bool, default: True

If True, information about the variance reduction is printed to sys.stdout.

Attributes:
params : Dict

Parameters of the instance that are updated after calling the fit() method. They include:
- target column name
- covariate column name
- name of the column after the transformation
- linear coefficient for the CUPED transformation
- bias value for mean equality

verbose : bool

Verbose info flag.

fitted : bool

Flag indicating whether the instance has been fitted.

Methods

get_params_dict()

Returns dictionary with params if fit() method has been previously called.

load_params_dict(params)

Load params from a dictionary.

store_params(store_path)

Store params to json file if fit() method has been previously called.

load_params(load_path)

Load params from a json file.

fit(dataframe, target_column, covariate_column, transformed_name)

Fit the model using a specific covariate column.

transform(dataframe, inplace)

Transform the target column after the instance has been fitted.

fit_transform(dataframe, target_column, covariate_column, transformed_name, inplace)

Combination of fit() and transform() methods.

Examples

Suppose we have a dataframe with user information that contains two columns: a “target” column and a column with the “income” metric. Let us assume that the average of the “income” values does not change over time. Then we can use the CUPED transformation based on the “income” data to reduce the variance of the “target” column.

>>> from ambrosia.preprocessing import Cuped
>>> cuped_transformer = Cuped(verbose=True)
>>> cuped_transformer.fit_transform(
...     dataframe=dataframe,
...     target_column='target',
...     covariate_column='income',
...     transformed_name='cuped_target',
...     inplace=True,
... )

A new column “cuped_target” has now appeared in the dataframe; we can use it to design our experiment and to estimate the variance reduction. To reuse the CUPED transformation in a future experiment, let us store the parameters:

>>> cuped_transformer.store_params('cuped_transform_params.json')

Now we conduct an experiment and want to transform our data to reduce its variance:

>>> cuped_transformation = Cuped()
>>> cuped_transformation.load_params('cuped_transform_params.json')
>>> cuped_transformation.transform(
...     dataframe=exp_results,
...     inplace=True,
... )

fit(dataframe, target_column, covariate_column, transformed_name=None)

Fit to calculate the CUPED parameters for the target column using the given covariate column and data.

Parameters:
dataframe : pd.DataFrame

Table with data for the calculation of the CUPED parameters.

target_column : ColumnNameType

Column of the dataframe to which the CUPED transformation will be applied.

covariate_column : ColumnNameType

Column that will be used as the covariate in the CUPED transformation.

transformed_name : ColumnNamesType, optional

Name for the new transformed target column; if it is not defined, it will be generated automatically.

transform(dataframe, inplace=False)

Apply the CUPED transformation to the target column.

Can be performed in place or on a copy.

Parameters:
dataframe : pd.DataFrame

Table with data for the CUPED transformation.

inplace : bool, default: False

If True, the method returns None and adds the new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.

fit_transform(dataframe, target_column, covariate_column, transformed_name=None, inplace=False)

Combination of fit() and transform() methods.

Parameters:
dataframe : pd.DataFrame

Table with data for fitting and applying the CUPED transformation.

target_column : ColumnNameType

Column of the dataframe to which the CUPED transformation will be applied.

covariate_column : ColumnNameType

Column that will be used as the covariate.

transformed_name : ColumnNamesType, optional

Name for the new transformed target column; if it is not defined, it will be generated automatically.

inplace : bool, default: False

If True, the method returns None and adds the new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.

store_params(store_path)

Parameters:
store_path : Path

Path where the parameters will be stored in JSON format.

load_params(load_path)

Parameters:
load_path : Path

Path to the JSON file with the parameters.
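The fit-once, apply-later workflow can be sketched with the standard library alone. The snippet below is an illustration of the idea, not the library's implementation: the key names in `fitted_params`, the numeric values, and the helper functions are all hypothetical, and the real JSON layout written by store_params() may differ.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical parameter names and values; the keys the library writes may differ.
fitted_params = {
    "target_column": "target",
    "covariate_column": "income",
    "transformed_name": "cuped_target",
    "theta": 0.5,
    "bias": 50.0,
}


def store_params(params, store_path):
    """Write the fitted parameters to a JSON file."""
    with open(store_path, "w", encoding="utf-8") as file:
        json.dump(params, file, indent=2)


def load_params(load_path):
    """Read the fitted parameters back from a JSON file."""
    with open(load_path, encoding="utf-8") as file:
        return json.load(file)


def transform(rows, params):
    """Return copies of the rows with the transformed column added."""
    transformed = []
    for row in rows:
        target = row[params["target_column"]]
        covariate = row[params["covariate_column"]]
        value = target - params["theta"] * covariate + params["bias"]
        transformed.append({**row, params["transformed_name"]: value})
    return transformed


with tempfile.TemporaryDirectory() as tmp_dir:
    params_path = Path(tmp_dir) / "cuped_transform_params.json"
    store_params(fitted_params, params_path)
    restored_params = load_params(params_path)

# Data collected later, during the experiment, reuses the stored coefficients.
exp_results = [{"target": 60.0, "income": 110.0}, {"target": 45.0, "income": 90.0}]
transformed_results = transform(exp_results, restored_params)
print(transformed_results[0]["cuped_target"])
```

Reusing the coefficients fitted on pre-experiment data means the experimental data does not influence the transformation itself.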

class ambrosia.preprocessing.MultiCuped(verbose=True)

Class for data Multi CUPED transformation.

Y_hat = Y - X theta (matrix multiplication)
theta := argmin Var(Y - X theta)

It is important that the means of the covariate metrics do not change over time.
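A minimal pure-Python sketch of the multi-covariate case is given below. It relies on the fact that the theta minimizing Var(Y - X theta) solves the linear system Cov(X) theta = Cov(X, Y), and it assumes that the bias for mean equality is the sum of theta_j * mean(X_j). All helper names are hypothetical and the data is synthetic.

```python
import random
from statistics import mean, variance


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    a_mean, b_mean = mean(a), mean(b)
    return sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b)) / (len(a) - 1)


def solve_linear(matrix, rhs):
    """Solve matrix @ theta = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    theta = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * theta[k] for k in range(row + 1, n))
        theta[row] = (aug[row][n] - tail) / aug[row][row]
    return theta


def multi_cuped_fit(y, covariates):
    """Return (theta, bias); covariates is a list of columns."""
    cov_xx = [[covariance(a, b) for b in covariates] for a in covariates]
    cov_xy = [covariance(a, y) for a in covariates]
    theta = solve_linear(cov_xx, cov_xy)
    # Assumption: this bias keeps the mean of the transformed target unchanged.
    bias = sum(t * mean(col) for t, col in zip(theta, covariates))
    return theta, bias


def multi_cuped_transform(y, covariates, theta, bias):
    """Apply Y_hat = Y - X theta + bias row by row."""
    rows = zip(*covariates)
    return [yi - sum(t * xi for t, xi in zip(theta, row)) + bias for yi, row in zip(y, rows)]


random.seed(0)
income = [random.gauss(100.0, 20.0) for _ in range(1000)]
age = [random.gauss(35.0, 10.0) for _ in range(1000)]
target = [0.5 * i + 2.0 * a + random.gauss(0.0, 5.0) for i, a in zip(income, age)]

theta, bias = multi_cuped_fit(target, [income, age])
cuped_target = multi_cuped_transform(target, [income, age], theta, bias)
variance_reduction = 1.0 - variance(cuped_target) / variance(target)
print(f"theta={[round(t, 3) for t in theta]}, variance reduction={variance_reduction:.1%}")
```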

Parameters:
verbose : bool, default: True

If True, information about the variance reduction is printed to sys.stdout.

Attributes:
params : Dict

Parameters of the instance that are updated after calling the fit() method. They include:
- target column name
- covariate column names
- name of the column after the transformation
- linear coefficients for the Multi CUPED transformation
- bias value for mean equality

verbose : bool

Verbose info flag.

fitted : bool

Flag indicating whether the instance has been fitted.

Methods

get_params_dict()

Returns dictionary with params if fit() method has been previously called.

load_params_dict(params)

Load params from a dictionary.

store_params(store_path)

Store params to json file if fit() method has been previously called.

load_params(load_path)

Load params from a json file.

fit(dataframe, target_column, covariate_columns, transformed_name)

Fit the model using the covariate columns.

transform(dataframe, inplace)

Transform the target column after the instance has been fitted.

fit_transform(dataframe, target_column, covariate_columns, transformed_name, inplace)

Combination of fit() and transform() methods.

Examples

Suppose we have a dataframe with user information that contains a “target” column and the columns “income” and “age”. Let us assume that the averages of the “income” and “age” values do not change over time. Then we can use the Multi CUPED transformation based on the “income” and “age” data to reduce the variance of the “target” column.

>>> from ambrosia.preprocessing import MultiCuped
>>> cuped_transformer = MultiCuped(verbose=True)
>>> cuped_transformer.fit_transform(
...     dataframe=dataframe,
...     target_column='target',
...     covariate_columns=['income', 'age'],
...     transformed_name='cuped_target',
...     inplace=True,
... )

A new column “cuped_target” has now appeared in the dataframe; we can use it to design our experiment and to estimate the variance reduction. To reuse the Multi CUPED transformation in a future experiment, let us store the parameters:

>>> cuped_transformer.store_params('cuped_transform_params.json')

Now we conduct an experiment and want to transform our data to reduce its variance:

>>> cuped_transformation = MultiCuped()
>>> cuped_transformation.load_params('cuped_transform_params.json')
>>> cuped_transformation.transform(
...     exp_results,
...     inplace=True,
... )

fit(dataframe, target_column, covariate_columns, transformed_name=None)

Fit to calculate the Multi CUPED parameters for the target column using the selected covariate columns.

Parameters:
dataframe : pd.DataFrame

Table with data for the calculation of the Multi CUPED parameters.

target_column : ColumnNameType

Column of the dataframe to which the Multi CUPED transformation will be applied.

covariate_columns : ColumnNamesType

Columns that will be used as the covariates in the Multi CUPED transformation.

transformed_name : ColumnNamesType, optional

Name for the new transformed target column; if it is not defined, it will be generated automatically.

transform(dataframe, inplace=False)

Apply the Multi CUPED transformation to the target column.

Can be performed in place or on a copy.

Parameters:
dataframe : pd.DataFrame

Table with data for the Multi CUPED transformation.

inplace : bool, default: False

If True, the method returns None and adds the new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.

fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)

Combination of fit() and transform() methods.

Parameters:
dataframe : pd.DataFrame

Table with data for fitting and applying the Multi CUPED transformation.

target_column : ColumnNameType

Column of the dataframe to which the Multi CUPED transformation will be applied.

covariate_columns : ColumnNamesType

Columns that will be used as the covariates.

transformed_name : ColumnNamesType, optional

Name for the new transformed target column; if it is not defined, it will be generated automatically.

inplace : bool, default: False

If True, the method returns None and adds the new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.

store_params(store_path)

Parameters:
store_path : Path

Path where the parameters will be stored in JSON format.

load_params(load_path)

Parameters:
load_path : Path

Path to the JSON file with the parameters.

class ambrosia.preprocessing.MLVarianceReducer(model='boosting', model_params=None, scores=None, verbose=True)

Machine Learning approach for variance reduction.

By building a model M, we can make the transformation: Y_hat = Y - M(X) + MEAN(M(X))

It is important that the mean of M(X) does not change over time. You can choose between gradient boosting, Ridge regression, or your own model class, for example sklearn.ensemble.RandomForestRegressor, and pass the model parameters to the constructor to assemble the model.
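The formula Y_hat = Y - M(X) + MEAN(M(X)) can be illustrated with a toy model in pure Python. `GroupMeanModel` below is a hypothetical stand-in for gradient boosting or Ridge regression, chosen only to keep the sketch dependency-free; the data is synthetic and nothing here reflects the library's internals.

```python
import random
from statistics import mean, variance


class GroupMeanModel:
    """Toy stand-in for M: predicts the mean target of each feature value."""

    def fit(self, features, target):
        grouped = {}
        for key, value in zip(features, target):
            grouped.setdefault(key, []).append(value)
        self.group_means_ = {key: mean(values) for key, values in grouped.items()}
        self.default_ = mean(target)  # fallback for unseen feature values
        return self

    def predict(self, features):
        return [self.group_means_.get(key, self.default_) for key in features]


random.seed(0)
# Synthetic data: the target depends on a categorical user segment plus noise.
segment_effect = {"basic": 10.0, "plus": 30.0, "premium": 60.0}
segments = [random.choice(list(segment_effect)) for _ in range(1000)]
target = [segment_effect[s] + random.gauss(0.0, 5.0) for s in segments]

model = GroupMeanModel().fit(segments, target)
predictions = model.predict(segments)
train_bias = mean(predictions)  # MEAN(M(X)), computed once on the training data

# Y_hat = Y - M(X) + MEAN(M(X))
reduced_target = [y - p + train_bias for y, p in zip(target, predictions)]

variance_reduction = 1.0 - variance(reduced_target) / variance(target)
print(f"variance reduction={variance_reduction:.1%}")
```

Adding the stored mean of the predictions back keeps the transformed metric on the same scale as the original target, which is why that mean must stay stable between the training data and the experiment.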

Parameters:
model : str or model type, default: "boosting"

Model that will be used for the transformations.

model_params : Dict, optional

Dictionary with parameters that will be passed to the model constructor.

scores : Dict[str, Callable], optional

Scores that will be used.

verbose : bool, default: True

If True, information about the variance reduction is printed to sys.stdout.

Attributes:
model : model type

Model that will be used for the transformations.

params : Dict

Parameters of the instance that are updated after calling the fit() method. They include:
- target column name
- covariate column names
- name of the column after the transformation
- additional train bias, equal to mean(M(X))

scores : Dict[str, Callable]

Scores that will be used.

verbose : bool

Verbose info flag.

fitted : bool

Fit status flag.

Methods

get_params_dict()

Returns a dict with the fitted parameters of the instance.

load_params_dict(params)

Load parameters from a dict.

store_params(config_store_path, model_store_path)

Store the fitted params in a json file and the model in a pickle file.

load_params(config_load_path, model_load_path)

Load params from a json file and a pickled model.

fit(dataframe, target_column, covariate_columns, transformed_name)

Fit the model using train data.

transform(dataframe, inplace)

Transform the target column of a dataframe.

fit_transform(dataframe, target_column, covariate_columns, transformed_name, inplace)

Combination of fit() and transform() methods.

Examples

Suppose we have a data table with a ‘target’ column and the columns ‘feature_1’, ‘feature_2’, and ‘feature_3’. Let us assume that the means of all these features do not change over time; age is one example of such a feature. If we want to reduce variance using the predictions of an ML model, we can use this class:

>>> from ambrosia.preprocessing import MLVarianceReducer
>>> transformer = MLVarianceReducer()  # By default a CatBoost model will be chosen
>>> feature_columns = ['feature_1', 'feature_2', 'feature_3']
>>> transformer.fit_transform(dataframe, 'target', feature_columns, transformed_name='new_target', inplace=True)
>>> # The second path is for the model file; its name here is illustrative
>>> transformer.store_params('path_ml_params.json', 'path_ml_model')

Now, to transform the experimental data, we use the following commands:

>>> transformer = MLVarianceReducer()
>>> transformer.load_params('path_ml_params.json', 'path_ml_model')
>>> transformer.transform(exp_data, inplace=True)

store_params(config_store_path, model_store_path)

Store the model params as a json file; available only for the CatBoost model.

You can access the model via instance.model and store it yourself.

Parameters:
config_store_path : Path

Path where the model parameters will be stored in JSON format.

model_store_path : Path

Path where the model itself will be stored.

load_params(config_load_path, model_load_path)

Load the model params from a json file; works only for the CatBoost model.

Parameters:
config_load_path : Path

Path to the JSON file with the model parameters.

model_load_path : Path

Path to the stored model file.

fit(dataframe, target_column, covariate_columns, transformed_name=None)

Fit the model for the transformations.

Parameters:
dataframe : pd.DataFrame

Table with data for model fitting.

target_column : ColumnNameType

Column of the dataframe to which the transformation will be applied.

covariate_columns : ColumnNamesType

Columns that will be used for the transformation.

transformed_name : ColumnNamesType, optional

Name for the new transformed target column; if it is not defined, it will be generated automatically.

transform(dataframe, inplace=False)

Transform data using the fitted model.

Parameters:
dataframe : pd.DataFrame

Table with data for the transformation.

inplace : bool, default: False

If True, the method returns None and adds the new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.

fit_transform(dataframe, target_column, covariate_columns, transformed_name=None, inplace=False)

Apply the fit() and transform() methods consecutively.

Parameters:
dataframe : pd.DataFrame

Table with data for model fitting and further transformation.

target_column : ColumnNameType

Column of the dataframe to which the transformation will be applied.

covariate_columns : ColumnNamesType

Columns that will be used for the transformation.

transformed_name : ColumnNamesType, optional

Name for the new transformed target column; if it is not defined, it will be generated automatically.

inplace : bool, default: False

If True, the method returns None and adds the new column to the original dataframe. Otherwise, it returns a copy of the dataframe with the new column.