Ambrosia advanced metric transformation tools overview

This notebook contains examples of using the Cuped, MultiCuped and MLVarianceReducer classes, which are designed to reduce the variance of target metrics. The examples use synthetically generated data. Because the data is artificial, the results are unusually good.

Theoretical aspects and details of these techniques are not covered here; they will be added later. Use this notebook as an API tutorial only.

[2]:
import pandas as pd
from ambrosia.preprocessing import Cuped, MultiCuped, MLVarianceReducer

Load data

[3]:
data = pd.read_csv('../tests/test_data/var_table.csv')
[4]:
data.head()
[4]:
feature_1 feature_2 feature_3 target
0 -2.426916 5.575498 43.505323 187.385459
1 -2.745189 7.995822 19.942889 99.691566
2 2.437555 17.254237 33.091612 188.880782
3 6.202871 28.913551 25.026746 199.532560
4 3.099725 3.771417 26.403917 121.956238
[5]:
target_column = 'target'

CUPED

[6]:
cuped = Cuped()

Fit and transform

[7]:
cuped.fit_transform(
    dataframe=data,
    target_column=target_column,
    covariate_column='feature_2',
    transformed_name='target_cuped',
    inplace=True,
)
ambrosia LOGGER: After transformation CUPED for target, the variance is 67.0818 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 2000.6892
[7]:
feature_1 feature_2 feature_3 target target_cuped
0 -2.426916 5.575498 43.505323 187.385459 204.513107
1 -2.745189 7.995822 19.942889 99.691566 109.350175
2 2.437555 17.254237 33.091612 188.880782 169.968233
3 6.202871 28.913551 25.026746 199.532560 144.639755
4 3.099725 3.771417 26.403917 121.956238 144.651222
... ... ... ... ... ...
2995 1.277060 22.630330 36.479685 216.416345 180.913351
2996 5.124652 58.120888 13.836445 239.307014 94.281340
2997 -0.654616 3.930848 32.036205 139.957720 162.160705
2998 0.401016 29.254561 38.268808 240.608496 184.663346
2999 0.488993 4.792474 26.568754 121.064233 140.608270

3000 rows × 5 columns
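
For orientation, the adjustment behind single-covariate CUPED can be sketched in a few lines of NumPy. This is a generic textbook version on freshly simulated data, not Ambrosia's implementation; the variable names and the data-generating coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a covariate and a target that depends on it linearly.
covariate = rng.normal(loc=10.0, scale=5.0, size=3000)
target = 50.0 + 3.0 * covariate + rng.normal(scale=20.0, size=3000)

# theta = Cov(target, covariate) / Var(covariate) minimizes the adjusted variance.
theta = np.cov(target, covariate, ddof=1)[0, 1] / np.var(covariate, ddof=1)

# Centering the covariate keeps the mean of the metric unchanged.
target_cuped = target - theta * (covariate - covariate.mean())

ratio = target_cuped.var(ddof=1) / target.var(ddof=1)
corr = np.corrcoef(target, covariate)[0, 1]
print(f"variance kept: {ratio:.4f}, 1 - corr^2: {1 - corr**2:.4f}")
```

With this choice of theta, the share of variance kept equals one minus the squared correlation between target and covariate, so the two printed numbers agree.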

Store fitted params

[8]:
store_path_cuped = '_examples_configs/cuped_config.json'
[9]:
cuped.get_params_dict()
[9]:
{'target_column': 'target',
 'transformed_name': 'target_cuped',
 'covariate_column': 'feature_2',
 'theta': 3.085966714908545,
 'bias': 11.125671107545354}
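
The printed values suggest how the two fitted numbers are used: the transformed column appears to equal `target - theta * (covariate - bias)`, so `bias` seems to play the role of the covariate mean. That reading is inferred from the output above rather than taken from the library documentation. The check below simply redoes the arithmetic for the first two rows, using the rounded values shown in the tables.

```python
# Fitted parameters as printed by get_params_dict() above.
theta = 3.085966714908545
bias = 11.125671107545354

# (feature_2, target, target_cuped) for rows 0 and 1, copied from the output table.
rows = [
    (5.575498, 187.385459, 204.513107),
    (7.995822, 99.691566, 109.350175),
]

# Assumed formula (inferred from the numbers, not from the docs).
recomputed = [target - theta * (feature_2 - bias) for feature_2, target, _ in rows]

for value, (_, _, printed) in zip(recomputed, rows):
    print(f"recomputed {value:.6f} vs printed {printed:.6f}")
```

Small differences in the last digits are expected because the table values are rounded to six decimals.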
[10]:
cuped.store_params(store_path_cuped)

Load params

[11]:
new_cuped = Cuped()
new_cuped.load_params(store_path_cuped)
[12]:
new_cuped.get_params_dict()
[12]:
{'target_column': 'target',
 'transformed_name': 'target_cuped',
 'covariate_column': 'feature_2',
 'theta': 3.085966714908545,
 'bias': 11.125671107545354}
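
The round trip above can be pictured as serializing the parameter dictionary to a file and reading it back. The sketch below does this with the standard `json` module, assuming the stored config is plain JSON of the dictionary shown above (the `.json` extension suggests this, but the on-disk layout Ambrosia uses is not confirmed here). The temporary path is hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Parameter dictionary copied from the get_params_dict() output above.
params = {
    "target_column": "target",
    "transformed_name": "target_cuped",
    "covariate_column": "feature_2",
    "theta": 3.085966714908545,
    "bias": 11.125671107545354,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "cuped_config.json"  # hypothetical location
    config_path.write_text(json.dumps(params, indent=2))
    restored = json.loads(config_path.read_text())

# The restored dictionary should match the original one exactly.
print(restored == params)
```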

MultiCuped

[13]:
multicuped = MultiCuped()

Fit and transform

[14]:
multicuped.fit_transform(dataframe=data,
                         target_column=target_column,
                         covariate_columns=['feature_2', 'feature_3'],
                         transformed_name='target_multicuped',
                         inplace=True)
ambrosia LOGGER: After transformation Multi CUPED for target, the variance is 1.2779 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 38.1133
[14]:
feature_1 feature_2 feature_3 target target_cuped target_multicuped
0 -2.426916 5.575498 43.505323 187.385459 204.513107 141.715314
1 -2.745189 7.995822 19.942889 99.691566 109.350175 140.948473
2 2.437555 17.254237 33.091612 188.880782 169.968233 149.436534
3 6.202871 28.913551 25.026746 199.532560 144.639755 156.975607
4 3.099725 3.771417 26.403917 121.956238 144.651222 150.181834
... ... ... ... ... ... ...
2995 1.277060 22.630330 36.479685 216.416345 180.913351 147.103213
2996 5.124652 58.120888 13.836445 239.307014 94.281340 152.893408
2997 -0.654616 3.930848 32.036205 139.957720 162.160705 145.165200
2998 0.401016 29.254561 38.268808 240.608496 184.663346 144.036343
2999 0.488993 4.792474 26.568754 121.064233 140.608270 145.531984

3000 rows × 6 columns
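
The multi-covariate case generalizes the single coefficient to a vector. A minimal NumPy sketch, again on simulated data and not taken from Ambrosia's source, solves the linear system Cov(X) theta = Cov(X, y) and subtracts the centered linear combination. All names and coefficients here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two covariates that jointly explain most of the target.
covariates = rng.normal(loc=[10.0, 30.0], scale=[5.0, 8.0], size=(3000, 2))
target = 3.0 * covariates[:, 0] + 4.0 * covariates[:, 1] + rng.normal(scale=6.0, size=3000)

# Solve Cov(X) @ theta = Cov(X, y) for the coefficient vector.
centered = covariates - covariates.mean(axis=0)
cov_xx = centered.T @ centered / (len(target) - 1)
cov_xy = centered.T @ (target - target.mean()) / (len(target) - 1)
theta = np.linalg.solve(cov_xx, cov_xy)

# Subtract the centered linear combination so the mean is preserved.
target_multicuped = target - centered @ theta

ratio = target_multicuped.var(ddof=1) / target.var(ddof=1)
print("theta:", theta.round(3), "variance kept:", round(ratio, 4))
```

Adding a second informative covariate is what drives the much larger reduction compared with the single-covariate case.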

Store fitted params

[15]:
store_path_multicuped = '_examples_configs/multicuped_config.json'
[16]:
multicuped.get_params_dict()
[16]:
{'target_column': 'target',
 'transformed_name': 'target_multicuped',
 'covariate_columns': ['feature_2', 'feature_3'],
 'theta': [[3.034447972098987], [4.000919354909565]],
 'bias': 145.30970530527566}
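
Here the printed numbers are consistent with a slightly different use of `bias` than in the single-covariate case: the transformed value appears to equal the target minus the uncentered linear combination of covariates, plus `bias` as a constant added back. This is an inference from the output above, not a documented formula. The snippet redoes the arithmetic for row 0 with the rounded table values.

```python
# Fitted parameters as printed by get_params_dict() above.
theta = [3.034447972098987, 4.000919354909565]
bias = 145.30970530527566

# Row 0 of the output table: covariates, target, and the printed transformed value.
feature_2, feature_3 = 5.575498, 43.505323
target = 187.385459
printed = 141.715314

# Assumed formula (inferred from the numbers, not from the docs).
recomputed = target - (theta[0] * feature_2 + theta[1] * feature_3) + bias
print(f"recomputed {recomputed:.6f} vs printed {printed:.6f}")
```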
[17]:
multicuped.store_params(store_path_multicuped)

Load params

[18]:
new_multicuped = MultiCuped()
new_multicuped.load_params(store_path_multicuped)
[19]:
new_multicuped.get_params_dict()
[19]:
{'target_column': 'target',
 'transformed_name': 'target_multicuped',
 'covariate_columns': ['feature_2', 'feature_3'],
 'theta': [[3.034447972098987], [4.000919354909565]],
 'bias': 145.30970530527566}

ML Variance Reduction

[20]:
mltransformer = MLVarianceReducer()

Fit and transform

[21]:
mltransformer.fit_transform(dataframe=data,
                            target_column=target_column,
                            covariate_columns=['feature_2', 'feature_3'],
                            transformed_name='target_mlreducer',
                            inplace=True)
ambrosia LOGGER: After transformation ML approach reduce for target, the variance is 0.9774 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 29.1504
ambrosia LOGGER: Prediction MSE score - 2945.29041
[21]:
feature_1 feature_2 feature_3 target target_cuped target_multicuped target_mlreducer
0 -2.426916 5.575498 43.505323 187.385459 204.513107 141.715314 144.540545
1 -2.745189 7.995822 19.942889 99.691566 109.350175 140.948473 141.665906
2 2.437555 17.254237 33.091612 188.880782 169.968233 149.436534 149.703421
3 6.202871 28.913551 25.026746 199.532560 144.639755 156.975607 153.785873
4 3.099725 3.771417 26.403917 121.956238 144.651222 150.181834 150.223084
... ... ... ... ... ... ... ...
2995 1.277060 22.630330 36.479685 216.416345 180.913351 147.103213 148.719639
2996 5.124652 58.120888 13.836445 239.307014 94.281340 152.893408 151.970972
2997 -0.654616 3.930848 32.036205 139.957720 162.160705 145.165200 146.166828
2998 0.401016 29.254561 38.268808 240.608496 184.663346 144.036343 141.938850
2999 0.488993 4.792474 26.568754 121.064233 140.608270 145.531984 144.528026

3000 rows × 7 columns

Note: Watch out for overfitting, and make sure the prerequisites of the chosen method(s) are met.
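
One common safeguard against the overfitting mentioned in the note above is cross-fitting: every row is adjusted with a prediction from a model that was trained without that row. The sketch below illustrates the idea with a plain least-squares model in NumPy on simulated data. It is a generic illustration under these assumptions, not the model or procedure that MLVarianceReducer uses internally.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rows, n_folds = 3000, 5

# Hypothetical data with a mildly nonlinear dependence on two covariates.
covariates = rng.normal(size=(n_rows, 2))
target = 2.0 * covariates[:, 0] + covariates[:, 1] ** 2 + rng.normal(size=n_rows)

# Design matrix for a small least-squares model (intercept, linear and squared terms).
design = np.column_stack([np.ones(n_rows), covariates, covariates ** 2])

# Cross-fitting: each row is predicted by a model that never saw it.
fold_ids = rng.permutation(n_rows) % n_folds
predictions = np.empty(n_rows)
for fold in range(n_folds):
    held_out = fold_ids == fold
    coefs, *_ = np.linalg.lstsq(design[~held_out], target[~held_out], rcond=None)
    predictions[held_out] = design[held_out] @ coefs

# Subtract the centered prediction so the metric keeps its mean.
target_reduced = target - (predictions - predictions.mean())

ratio = target_reduced.var(ddof=1) / target.var(ddof=1)
print(f"variance kept with out-of-fold predictions: {ratio:.4f}")
```

With in-sample predictions from a flexible model, the reported reduction can look better than it really is; out-of-fold predictions give a more honest estimate.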

Final variance of the target metric

[22]:
data[['target', 'target_cuped', 'target_multicuped', 'target_mlreducer']].var()
[22]:
target               2983.457229
target_cuped         2001.356367
target_multicuped      38.126050
target_mlreducer       29.160151
dtype: float64

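The three methods reduce the variance of the target metric to very different degrees: CUPED with one covariate leaves about 67% of the original variance, while MultiCuped and MLVarianceReducer with two covariates leave roughly 1.3% and 1.0%.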
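
The variances printed by pandas differ slightly from those in the logger lines (for example 2983.4572 versus 2982.4627 for the original target). The numbers are consistent with pandas using the sample variance (`ddof=1`) and the logger reporting the population variance (`ddof=0`); the second part is inferred from the values, not from the library source. The arithmetic below recomputes the retained-variance percentages and the rescaling from the printed figures.

```python
# Sample variances (pandas default, ddof=1) as printed above.
variances = {
    "target": 2983.457229,
    "target_cuped": 2001.356367,
    "target_multicuped": 38.126050,
    "target_mlreducer": 29.160151,
}
n_rows = 3000

# Share of the original variance that each transformed column keeps, in percent.
kept_percent = {
    name: 100 * value / variances["target"]
    for name, value in variances.items()
    if name != "target"
}
for name, percent in kept_percent.items():
    print(f"{name}: {percent:.4f} % of the original variance")

# Rescaling from ddof=1 to ddof=0 recovers the variance reported by the logger.
population_variance = variances["target"] * (n_rows - 1) / n_rows
print(f"population variance of target: {population_variance:.4f}")
```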


Learn more

For more information on advanced metric transformation techniques, see the following resources: