Ambrosia advanced metric transformation tools overview
This notebook shows how to use the Cuped, MultiCuped and MLVarianceReducer classes, which are designed to reduce the variance of target metrics. The examples run on synthetically generated data, so the results are cleaner than you should expect on real data.
The theory and details behind these techniques are not covered here; they will be added later. Use this notebook as an API tutorial only.
[2]:
import pandas as pd
from ambrosia.preprocessing import Cuped, MultiCuped, MLVarianceReducer
Load data
[3]:
data = pd.read_csv('../tests/test_data/var_table.csv')
[4]:
data.head()
[4]:
| | feature_1 | feature_2 | feature_3 | target |
|---|---|---|---|---|
| 0 | -2.426916 | 5.575498 | 43.505323 | 187.385459 |
| 1 | -2.745189 | 7.995822 | 19.942889 | 99.691566 |
| 2 | 2.437555 | 17.254237 | 33.091612 | 188.880782 |
| 3 | 6.202871 | 28.913551 | 25.026746 | 199.532560 |
| 4 | 3.099725 | 3.771417 | 26.403917 | 121.956238 |
[5]:
target_column = 'target'
CUPED
[6]:
cuped = Cuped()
Fit and transform
[7]:
cuped.fit_transform(
    dataframe=data,
    target_column=target_column,
    covariate_column='feature_2',
    transformed_name='target_cuped',
    inplace=True,
)
ambrosia LOGGER: After transformation СUPED for target, the variance is 67.0818 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 2000.6892
[7]:
| | feature_1 | feature_2 | feature_3 | target | target_cuped |
|---|---|---|---|---|---|
| 0 | -2.426916 | 5.575498 | 43.505323 | 187.385459 | 204.513107 |
| 1 | -2.745189 | 7.995822 | 19.942889 | 99.691566 | 109.350175 |
| 2 | 2.437555 | 17.254237 | 33.091612 | 188.880782 | 169.968233 |
| 3 | 6.202871 | 28.913551 | 25.026746 | 199.532560 | 144.639755 |
| 4 | 3.099725 | 3.771417 | 26.403917 | 121.956238 | 144.651222 |
| ... | ... | ... | ... | ... | ... |
| 2995 | 1.277060 | 22.630330 | 36.479685 | 216.416345 | 180.913351 |
| 2996 | 5.124652 | 58.120888 | 13.836445 | 239.307014 | 94.281340 |
| 2997 | -0.654616 | 3.930848 | 32.036205 | 139.957720 | 162.160705 |
| 2998 | 0.401016 | 29.254561 | 38.268808 | 240.608496 | 184.663346 |
| 2999 | 0.488993 | 4.792474 | 26.568754 | 121.064233 | 140.608270 |
3000 rows × 5 columns
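For orientation, the textbook single-covariate CUPED adjustment can be sketched in a few lines of NumPy. This is a minimal sketch of the standard estimator and not necessarily how Ambrosia computes it; the function name `cuped_transform` and the synthetic data are hypothetical and unrelated to var_table.csv.

```python
import numpy as np


def cuped_transform(y, x):
    """Textbook single-covariate CUPED: y - theta * (x - mean(x))."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta = cov(y, x) / var(x) minimizes the variance of the adjusted metric
    theta = np.cov(y, x, ddof=0)[0, 1] / np.var(x)
    return y - theta * (x - x.mean()), theta


rng = np.random.default_rng(0)
x = rng.normal(10.0, 5.0, size=3000)               # hypothetical covariate
y = 3.0 * x + rng.normal(100.0, 20.0, size=3000)   # hypothetical target correlated with x

y_cuped, theta = cuped_transform(y, x)
variance_ratio = y_cuped.var() / y.var()
print(f"theta = {theta:.3f}, variance kept = {variance_ratio:.1%}")
```

With this choice of theta the adjusted metric keeps its mean, and the share of variance that remains equals one minus the squared correlation between target and covariate.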
Store fitted params
[8]:
store_path_cuped = '_examples_configs/cuped_config.json'
[9]:
cuped.get_params_dict()
[9]:
{'target_column': 'target',
'transformed_name': 'target_cuped',
'covariate_column': 'feature_2',
'theta': 3.085966714908545,
'bias': 11.125671107545354}
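The fitted parameters can be checked by hand against the first row of the table above. The displayed values are consistent with the transformation target - theta * (covariate - bias), which would make bias the covariate mean; that reading is inferred from the printed numbers rather than taken from Ambrosia's documentation.

```python
# Values copied from the get_params_dict() output and from row 0 of the table
params = {"theta": 3.085966714908545, "bias": 11.125671107545354}
target, feature_2 = 187.385459, 5.575498

# Assumed form of the transformation, inferred from the displayed values
target_cuped = target - params["theta"] * (feature_2 - params["bias"])
print(target_cuped)  # compare with target_cuped in row 0 (204.513107)
```

The small remaining difference comes from the table showing inputs rounded to six decimals.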
[10]:
cuped.store_params(store_path_cuped)
Load params
[11]:
new_cuped = Cuped()
new_cuped.load_params(store_path_cuped)
[12]:
new_cuped.get_params_dict()
[12]:
{'target_column': 'target',
'transformed_name': 'target_cuped',
'covariate_column': 'feature_2',
'theta': 3.085966714908545,
'bias': 11.125671107545354}
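Storing the parameters is useful because a transformation fitted on one dataset can be applied later to other data. The sketch below imitates that round trip without Ambrosia: it assumes the parameters are saved as a flat JSON dictionary like the one printed above (the actual file layout is not shown in this notebook), and `apply_cuped` is a hypothetical helper that uses the formula inferred from the displayed values.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Same dictionary as printed by get_params_dict() above
params = {
    "target_column": "target",
    "transformed_name": "target_cuped",
    "covariate_column": "feature_2",
    "theta": 3.085966714908545,
    "bias": 11.125671107545354,
}


def apply_cuped(df, p):
    """Hypothetical helper: apply stored CUPED parameters to a DataFrame."""
    out = df.copy()
    out[p["transformed_name"]] = (
        out[p["target_column"]] - p["theta"] * (out[p["covariate_column"]] - p["bias"])
    )
    return out


# Round trip through a temporary JSON file (assumed format)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cuped_config.json"
    path.write_text(json.dumps(params))
    loaded = json.loads(path.read_text())

# Rows 1 and 2 of the table above, treated here as "new" data
new_data = pd.DataFrame(
    {"feature_2": [7.995822, 17.254237], "target": [99.691566, 188.880782]}
)
result = apply_cuped(new_data, loaded)
print(result)
```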
MultiCuped
[13]:
multicuped = MultiCuped()
Fit and transform
[14]:
multicuped.fit_transform(dataframe=data,
                         target_column=target_column,
                         covariate_columns=['feature_2', 'feature_3'],
                         transformed_name='target_multicuped',
                         inplace=True)
ambrosia LOGGER: After transformation Multi СUPED for target, the variance is 1.2779 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 38.1133
[14]:
| | feature_1 | feature_2 | feature_3 | target | target_cuped | target_multicuped |
|---|---|---|---|---|---|---|
| 0 | -2.426916 | 5.575498 | 43.505323 | 187.385459 | 204.513107 | 141.715314 |
| 1 | -2.745189 | 7.995822 | 19.942889 | 99.691566 | 109.350175 | 140.948473 |
| 2 | 2.437555 | 17.254237 | 33.091612 | 188.880782 | 169.968233 | 149.436534 |
| 3 | 6.202871 | 28.913551 | 25.026746 | 199.532560 | 144.639755 | 156.975607 |
| 4 | 3.099725 | 3.771417 | 26.403917 | 121.956238 | 144.651222 | 150.181834 |
| ... | ... | ... | ... | ... | ... | ... |
| 2995 | 1.277060 | 22.630330 | 36.479685 | 216.416345 | 180.913351 | 147.103213 |
| 2996 | 5.124652 | 58.120888 | 13.836445 | 239.307014 | 94.281340 | 152.893408 |
| 2997 | -0.654616 | 3.930848 | 32.036205 | 139.957720 | 162.160705 | 145.165200 |
| 2998 | 0.401016 | 29.254561 | 38.268808 | 240.608496 | 184.663346 | 144.036343 |
| 2999 | 0.488993 | 4.792474 | 26.568754 | 121.064233 | 140.608270 | 145.531984 |
3000 rows × 6 columns
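With several covariates, the usual generalization replaces the scalar theta with a vector obtained from a least-squares fit of the centered target on the centered covariates. The following is a minimal NumPy sketch of that idea on hypothetical synthetic data; `multi_cuped_transform` is an illustrative name and the code is not taken from Ambrosia.

```python
import numpy as np


def multi_cuped_transform(y, X):
    """Multi-covariate CUPED sketch: y - (X - mean(X)) @ theta."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Least-squares solution of X_centered @ theta ~ (y - mean(y))
    theta, *_ = np.linalg.lstsq(X_centered, y - y.mean(), rcond=None)
    return y - X_centered @ theta, theta


rng = np.random.default_rng(1)
# Two hypothetical covariates with different scales and means
X = rng.normal(size=(3000, 2)) * [10.0, 8.0] + [15.0, 30.0]
y = 3.0 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(0.0, 6.0, size=3000)

y_adj, theta = multi_cuped_transform(y, X)
ratio = y_adj.var() / y.var()
print(f"theta = {theta.round(3)}, variance kept = {ratio:.2%}")
```

Because the synthetic target is almost a linear function of the two covariates, nearly all of the variance is removed, which mirrors the behaviour seen on the notebook's artificial data.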
Store fitted params
[15]:
store_path_multicuped = '_examples_configs/multicuped_config.json'
[16]:
multicuped.get_params_dict()
[16]:
{'target_column': 'target',
'transformed_name': 'target_multicuped',
'covariate_columns': ['feature_2', 'feature_3'],
'theta': [[3.034447972098987], [4.000919354909565]],
'bias': 145.30970530527566}
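As with Cuped, the stored parameters can be checked against row 0 of the table. Here the displayed values are consistent with target - covariates @ theta + bias, so bias appears to be a single scalar offset rather than a covariate mean; this reading is inferred from the printed numbers and is an assumption about the library's convention.

```python
import numpy as np

# Values copied from the get_params_dict() output and from row 0 of the table
theta = np.array([[3.034447972098987], [4.000919354909565]])  # shape (2, 1)
bias = 145.30970530527566
covariates = np.array([5.575498, 43.505323])  # feature_2, feature_3
target = 187.385459

# Assumed form of the transformation, inferred from the displayed values
target_multicuped = target - (covariates @ theta).item() + bias
print(target_multicuped)  # compare with target_multicuped in row 0 (141.715314)
```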
[17]:
multicuped.store_params(store_path_multicuped)
Load params
[18]:
new_multicuped = MultiCuped()
new_multicuped.load_params(store_path_multicuped)
[19]:
new_multicuped.get_params_dict()
[19]:
{'target_column': 'target',
'transformed_name': 'target_multicuped',
'covariate_columns': ['feature_2', 'feature_3'],
'theta': [[3.034447972098987], [4.000919354909565]],
'bias': 145.30970530527566}
ML Variance Reduction
[20]:
mltransformer = MLVarianceReducer()
Fit and transform
[21]:
mltransformer.fit_transform(dataframe=data,
                            target_column=target_column,
                            covariate_columns=['feature_2', 'feature_3'],
                            transformed_name='target_mlreducer',
                            inplace=True)
ambrosia LOGGER: After transformation ML approach reduce for target, the variance is 0.9774 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 29.1504
ambrosia LOGGER: Prediction MSE score - 2945.29041
[21]:
| | feature_1 | feature_2 | feature_3 | target | target_cuped | target_multicuped | target_mlreducer |
|---|---|---|---|---|---|---|---|
| 0 | -2.426916 | 5.575498 | 43.505323 | 187.385459 | 204.513107 | 141.715314 | 144.540545 |
| 1 | -2.745189 | 7.995822 | 19.942889 | 99.691566 | 109.350175 | 140.948473 | 141.665906 |
| 2 | 2.437555 | 17.254237 | 33.091612 | 188.880782 | 169.968233 | 149.436534 | 149.703421 |
| 3 | 6.202871 | 28.913551 | 25.026746 | 199.532560 | 144.639755 | 156.975607 | 153.785873 |
| 4 | 3.099725 | 3.771417 | 26.403917 | 121.956238 | 144.651222 | 150.181834 | 150.223084 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2995 | 1.277060 | 22.630330 | 36.479685 | 216.416345 | 180.913351 | 147.103213 | 148.719639 |
| 2996 | 5.124652 | 58.120888 | 13.836445 | 239.307014 | 94.281340 | 152.893408 | 151.970972 |
| 2997 | -0.654616 | 3.930848 | 32.036205 | 139.957720 | 162.160705 | 145.165200 | 146.166828 |
| 2998 | 0.401016 | 29.254561 | 38.268808 | 240.608496 | 184.663346 | 144.036343 | 141.938850 |
| 2999 | 0.488993 | 4.792474 | 26.568754 | 121.064233 | 140.608270 | 145.531984 | 144.528026 |
3000 rows × 7 columns
Note: Watch out for overfitting, and check that the prerequisites of each method are satisfied before relying on it.
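One common way to limit overfitting in model-based variance reduction is cross-fitting: each row's prediction comes from a model that was not trained on that row. The sketch below illustrates the idea with NumPy only. The model used inside MLVarianceReducer and its fitting strategy are not shown in this notebook, so the quadratic least-squares "model", the helper names and the synthetic data are all hypothetical stand-ins.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test):
    """Stand-in model: least squares on quadratic features (hypothetical choice)."""
    def features(X):
        return np.column_stack([np.ones(len(X)), X, X ** 2])

    coef, *_ = np.linalg.lstsq(features(X_train), y_train, rcond=None)
    return features(X_test) @ coef


def ml_reduce(y, X, n_folds=5, seed=0):
    """Subtract out-of-fold predictions, then add back their mean."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # never predict a row with a model that saw it
        pred[test_idx] = fit_predict(X[train_mask], y[train_mask], X[test_idx])
    return y - pred + pred.mean()


rng = np.random.default_rng(2)
X = rng.normal(size=(3000, 2))
# Hypothetical target with a nonlinear dependence on the second covariate
y = 50.0 + 10.0 * X[:, 0] + 5.0 * X[:, 1] ** 2 + rng.normal(0.0, 3.0, size=3000)

y_reduced = ml_reduce(y, X)
ratio = y_reduced.var() / y.var()
print(f"variance kept = {ratio:.2%}")
```

Adding back the mean of the predictions keeps the average of the metric unchanged, so only its spread is affected.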
Final variance of the target metric
[22]:
data[['target', 'target_cuped', 'target_multicuped', 'target_mlreducer']].var()
[22]:
target 2983.457229
target_cuped 2001.356367
target_multicuped 38.126050
target_mlreducer 29.160151
dtype: float64
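The three transformations reduce the variance of the target metric to very different degrees: single-covariate CUPED leaves about 67% of the original variance, while MultiCuped and MLVarianceReducer leave about 1.3% and 1.0% respectively. These values differ slightly from the ones in the log messages (for example 2983.4572 versus 2982.4627), which is consistent with pandas var() computing the sample variance (ddof=1) and the logger reporting the population variance (ddof=0) over 3000 rows.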
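The percentages reported in the log messages can be recomputed from the printed variances. The short check below uses only numbers copied from the output above; the ddof interpretation in the last step is an inference from those numbers, not something stated by the library.

```python
# Sample variances copied from the data[...].var() output above
variances = {
    "target": 2983.457229,
    "target_cuped": 2001.356367,
    "target_multicuped": 38.126050,
    "target_mlreducer": 29.160151,
}
n_rows = 3000

# Share of the original variance that remains after each transformation
ratios = {name: value / variances["target"] for name, value in variances.items()}
for name, ratio in ratios.items():
    print(f"{name:<18} {ratio:.4%}")

# Rescaling the sample variance (ddof=1) to the population variance (ddof=0)
population_var = variances["target"] * (n_rows - 1) / n_rows
print(f"population variance of target: {population_var:.4f}")
```

The ratios do not depend on the ddof convention, because the same factor applies to numerator and denominator.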
Learn more
For more information on advanced metric transformation techniques, see the following resources: