Experiment Design
Ambrosia offers tools for calculating A/B test parameters, such as effect uplift, group size, and statistical power, based on historical metric values.
Choice of design approach
The theoretical approach to designing experimental parameters is much faster than the empirical one.
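To illustrate why an empirical design tends to be slower, here is a minimal sketch of a simulation-based power estimate. It is not Ambrosia's implementation: the helper names `z_test_pvalue` and `empirical_power`, the synthetic `history` data, and the choice of a two-sample z-test are assumptions made for illustration only. Each power estimate needs `n_iter` resamplings and tests, whereas a theoretical design evaluates a closed-form expression once.

```python
import random
from statistics import NormalDist, mean, variance


def z_test_pvalue(group_a, group_b):
    """Two-sided p-value of a two-sample z-test (normal approximation)."""
    std_err = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    z_stat = (mean(group_b) - mean(group_a)) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


def empirical_power(history, size, effect, alpha=0.05, n_iter=500, seed=0):
    """Share of simulated experiments in which the injected effect is detected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_iter):
        # Resample both groups from historical values (hypothetical scheme).
        group_a = rng.choices(history, k=size)
        # Inject the effect as a multiplier on the metric in group B.
        group_b = [value * effect for value in rng.choices(history, k=size)]
        if z_test_pvalue(group_a, group_b) < alpha:
            rejections += 1
    return rejections / n_iter


# Synthetic "historical" metric: an assumption, not real data.
data_rng = random.Random(42)
history = [data_rng.gauss(100.0, 20.0) for _ in range(5000)]

power = empirical_power(history, size=200, effect=1.05)
print(f"Estimated power: {power:.2f}")
```

With `effect=1.0` the same function estimates the type I error rate, which should stay close to `alpha`.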
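Designer | Unit for experiments and pilots parameters design.
load_from_config | Restore a Designer class instance from a yaml config.
design | Function wrapper around the Designer class.
design_binary | Design of experiment parameters for binary metrics based on a known conversion value.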
- class ambrosia.designer.Designer(dataframe=None, sizes=None, effects=None, first_type_errors=0.05, second_type_errors=0.2, metrics=None, method='theory')
Unit for experiments and pilots parameters design.
Designs missing experiment parameters using historical data. The main interrelated parameters that can be designed for a single metric are:
- Effect (Minimal Detectable Effect):
old_mean_metric_value * effect_value = new_mean_metric_value
- Sample size:
Number of research objects in a sample (for example, the number of users whose retention is measured).
- Errors (type I error, type II error):
- Type I error (alpha):
Probability of detecting an effect when the samples are equally distributed.
- Type II error (beta):
Probability of failing to detect an effect when the samples are differently distributed.
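The parameters listed above constrain one another: fixing the sample size and both error rates determines the smallest detectable effect. The following is a minimal sketch of that relationship for a continuous metric, assuming a two-sided test, a normal approximation, two equal groups and equal variances. The function name `minimal_detectable_effect` and the example numbers are hypothetical, and Ambrosia's own calculation may differ.

```python
from statistics import NormalDist


def minimal_detectable_effect(mean, std, size, alpha=0.05, beta=0.2):
    """Relative effect (multiplier) detectable with two equal groups of `size`.

    Assumes a two-sided test, a normal approximation and equal variances.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # type I error quantile
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for power = 1 - beta
    absolute_shift = (z_alpha + z_beta) * std * (2 / size) ** 0.5
    # Effect is a multiplier: old_mean * effect = new_mean.
    return 1 + absolute_shift / mean


effect_1k = minimal_detectable_effect(mean=100.0, std=20.0, size=1000)
effect_4k = minimal_detectable_effect(mean=100.0, std=20.0, size=4000)
print(f"size=1000 -> effect {effect_1k:.4f}")
print(f"size=4000 -> effect {effect_4k:.4f}")
```

Under these assumptions, quadrupling the sample size halves the detectable uplift, because the shift scales with the inverse square root of the size.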
- Parameters:
- dataframe : PassedDataType, optional
DataFrame with historical metric values.
- sizes : SampleSizeType, optional
Numbers of research objects in the group samples during the experiment.
- effects : EffectType, optional
Effect values expected during the experiment.
- first_type_errors : StatErrorType, default: 0.05
Type I error bound: P(detect a difference for equal groups) < alpha.
- second_type_errors : StatErrorType, default: 0.2
Type II error bound: P(assume equality for different groups) < beta.
- metrics : MetricNamesType, optional
Column names of the metrics in the dataframe to be designed.
- method : str, optional
Method used for experiment design. Can be "theory", "empiric" or "binary".
- Attributes:
- dataframe : PassedDataType
DataFrame with historical metric values.
- sizes : SampleSizeType
Numbers of research objects in the group samples.
- effects : EffectType
Effect values in the experiment.
- first_type_errors : StatErrorType, default: 0.05
Type I errors.
- second_type_errors : StatErrorType, default: 0.2
Type II errors.
- metrics : MetricNamesType
Column names of the metrics in the dataframe to be designed.
- method : str
Method used for experiment design.
Notes
Constructors:
>>> designer = Designer()
>>> # You can pass an Iterable or a single object for some parameters
>>> designer = Designer(
...     dataframe=df,
...     sizes=[100, 200],
...     metrics='LTV',
...     effects=1.05
... )
>>> designer = Designer(sizes=1000, metrics=['retention', 'LTV'])
>>> # You can use a path to a .csv table for pandas
>>> designer = Designer('./data/table.csv')
Setters:
>>> designer.set_first_errors([0.05, 0.01])
>>> designer.set_dataframe(df)
Run:
>>> # Arguments passed to run() take priority over the instance attributes
>>> designer.run('size', effects=1.1)
>>> designer.run('effect', sizes=[500, 1000], metrics='retention')
>>> # You can set the method (see below)
>>> designer.run('effect', sizes=[500, 1000], metrics='retention', method='binary')
Load from yaml config:
>>> config = '''
...     !splitter # <--- this is yaml tag (!important)
...         effects:
...             - 0.9
...             - 1.05
...         sizes:
...             - 1000
... '''
>>> designer = yaml.load(config, Loader=yaml.Loader)
>>> # Or use the implemented function
>>> designer = load_from_config(config)
Use standalone function instead of a class:
>>> design('size', dataframe=df, effects=1.05, metrics='retention')
Examples
We have retention labels for the users of a mobile app for the previous month. Suppose old_retention = 0.3, that is, 30% of users returned to the app within a month after installation.
- Let us fix the following parameters:
Type I error (alpha) = 0.05 (5% of equal samples may be judged different).
Type II error (beta) = 0.2 (20% of different samples may be judged equal).
We add onboarding to our app and want to estimate its effect by A/B testing. We wish to increase retention to 31%, so our effect parameter takes the value 1.0(3), that is, 0.31 / 0.30 ≈ 1.033. Now we want to find how many users we need in both groups to detect such an effect.
We can use the Designer class in the following way:
>>> designer = Designer(dataframe=df, metrics='retention', effects=1.033)
>>> designer.run("size")
- Note that the default values for errors are:
first_type_errors=0.05
second_type_errors=0.2
We then get a dataframe that contains the sufficient number of users for our experiment.
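For a rough sense of scale, the retention example above can be approximated by hand. The sketch below uses the arcsine (Cohen's h) effect size with a two-sided normal approximation and equal group sizes. The function name `binary_sample_size` is hypothetical, the formula is a textbook approximation rather than Ambrosia's code, and the number returned by `designer.run("size")` may differ.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def binary_sample_size(prob_a, effect, alpha=0.05, beta=0.2):
    """Approximate per-group size for a binary metric (two-sided, equal groups)."""
    prob_b = prob_a * effect  # effect is a multiplier on the conversion
    # Arcsine-transformed effect size (Cohen's h).
    h = 2 * asin(sqrt(prob_b)) - 2 * asin(sqrt(prob_a))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return ceil(2 * (z_alpha + z_beta) ** 2 / h ** 2)


size = binary_sample_size(prob_a=0.30, effect=0.31 / 0.30)
print(size)  # on the order of 33 thousand users per group
```

A one percentage point uplift on a 30% baseline is a small effect, which is why the required groups are this large.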
- run(to_design, method=None, sizes=None, effects=None, first_type_errors=None, second_type_errors=None, dataframe=None, metrics=None, **kwargs)
Perform an experiment design for the chosen parameter and metrics using historical data.
- Parameters:
- to_design : str
Parameter that will be designed using historical data. Can take the values "size", "effect" or "power".
- method : str, optional
Method used for experiment design. Can be "theory", "empiric" or "binary".
- sizes : SampleSizeType, optional
Numbers of research objects in the group samples during the experiment. If not provided, must exist as a proper class attribute.
- effects : EffectType, optional
Effects for the experiment. If not provided, must exist as a proper class attribute.
- first_type_errors : StatErrorType, optional
Type I error bound: P(detect a difference for equal groups) < alpha.
- second_type_errors : StatErrorType, optional
Type II error bound: P(assume equality for different groups) < beta.
- dataframe : PassedDataType, optional
DataFrame with historical metric values. If not provided, must exist as a proper class attribute.
- metrics : MetricNamesType, optional
Column names of the metrics in the dataframe to be designed. If not provided, must exist as a proper class attribute.
- **kwargs : Dict
Other keyword arguments.
- Returns:
- result : DesignerResult
Table or dictionary with the results of parameter design for each metric.
- Other Parameters:
- as_numeric : bool, default: False
The result of calculations can be returned either as a percentage string or as a number; this parameter toggles between the two.
- groups_ratio : float, default: 1.0
Ratio between the two groups.
- alternative : str, default: "two-sided"
Alternative hypothesis, can be "two-sided", "greater" or "less". "greater" - if the effect is positive. "less" - if the effect is negative.
- stabilizing_method : str, default: "asin"
Effect transformation. Can be "asin" or "norm". For non-binary metrics, only "norm" is acceptable. For binary metrics, both "norm" and "asin" are acceptable, but "asin" is more robust and accurate. Applies only to the "theory" method and is relevant only for binary metrics.
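The influence of `alternative` and `groups_ratio` on a design can be sketched with a normal approximation. This is an illustration under stated assumptions, not Ambrosia's code: `first_group_size` is a hypothetical helper, `effect_size` is a standardized effect, and the second group is assumed to hold `groups_ratio` times as many objects as the first, which may not match the library's convention.

```python
from math import ceil
from statistics import NormalDist


def first_group_size(effect_size, alpha=0.05, beta=0.2,
                     groups_ratio=1.0, alternative="two-sided"):
    """Approximate size of the first group for a standardized effect size."""
    if alternative == "two-sided":
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # alpha split over both tails
    elif alternative in ("greater", "less"):
        z_alpha = NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
    else:
        raise ValueError(f"Unknown alternative: {alternative!r}")
    z_beta = NormalDist().inv_cdf(1 - beta)
    # Variance of the difference is proportional to 1/n1 + 1/n2, with n2 = ratio * n1.
    return ceil((z_alpha + z_beta) ** 2 * (1 + 1 / groups_ratio) / effect_size ** 2)


two_sided = first_group_size(0.1)
one_sided = first_group_size(0.1, alternative="greater")
unbalanced = first_group_size(0.1, groups_ratio=2.0)
print(two_sided, one_sided, unbalanced)
```

Under these assumptions a one-sided alternative needs fewer objects than a two-sided one, and an unbalanced split shrinks the first group while increasing the total number of objects across both groups.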
- ambrosia.designer.load_from_config(yaml_config, loader=<class 'yaml.loader.Loader'>)
Restore a Designer class instance from a yaml config.
For yaml_config you can pass the name of a config file; it must end with .yaml, for example "config.yaml".
For loader you can choose SafeLoader.
- ambrosia.designer.design(to_design, dataframe, metrics, sizes=None, effects=None, first_type_errors=(0.05,), second_type_errors=(0.2,), method='theory', **kwargs)
Function wrapper around the Designer class.
Makes an experiment design based on historical data using the passed arguments.
Creates an instance of the Designer class internally and executes its run method with the corresponding arguments.
- Parameters:
- to_design : str
Parameter that will be designed using historical data. Can take the values "size", "effect" or "power".
- dataframe : PassedDataType
DataFrame with historical metric values.
- metrics : MetricNamesType
Column names of the metrics in the dataframe to be designed.
- sizes : SampleSizeType, optional
Numbers of research objects in the group samples during the experiment. If not provided, the effects value must be defined.
- effects : EffectType, optional
Effects for the experiment. If not provided, the sizes value must be defined.
- first_type_errors : StatErrorType, default: (0.05,)
Type I error bound: P(detect a difference for equal groups) < alpha.
- second_type_errors : StatErrorType, default: (0.2,)
Type II error bound: P(assume equality for different groups) < beta.
- method : str, default: "theory"
Method used for experiment design. Can be "theory", "empiric" or "binary".
- **kwargs : Dict
Other keyword arguments.
- Returns:
- result : DesignerResult
Table or dictionary with the results of parameter design for each metric.
- Other Parameters:
- as_numeric : bool, default: False
The result of calculations can be returned either as a percentage string or as a number; this parameter toggles between the two.
- groups_ratio : float, default: 1.0
Ratio between the two groups.
- alternative : str, default: "two-sided"
Alternative hypothesis, can be "two-sided", "greater" or "less". "greater" - if the effect is positive. "less" - if the effect is negative.
- stabilizing_method : str, default: "asin"
Effect transformation. Can be "asin" or "norm". For non-binary metrics, only "norm" is acceptable. For binary metrics, both "norm" and "asin" are acceptable, but "asin" is more robust and accurate. Applies only to the "theory" method and is relevant only for binary metrics.
- ambrosia.designer.design_binary(to_design, prob_a, sizes=None, effects=None, first_type_errors=(0.05,), second_type_errors=(0.2,), method='theory', groups_ratio=1.0, alternative='two-sided', stabilizing_method='asin', **kwargs)
Design of experiment parameters for binary metrics based on a known conversion value.
- Parameters:
- to_design : str
Parameter to design.
- prob_a : float
Probability of success for the control group.
- sizes : SampleSizeType, optional
List or single value of group sizes. For example: 100, [100, 200].
- effects : EffectType, optional
List or single value of relative effects. For example: 1.05, [1.05, 1.2].
- first_type_errors : StatErrorType, default: (0.05,)
Type I error bound: P(detect a difference for equal groups) < alpha.
- second_type_errors : StatErrorType, default: (0.2,)
Type II error bound: P(assume equality for different groups) < beta.
- method : str, default: "theory"
Supports 2 methods: "theory" and "binary".
"theory" ~ by formula, using the statsmodels solve_power mechanism.
"binary" ~ using different types of intervals.
- groups_ratio : float, default: 1.0
Ratio between the two groups.
- alternative : str, default: "two-sided"
Alternative hypothesis, can be "two-sided", "greater" or "less". "greater" - if the effect is positive. "less" - if the effect is negative.
- stabilizing_method : str, default: "asin"
Effect transformation. Can be "asin" or "norm". For non-binary metrics, only "norm" is acceptable. For binary metrics, both "norm" and "asin" are acceptable, but "asin" is more robust and accurate.
- **kwargs : Dict
Other keyword arguments.
- Returns:
- result_table : pd.DataFrame
Table with results of design.
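One likely reason the "asin" option is described as more robust is variance stabilization. This assumes that "asin" refers to the arcsine square-root transform commonly applied to proportions, which the documentation above does not spell out. The simulation below is a sketch with a hypothetical helper `scaled_variances`. It shows that the variance of a raw conversion estimate depends on the conversion itself, while the variance of the transformed estimate stays close to 1/n.

```python
import random
from math import asin, sqrt
from statistics import variance


def scaled_variances(prob, size, n_rep=1000, seed=0):
    """Return n * variance of the raw and of the arcsine-transformed estimates."""
    rng = random.Random(seed)
    raw, stabilized = [], []
    for _ in range(n_rep):
        # Simulate one sample of `size` Bernoulli trials.
        successes = sum(rng.random() < prob for _ in range(size))
        p_hat = successes / size
        raw.append(p_hat)
        stabilized.append(2 * asin(sqrt(p_hat)))
    return size * variance(raw), size * variance(stabilized)


results = {prob: scaled_variances(prob, size=400) for prob in (0.1, 0.3, 0.5)}
for prob, (raw_var, asin_var) in results.items():
    print(f"p={prob}: n*var(p_hat)={raw_var:.3f}, "
          f"n*var(2*asin(sqrt(p_hat)))={asin_var:.3f}")
```

The first column tracks p(1 - p), while the second stays near 1 for every conversion, so an effect size defined on the transformed scale does not need a separate variance estimate for each baseline.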