Experiment Design

Ambrosia offers tools for calculating A/B test parameters, such as effect uplift, group size, and statistical power, based on historical metric values.

Choice of design approach

The theoretical approach to designing experimental parameters is much faster than the empirical one.
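To illustrate why the two approaches differ in cost, here is a minimal sketch that estimates statistical power both by simulation and by a closed-form normal approximation. It is not Ambrosia's implementation: the function names are hypothetical, the data are synthetic Gaussian samples rather than historical metrics, and a two-sided z-test with equal group sizes is assumed.

```python
import random
from statistics import NormalDist, fmean, variance


def empirical_power(mean, std, effect, size, alpha=0.05, n_simulations=300, seed=0):
    """Share of simulated experiments in which a two-sided z-test detects the uplift."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    detected = 0
    for _ in range(n_simulations):
        # Synthetic control and test groups (an assumption of this sketch).
        group_a = [rng.gauss(mean, std) for _ in range(size)]
        group_b = [rng.gauss(mean * effect, std) for _ in range(size)]
        std_error = ((variance(group_a) + variance(group_b)) / size) ** 0.5
        z_stat = (fmean(group_b) - fmean(group_a)) / std_error
        detected += abs(z_stat) > crit
    return detected / n_simulations


def theoretical_power(mean, std, effect, size, alpha=0.05):
    """Closed-form normal approximation of the same quantity."""
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shift = mean * (effect - 1.0) / (std * (2.0 / size) ** 0.5)
    return NormalDist().cdf(shift - crit) + NormalDist().cdf(-shift - crit)


simulated = empirical_power(mean=10.0, std=5.0, effect=1.1, size=400)
closed_form = theoretical_power(mean=10.0, std=5.0, effect=1.1, size=400)
```

The simulated estimate needs hundreds of resampled experiments for a single parameter combination, while the closed-form value is a handful of arithmetic operations.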

`Designer`	Unit for designing the parameters of experiments and pilots.
`load_from_config`	Restore a `Designer` class instance from a yaml config.
`design`	Function wrapper around the `Designer` class.
`design_binary`	Design of experiment parameters for binary metrics based on a known conversion value.

class ambrosia.designer.Designer(dataframe=None, sizes=None, effects=None, first_type_errors=0.05, second_type_errors=0.2, metrics=None, method='theory')

Unit for designing the parameters of experiments and pilots.

Enables the design of missing experiment parameters using historical data. The main interrelated parameters that can be designed for a single metric are:

Effect (Minimal Detectable Effect):
old_mean_metric_value * effect_value = new_mean_metric_value

Sample size:
Number of research objects in a sample (for example, the number of users whose retention is measured).

Errors (I type error, II type error):

I error (alpha):
Probability of detecting an effect when the samples are equally distributed.

II error (beta):
Probability of failing to find an effect when the samples are differently distributed.
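To make the link between these parameters concrete, the following is a minimal sketch of the textbook normal-approximation formula for the per-group sample size when comparing two means. It is not taken from Ambrosia's source: the function name `sample_size_per_group` and its arguments are hypothetical, and equal group sizes and a two-sided test are assumed.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(mean, std, effect, alpha=0.05, beta=0.2):
    """Per-group size for a two-sided z-test on means (equal group sizes assumed)."""
    # Absolute shift implied by the relative effect: new_mean = mean * effect.
    delta = mean * (effect - 1.0)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # quantile for the I type error
    z_beta = NormalDist().inv_cdf(1.0 - beta)          # quantile for the II type error
    return ceil(2.0 * ((z_alpha + z_beta) * std / delta) ** 2)


# Hypothetical historical metric: mean 100, standard deviation 40, +5% uplift.
n_users = sample_size_per_group(mean=100.0, std=40.0, effect=1.05)
# A smaller effect needs a larger sample to be detected with the same errors.
n_users_small_effect = sample_size_per_group(mean=100.0, std=40.0, effect=1.02)
```

Fixing any three of effect, sample size, alpha and beta determines the fourth, which is why one of them can be left out and designed from the others.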

Parameters:

dataframe (PassedDataType, optional): DataFrame with historical metric values.
sizes (SampleSizeType, optional): Numbers of research objects in the group samples during the experiment.
effects (EffectType, optional): Effect values expected during the experiment.
first_type_errors (StatErrorType, default: 0.05): I type error bound: P(detect difference for equal groups) < alpha.
second_type_errors (StatErrorType, default: 0.2): II type error bound: P(suppose equality for different groups) < beta.
metrics (MetricNamesType, optional): Column names of the metrics in the dataframe to be designed.
method (str, default: "theory"): Method used for experiment design. Can be "theory", "empiric" or "binary".

Attributes:

dataframe (PassedDataType): DataFrame with historical metric values.
sizes (SampleSizeType): Numbers of research objects in the group samples.
effects (EffectType): Effect values in the experiment.
first_type_errors (StatErrorType, default: 0.05): I type errors.
second_type_errors (StatErrorType, default: 0.2): II type errors.
metrics (MetricNamesType): Column names of the metrics in the dataframe to be designed.
method (str): Method used for experiment design.

Notes

Constructors:

>>> from ambrosia.designer import Designer
>>> designer = Designer()
>>> # You can pass an Iterable or a single object for some parameters
>>> # (df is assumed to be a pandas DataFrame with historical metric values)
>>> designer = Designer(
...     dataframe=df,
...     sizes=[100, 200],
...     metrics='LTV',
...     effects=1.05
... )
>>> designer = Designer(sizes=1000, metrics=['retention', 'LTV'])
>>> # You can pass a path to a .csv table instead of a dataframe
>>> designer = Designer('./data/table.csv')

Setters:

>>> designer.set_first_errors([0.05, 0.01])
>>> designer.set_dataframe(df)

Run:

>>> # Arguments passed to run take priority over the instance attributes
>>> designer.run('size', effects=1.1)
>>> designer.run('effect', sizes=[500, 1000], metrics='retention')
>>> # You can set the method (see below)
>>> designer.run('effect', sizes=[500, 1000], metrics='retention', method='binary')

Load from yaml config:

>>> config = '''
        !designer # <--- this is yaml tag (!important)
            effects:
                - 0.9
                - 1.05
            sizes:
                - 1000
    '''
>>> designer = yaml.load(config, Loader=yaml.Loader)
>>> # Or use the implemented function
>>> designer = load_from_config(config)

Use the standalone function instead of the class:

>>> design('size', dataframe=df, effects=1.05, metrics='retention')

Examples

We have retention labels for the users of a mobile app for the previous month. Suppose old_retention = 0.3, that is, 30% of users returned to the app within a month after installation.

Let us fix the following parameters:

I type error (alpha) = 0.05 (5% of equal samples we can suppose to be different).

II type error (beta) = 0.2 (20% of different samples we can suppose to be equal).

We add onboarding to our app and want to estimate its effect with an A/B test. We wish to increase retention to 31%, so our effect parameter takes the value 0.31 / 0.30 = 1.0(3), approximately 1.033. Now we want to find how many users we need in both groups to detect such an effect.

We can use the Designer class in the following way:

>>> designer = Designer(dataframe=df, metrics='retention', effects=1.033)
>>> designer.run("size")

Note that the default values for the errors are:

first_type_errors = 0.05

second_type_errors = 0.2

Then we get a dataframe that contains the sufficient number of users for our experiment.
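As a rough cross-check of the scale of the answer, the retention example can be worked by hand with the arcsine (variance-stabilizing) transformation for proportions. This is a sketch under stated assumptions, not Ambrosia's code: `binary_sample_size` is a hypothetical helper, and a two-sided test with equal group sizes is assumed, so the numbers returned by `Designer` may differ.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def binary_sample_size(prob_a, effect, alpha=0.05, beta=0.2):
    """Per-group size to detect prob_a -> prob_a * effect (two-sided, equal groups)."""
    prob_b = prob_a * effect
    # Arcsine effect size for two proportions, also known as Cohen's h.
    h = 2.0 * (asin(sqrt(prob_b)) - asin(sqrt(prob_a)))
    norm = NormalDist()
    z_sum = norm.inv_cdf(1.0 - alpha / 2.0) + norm.inv_cdf(1.0 - beta)
    return ceil(2.0 * (z_sum / h) ** 2)


# Retention 30% -> 31%, that is a relative effect of 0.31 / 0.30.
users_per_group = binary_sample_size(prob_a=0.30, effect=0.31 / 0.30)
```

Under these assumptions the required size is roughly 33 thousand users per group, which shows how expensive it is to detect a one-percentage-point change in a binary metric.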

run(to_design, method=None, sizes=None, effects=None, first_type_errors=None, second_type_errors=None, dataframe=None, metrics=None, **kwargs)

Perform an experiment design for the chosen parameter and metrics using historical data.

Parameters:

to_design (str): Parameter that will be designed using historical data. Can take the values "size", "effect" or "power".
method (str, optional): Method used for experiment design. Can be "theory", "empiric" or "binary".
sizes (SampleSizeType, optional): Numbers of research objects in the group samples during the experiment. If not provided, must exist as a proper class attribute.
effects (EffectType, optional): Effects for the experiment. If not provided, must exist as a proper class attribute.
first_type_errors (StatErrorType, optional): I type error bound: P(detect difference for equal groups) < alpha.
second_type_errors (StatErrorType, optional): II type error bound: P(suppose equality for different groups) < beta.
dataframe (PassedDataType, optional): DataFrame with historical metric values. If not provided, must exist as a proper class attribute.
metrics (MetricNamesType, optional): Column names of the metrics in the dataframe to be designed. If not provided, must exist as a proper class attribute.
**kwargs (Dict): Other keyword arguments.

Returns:

result (DesignerResult): Table or dictionary with the results of parameter design for each metric.

Other Parameters:

as_numeric (bool, default: False): The results can be returned either as percentage strings or as numbers; this parameter toggles between the two.
groups_ratio (float, default: 1.0): Ratio between the two groups.
alternative (str, default: "two-sided"): Alternative hypothesis; can be "two-sided", "greater" or "less". Use "greater" if the effect is positive and "less" if the effect is negative.
stabilizing_method (str, default: "asin"): Effect transformation; can be "asin" or "norm". For non-binary metrics only "norm" is acceptable. For binary metrics both "norm" and "asin" are acceptable, but "asin" is more robust and accurate. Applies only to the "theory" method and is relevant only for binary metrics.
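The influence of `groups_ratio` and `alternative` on a designed power value can be sketched with a normal approximation. The helper below is hypothetical and not part of Ambrosia; in particular, reading the ratio as size_b = size_a * groups_ratio is an assumption of this sketch and may not match the library's convention.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(std, delta, size_a, groups_ratio=1.0, alpha=0.05,
                 alternative="two-sided"):
    """Approximate z-test power for an absolute shift `delta` between group means."""
    size_b = size_a * groups_ratio  # assumed meaning of the ratio
    # Standard error of the difference between the two group means.
    std_error = std * sqrt(1.0 / size_a + 1.0 / size_b)
    shift = delta / std_error
    norm = NormalDist()
    if alternative == "two-sided":
        crit = norm.inv_cdf(1.0 - alpha / 2.0)
        return norm.cdf(shift - crit) + norm.cdf(-shift - crit)
    crit = norm.inv_cdf(1.0 - alpha)
    if alternative == "greater":
        return norm.cdf(shift - crit)
    if alternative == "less":
        return norm.cdf(-shift - crit)
    raise ValueError(f"unknown alternative: {alternative!r}")


two_sided = approx_power(std=40.0, delta=5.0, size_a=1000)
one_sided = approx_power(std=40.0, delta=5.0, size_a=1000, alternative="greater")
unbalanced = approx_power(std=40.0, delta=5.0, size_a=1000, groups_ratio=0.5)
```

With the same data, a correctly directed one-sided alternative gives more power than the two-sided one, while shrinking one of the groups reduces it.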

ambrosia.designer.load_from_config(yaml_config, loader=<class 'yaml.loader.Loader'>)

Restore a Designer class instance from a yaml config.

For yaml_config you can pass the name of a config file; it must end with .yaml, for example: “config.yaml”.

For loader you can choose SafeLoader.

ambrosia.designer.design(to_design, dataframe, metrics, sizes=None, effects=None, first_type_errors=(0.05,), second_type_errors=(0.2,), method='theory', **kwargs)

Function wrapper around the Designer class.

Makes an experiment design based on historical data using the passed arguments.

Creates an instance of the Designer class internally and executes its run method with the corresponding arguments.

Parameters:

to_design (str): Parameter that will be designed using historical data. Can take the values "size", "effect" or "power".
dataframe (PassedDataType): DataFrame with historical metric values.
metrics (MetricNamesType): Column names of the metrics in the dataframe to be designed.
sizes (SampleSizeType, optional): Numbers of research objects in the group samples during the experiment. If not provided, the effects value must be defined.
effects (EffectType, optional): Effects for the experiment. If not provided, the sizes value must be defined.
first_type_errors (StatErrorType, default: (0.05,)): I type error bound: P(detect difference for equal groups) < alpha.
second_type_errors (StatErrorType, default: (0.2,)): II type error bound: P(suppose equality for different groups) < beta.
method (str, default: "theory"): Method used for experiment design. Can be "theory", "empiric" or "binary".
**kwargs (Dict): Other keyword arguments.

Returns:

result (DesignerResult): Table or dictionary with the results of parameter design for each metric.

Other Parameters:

as_numeric (bool, default: False): The results can be returned either as percentage strings or as numbers; this parameter toggles between the two.
groups_ratio (float, default: 1.0): Ratio between the two groups.
alternative (str, default: "two-sided"): Alternative hypothesis; can be "two-sided", "greater" or "less". Use "greater" if the effect is positive and "less" if the effect is negative.
stabilizing_method (str, default: "asin"): Effect transformation; can be "asin" or "norm". For non-binary metrics only "norm" is acceptable. For binary metrics both "norm" and "asin" are acceptable, but "asin" is more robust and accurate. Applies only to the "theory" method and is relevant only for binary metrics.
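When `to_design` is "effect", the question is inverted: the group sizes are fixed and the minimal detectable effect is sought for each combination of sizes and errors. The sketch below shows that inversion with a normal approximation; `minimal_detectable_effect` is a hypothetical helper, the metric statistics are made up, and the dictionary only loosely mirrors the kind of result table the library returns.

```python
from statistics import NormalDist


def minimal_detectable_effect(mean, std, size, alpha=0.05, beta=0.2):
    """Smallest relative uplift detectable with `size` users per group (two-sided)."""
    norm = NormalDist()
    z_sum = norm.inv_cdf(1.0 - alpha / 2.0) + norm.inv_cdf(1.0 - beta)
    # Absolute shift detectable with the given errors and equal group sizes.
    delta = z_sum * std * (2.0 / size) ** 0.5
    return 1.0 + delta / mean


# One entry per (size, II type error) combination for a hypothetical metric
# with mean 100 and standard deviation 40.
mde_table = {
    (size, beta): minimal_detectable_effect(100.0, 40.0, size, beta=beta)
    for size in (500, 1000)
    for beta in (0.2, 0.1)
}
```

Larger groups lower the detectable effect, while demanding a smaller II type error raises it.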

ambrosia.designer.design_binary(to_design, prob_a, sizes=None, effects=None, first_type_errors=(0.05,), second_type_errors=(0.2,), method='theory', groups_ratio=1.0, alternative='two-sided', stabilizing_method='asin', **kwargs)

Design of experiment parameters for binary metrics based on a known conversion value.

Parameters:

to_design (str): Parameter to design.
prob_a (float): Probability of success for the control group.
sizes (SampleSizeType, optional): List or single value of group sizes. For example: 100, [100, 200].
effects (EffectType, optional): List or single value of relative effects. For example: 1.05, [1.05, 1.2].
first_type_errors (StatErrorType, default: (0.05,)): I type error bound: P(detect difference for equal groups) < alpha.
second_type_errors (StatErrorType, default: (0.2,)): II type error bound: P(suppose equality for different groups) < beta.
method (str, default: "theory"): Supports two methods: "theory", which computes by formula using the statsmodels solve_power mechanism, and "binary", which uses different types of intervals.
groups_ratio (float, default: 1.0): Ratio between the two groups.
alternative (str, default: "two-sided"): Alternative hypothesis; can be "two-sided", "greater" or "less". Use "greater" if the effect is positive and "less" if the effect is negative.
stabilizing_method (str, default: "asin"): Effect transformation; can be "asin" or "norm". For non-binary metrics only "norm" is acceptable. For binary metrics both "norm" and "asin" are acceptable, but "asin" is more robust and accurate.
**kwargs (Dict): Other keyword arguments.

Returns:

result_table (pd.DataFrame): Table with the results of the design.
