Experiment Design

Ambrosia offers tools for calculating A/B test parameters, such as effect uplift, group size, and statistical power, based on historical metric values.

Choice of design approach

The theoretical approach to designing experimental parameters is much faster than the empirical one.

Designer

Unit for designing experiment and pilot parameters.

load_from_config

Restore a Designer class instance from a yaml config.

design

Function wrapper around the Designer class.

design_binary

Design of experiment parameters for binary metrics based on a known conversion value.


class ambrosia.designer.Designer(dataframe=None, sizes=None, effects=None, first_type_errors=0.05, second_type_errors=0.2, metrics=None, method='theory')

Unit for designing experiment and pilot parameters.

Designs missing experiment parameters from historical data. The main interrelated parameters that can be designed for a single metric are:

  • Effect (Minimal Detectable Effect):

    old_mean_metric_value * effect_value = new_mean_metric_value

  • Sample size:

    Number of research objects in a sample (for example, the number of users with observed retention).

  • Errors (I type error, II type error):

    I type error (alpha):

    Probability of detecting an effect when the samples are equally distributed.

    II type error (beta):

    Probability of not detecting an effect when the samples are differently distributed.
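
As a rough illustration of how these parameters constrain each other, the sketch below derives the sample size from the effect and both error levels, using the textbook normal approximation for a two-sided two-sample z-test. It is not Ambrosia's implementation; the function name sample_size_per_group and the mean and standard deviation values are assumptions made for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_group(mean, std, effect, alpha=0.05, beta=0.2):
    """Per-group size for a two-sided two-sample z-test (normal approximation)."""
    # The effect is multiplicative: new_mean = mean * effect.
    delta = mean * (effect - 1.0)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return math.ceil(2.0 * ((z_alpha + z_beta) * std / delta) ** 2)


# Hypothetical historical values: mean 100, standard deviation 40.
n_small_effect = sample_size_per_group(mean=100.0, std=40.0, effect=1.05)
n_large_effect = sample_size_per_group(mean=100.0, std=40.0, effect=1.10)
```

Fixing any three of effect, sample size, I type error and II type error determines the fourth; here a smaller effect requires a larger sample at the same error levels.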

Parameters:
dataframe: PassedDataType, optional

DataFrame with historical metric values.

sizes: SampleSizeType, optional

Numbers of research objects in the group samples during the experiment.

effects: EffectType, optional

Effect values expected during the experiment.

first_type_errors: StatErrorType, default: 0.05

I type error bounds: P(detect difference for equal groups) < alpha.

second_type_errors: StatErrorType, default: 0.2

II type error bounds: P(suppose equality for different groups) < beta.

metrics: MetricNamesType, optional

Column names of the metrics in the dataframe to be designed.

method: str, default: "theory"

Method used for experiment design. Can be "theory", "empiric" or "binary".

Attributes:
dataframe: PassedDataType

DataFrame with historical metric values.

sizes: SampleSizeType

Numbers of research objects in the group samples.

effects: EffectType

Effect values in the experiment.

first_type_errors: StatErrorType, default: 0.05

I type errors.

second_type_errors: StatErrorType, default: 0.2

II type errors.

metrics: MetricNamesType

Column names of the metrics in the dataframe to be designed.

method: str

Method used for experiment design.
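
To illustrate the difference between a formula-based and a simulation-based design, here is a minimal sketch of an empirical power estimate: resample historical values, inject the effect into one group, run a test, and count rejections. This is a generic illustration under stated assumptions, not the algorithm behind Ambrosia's "empiric" method; empirical_power and the generated history are hypothetical.

```python
import random
from statistics import NormalDist, fmean, variance


def empirical_power(history, effect, size, alpha=0.05, n_iter=500, seed=0):
    """Estimate power by resampling historical values and injecting the effect."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_iter):
        group_a = rng.choices(history, k=size)
        # Group B is resampled the same way, then scaled by the effect.
        group_b = [value * effect for value in rng.choices(history, k=size)]
        std_err = (variance(group_a) / size + variance(group_b) / size) ** 0.5
        z_stat = (fmean(group_b) - fmean(group_a)) / std_err
        rejections += abs(z_stat) > z_crit
    return rejections / n_iter


# Hypothetical historical metric values, generated only for illustration.
data_rng = random.Random(42)
history = [data_rng.gauss(100.0, 40.0) for _ in range(2000)]
power_null = empirical_power(history, effect=1.0, size=200)
power_alt = empirical_power(history, effect=1.2, size=200)
```

A simulation like this repeats the test many times for every candidate size or effect, which is why a closed-form calculation is usually much cheaper.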

Notes

Constructors:

>>> designer = Designer()
>>> # You can pass an Iterable or a single object for some parameters
>>> designer = Designer(
...     dataframe=df,
...     sizes=[100, 200],
...     metrics='LTV',
...     effects=1.05
... )
>>> designer = Designer(sizes=1000, metrics=['retention', 'LTV'])
>>> # You can pass a path to a .csv table for pandas
>>> designer = Designer('./data/table.csv')

Setters:

>>> designer.set_first_errors([0.05, 0.01])
>>> designer.set_dataframe(df)

Run:

>>> # One can pass arguments and they will have higher priority
>>> designer.run('size', effects=1.1)
>>> designer.run('effect', sizes=[500, 1000], metrics='retention')
>>> # You can set the method (see below)
>>> designer.run('effect', sizes=[500, 1000], metrics='retention', method='binary')

Load from yaml config:

>>> config = '''
        !designer # <--- this is yaml tag (!important)
            effects:
                - 0.9
                - 1.05
            sizes:
                - 1000
    '''
>>> designer = yaml.load(config, Loader=yaml.Loader)
>>> # Or use the implemented function
>>> designer = load_from_config(config)

Use standalone function instead of a class:

>>> design('size', dataframe=df, effects=1.05, metrics='retention')

Examples

We have retention labels for users of a mobile app for the previous month. Suppose old_retention = 0.3, that is, 30% of users returned to the app within a month after installation.

Let us fix the following parameters:

I type error (alpha) = 0.05 (5% of equal samples may be judged different).

II type error (beta) = 0.2 (20% of different samples may be judged equal).

We add onboarding to our app and want to estimate its effect with an A/B test. We wish to increase retention to 31%, so our effect parameter takes the value 0.31 / 0.30 = 1.0(3). Now we want to find how many users we need in both groups to detect such an effect.

We can use Designer class in the following way:

>>> designer = Designer(dataframe=df, metrics='retention', effects=1.033)
>>> designer.run("size")

Note that the default values for errors are:

first_type_errors = 0.05

second_type_errors = 0.2

Then we get a dataframe that contains the sufficient number of users for our experiment.
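
For a sense of scale, the arithmetic of this example can be sketched by hand with the arcsine (Cohen's h) approximation for two proportions. This is a standalone estimate under that assumption, not a reproduction of what Designer.run returns; binary_sample_size is a hypothetical helper.

```python
import math
from statistics import NormalDist


def binary_sample_size(prob_a, effect, alpha=0.05, beta=0.2):
    """Per-group size for two proportions using the arcsine transformation."""
    prob_b = prob_a * effect
    # Cohen's h: difference of arcsine-transformed proportions.
    h = 2.0 * math.asin(math.sqrt(prob_b)) - 2.0 * math.asin(math.sqrt(prob_a))
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return math.ceil(2.0 * ((z_alpha + z_beta) / h) ** 2)


effect = 0.31 / 0.30  # relative effect from the example, about 1.033
users_per_group = binary_sample_size(prob_a=0.30, effect=effect)
```

Under these assumptions the estimate is roughly 33 thousand users per group, which shows how costly it is to detect a one-percentage-point uplift in retention.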

run(to_design, method=None, sizes=None, effects=None, first_type_errors=None, second_type_errors=None, dataframe=None, metrics=None, **kwargs)

Performs an experiment design for the chosen parameter and metrics using historical data.

Parameters:
to_design: str

Parameter that will be designed using historical data. Can take the values "size", "effect" or "power".

method: str, optional

Method used for experiment design. Can be "theory", "empiric" or "binary".

sizes: SampleSizeType, optional

Numbers of research objects in the group samples during the experiment. If not provided, they must be set as a class attribute.

effects: EffectType, optional

Effects for the experiment. If not provided, they must be set as a class attribute.

first_type_errors: StatErrorType, optional

I type error bounds: P(detect difference for equal groups) < alpha.

second_type_errors: StatErrorType, optional

II type error bounds: P(suppose equality for different groups) < beta.

dataframe: PassedDataType, optional

DataFrame with historical metric values. If not provided, it must be set as a class attribute.

metrics: MetricNamesType, optional

Column names of the metrics in the dataframe to be designed. If not provided, they must be set as a class attribute.

**kwargs: Dict

Other keyword arguments.

Returns:
result: DesignerResult

Table or dictionary with the results of parameter design for each metric.

Other Parameters:
as_numeric: bool, default: False

The result can be returned either as a percentage string or as a number; this parameter toggles between the two.

groups_ratio: float, default: 1.0

Ratio between the sizes of the two groups.

alternative: str, default: "two-sided"

Alternative hypothesis. Can be "two-sided", "greater" or "less". Use "greater" if the effect is positive and "less" if it is negative.

stabilizing_method: str, default: "asin"

Effect transformation. Can be "asin" or "norm". For non-binary metrics only "norm" is acceptable. For binary metrics both "norm" and "asin" are acceptable, but "asin" is more robust and accurate. Applies only to the "theory" method and is relevant only for binary metrics.
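
The influence of groups_ratio and alternative on a design can be sketched with a plain z-test approximation of the minimal detectable effect. The code below is an illustration only: it assumes groups_ratio means size_b / size_a, which this page does not state, and minimal_detectable_effect with its mean and std inputs is hypothetical.

```python
from statistics import NormalDist


def minimal_detectable_effect(mean, std, size_a, groups_ratio=1.0,
                              alpha=0.05, beta=0.2, alternative="two-sided"):
    """Relative MDE for a z-test; size_b = size_a * groups_ratio (assumed)."""
    size_b = size_a * groups_ratio
    # A one-sided test spends all of alpha in one tail.
    tail_alpha = alpha / 2.0 if alternative == "two-sided" else alpha
    z_sum = NormalDist().inv_cdf(1.0 - tail_alpha) + NormalDist().inv_cdf(1.0 - beta)
    delta = z_sum * std * (1.0 / size_a + 1.0 / size_b) ** 0.5
    # "less" looks for a decrease, so the multiplicative effect is below 1.
    return 1.0 - delta / mean if alternative == "less" else 1.0 + delta / mean


mde_two_sided = minimal_detectable_effect(100.0, 40.0, size_a=1000)
mde_greater = minimal_detectable_effect(100.0, 40.0, size_a=1000, alternative="greater")
mde_unbalanced = minimal_detectable_effect(100.0, 40.0, size_a=1000, groups_ratio=2.0)
```

In this approximation, a one-sided alternative or a larger second group both lower the detectable effect compared with the balanced two-sided design.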

ambrosia.designer.load_from_config(yaml_config, loader=<class 'yaml.loader.Loader'>)

Restore a Designer class instance from a yaml config.

For yaml_config you can pass the name of a file with the config; it must end with .yaml, for example: "config.yaml".

For loader you can choose SafeLoader.

ambrosia.designer.design(to_design, dataframe, metrics, sizes=None, effects=None, first_type_errors=(0.05,), second_type_errors=(0.2,), method='theory', **kwargs)

Function wrapper around the Designer class.

Makes an experiment design based on historical data using the passed arguments.

Creates an instance of the Designer class internally and executes the run method with the corresponding arguments.

Parameters:
to_design: str

Parameter that will be designed using historical data. Can take the values "size", "effect" or "power".

dataframe: PassedDataType

DataFrame with historical metric values.

metrics: MetricNamesType

Column names of the metrics in the dataframe to be designed.

sizes: SampleSizeType, optional

Numbers of research objects in the group samples during the experiment. If not provided, the effects value must be defined.

effects: EffectType, optional

Effects for the experiment. If not provided, the sizes value must be defined.

first_type_errors: StatErrorType, default: (0.05,)

I type error bounds: P(detect difference for equal groups) < alpha.

second_type_errors: StatErrorType, default: (0.2,)

II type error bounds: P(suppose equality for different groups) < beta.

method: str, default: "theory"

Method used for experiment design. Can be "theory", "empiric" or "binary".

**kwargs: Dict

Other keyword arguments.

Returns:
result: DesignerResult

Table or dictionary with the results of parameter design for each metric.

Other Parameters:
as_numeric: bool, default: False

The result can be returned either as a percentage string or as a number; this parameter toggles between the two.

groups_ratio: float, default: 1.0

Ratio between the sizes of the two groups.

alternative: str, default: "two-sided"

Alternative hypothesis. Can be "two-sided", "greater" or "less". Use "greater" if the effect is positive and "less" if it is negative.

stabilizing_method: str, default: "asin"

Effect transformation. Can be "asin" or "norm". For non-binary metrics only "norm" is acceptable. For binary metrics both "norm" and "asin" are acceptable, but "asin" is more robust and accurate. Applies only to the "theory" method and is relevant only for binary metrics.
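
When to_design is "power", the sizes and effects are fixed and the II type error is the unknown. A minimal sketch of that calculation, using the normal approximation for a two-sided test with equal groups, is shown below. It is not the code design executes; power_for_design and its numeric inputs are assumptions for illustration.

```python
from statistics import NormalDist


def power_for_design(mean, std, effect, size, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with equal groups."""
    delta = abs(mean * (effect - 1.0))
    std_err = std * (2.0 / size) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # The far-tail rejection probability is negligible and ignored here.
    return NormalDist().cdf(delta / std_err - z_crit)


# Hypothetical metric: mean 100, standard deviation 40, 1005 objects per group.
power = power_for_design(mean=100.0, std=40.0, effect=1.05, size=1005)
second_type_error = 1.0 - power
```

Power and the II type error always sum to one, so designing one is equivalent to designing the other.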

ambrosia.designer.design_binary(to_design, prob_a, sizes=None, effects=None, first_type_errors=(0.05,), second_type_errors=(0.2,), method='theory', groups_ratio=1.0, alternative='two-sided', stabilizing_method='asin', **kwargs)

Design of experiment parameters for binary metrics based on a known conversion value.

Parameters:
to_design: str

Parameter to design.

prob_a: float

Probability of success for the control group.

sizes: SampleSizeType, optional

List or single value of group sizes. For example: 100, [100, 200].

effects: EffectType, optional

List or single value of relative effects. For example: 1.05, [1.05, 1.2].

first_type_errors: StatErrorType, default: (0.05,)

I type error bounds: P(detect difference for equal groups) < alpha.

second_type_errors: StatErrorType, default: (0.2,)

II type error bounds: P(suppose equality for different groups) < beta.

method: str, default: "theory"

Supports 2 methods: "theory" and "binary". "theory": by formula, using the statsmodels solve_power mechanism. "binary": using different types of intervals.

groups_ratio: float, default: 1.0

Ratio between the sizes of the two groups.

alternative: str, default: "two-sided"

Alternative hypothesis. Can be "two-sided", "greater" or "less". Use "greater" if the effect is positive and "less" if it is negative.

stabilizing_method: str, default: "asin"

Effect transformation. Can be "asin" or "norm". For non-binary metrics only "norm" is acceptable. For binary metrics both "norm" and "asin" are acceptable, but "asin" is more robust and accurate.

**kwargs: Dict

Other keyword arguments.

Returns:
result_table: pd.DataFrame

Table with results of design.
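
To see what the choice of stabilizing_method changes for a binary metric, the sketch below compares two textbook sample-size formulas for a known conversion prob_a. It assumes that "asin" corresponds to the arcsine (Cohen's h) effect size and that "norm" corresponds to the untransformed difference of proportions with binomial variances; both readings are assumptions, and size_asin, size_norm and z_sum are hypothetical helpers rather than Ambrosia functions.

```python
import math
from statistics import NormalDist


def z_sum(alpha=0.05, beta=0.2):
    """Sum of the two-sided alpha quantile and the power quantile."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) + NormalDist().inv_cdf(1.0 - beta)


def size_asin(prob_a, effect, alpha=0.05, beta=0.2):
    """Per-group size from the arcsine-transformed difference (Cohen's h)."""
    prob_b = prob_a * effect
    h = 2.0 * (math.asin(math.sqrt(prob_b)) - math.asin(math.sqrt(prob_a)))
    return math.ceil(2.0 * (z_sum(alpha, beta) / h) ** 2)


def size_norm(prob_a, effect, alpha=0.05, beta=0.2):
    """Per-group size from the raw difference with binomial variances."""
    prob_b = prob_a * effect
    var_sum = prob_a * (1.0 - prob_a) + prob_b * (1.0 - prob_b)
    return math.ceil(z_sum(alpha, beta) ** 2 * var_sum / (prob_b - prob_a) ** 2)


n_asin = size_asin(prob_a=0.3, effect=1.05)
n_norm = size_norm(prob_a=0.3, effect=1.05)
```

For a mid-range conversion such as 0.3 the two formulas agree closely; the difference between them tends to matter more when the conversion is near 0 or 1.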

Examples of using experiment design tools