Binary metric A/B test design example using Ambrosia¶

This example is about how Ambrosia can be used to calculate the parameters of an experiment with binary metrics. For a binary metric, there are some differences in the calculations regarding continuous metrics.

Let’s consider an example of calculating the parameters of a hypothetical experiment based on synthetic data on user retention rate.

[2]:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from ambrosia.designer import Designer, design_binary

Load data

[3]:

data = pd.read_csv('../tests/test_data/ltv_retention.csv')

[4]:

data.head()

[4]:

	LTV	retention
0	38.004891	0.0
1	70.588069	1.0
2	13.585602	1.0
3	19.813550	0.0
4	207.213003	0.0

Experiment design based on available historical data¶

In many situations, we have historical retention/conversion rate data available, and this data can be used in the same way as continuous data.

In order to calculate some A/B test parameters of interest, such as experimental power, group size, or minimum detectable effect, we need to pass them in the same way to the Designer class.

[5]:

designer = Designer(dataframe=data, metrics='retention')

For binary data, we can use either the "theory" method or the "binary" method.

The "theory" method performs a numerical calculation of the parameters using various approximations.

The approximation method choice is controlled by the stabilizing_method parameter and defaults to "asin" which is more accurate and robust. You can find more information about the approximations on the Net.

The "binary" approach does parameter estimation based on the multiple construction of chosen confidence interval. Some of these intervals are quite exotic and should be studied for conscious application. As a default a standard Wald interval is used.

This approach is more slowly and it’s accuracy depends on the number of iterations.

Now let’s create a grid of known parameters and calculate interested ones. We will use two above methods consequntly.

[6]:

# Create grid of MDEs and group sizes
# I and II type errros will have default values
effects = [1.02, 1.05, 1.1]
group_sizes = [500, 1000, 2000]

First design group sizes

[7]:

designer.run(to_design='size', method='theory', effects=effects)

[7]:

Errors ($\alpha$, $\beta$)	(0.05; 0.2)
Effect
2.0%	58885
5.0%	9464
10.0%	2382

Then design MDE values

[8]:

designer.run(to_design='effect', method='theory', sizes=group_sizes)

[8]:

Errors ($\alpha$, $\beta$)	(0.05; 0.2)
Group sizes
500	21.9%
1000	15.5%
2000	10.9%

Finally design power

[9]:

designer.run(to_design='power',
             method='theory',
             effects=effects,
             sizes=group_sizes)

[9]:

	Group sizes	500	1000	2000
$\alpha$	Effect
0.05	2.0%	5.8%	6.5%	8.1%
	5.0%	9.9%	14.9%	25.1%
	10.0%	25.0%	44.3%	72.8%

Now let’s design groups size for 10% MDE value again, using "binary" approach and compare to "theory" method result.

We will increase number of constructed intervals using amount parameter to check how the accuracy of estimated group size value is increased

[10]:

interval_amounts = [2000, 5000, 20000]
group_size_estimation_dict = {}

[11]:

for amount in interval_amounts:
    group_size_estimation_dict[amount] = []
    for step in range(200):
        estimated_size = designer.run(to_design='size',
                                      method='binary',
                                      interval_type='wald',
                                      amount=amount,
                                      effects=1.1).values[0][0]
        group_size_estimation_dict[amount].append(estimated_size)

Draw the results

[12]:

plt.figure(figsize=(8, 6))
plt.title('Group size estimation for 10% MDE')
for key in group_size_estimation_dict:
    label = f'amount number={key}'
    sns.histplot(group_size_estimation_dict[key], label=label)
plt.plot(2382, 0.5, 'ro', label='Theoretical estimation')
plt.legend();

../_images/pandas_examples_04_binary_design_24_0.png

For small numbers of iterations interval parameter estimation is quite noisy, and one should be aware of it.

Experiment design based on a known retention rate value¶

In some cases, complete data on a binary metric is missing or not needed.

These can be known and pre-calculated conversion/retention values, or simply the absence of any historical data (in which case, for example, assumption for rates are needed).

And now we will calculate the experimental parameters using the known retantion rate value of 0.1.

[13]:

retention = 0.1

[14]:

# Create grid of MDEs and group sizes
# I and II type errros will have default values
effects = [1.01, 1.03, 1.05]
group_sizes = [20_000, 50_000, 100_000]

Design group sizes

[15]:

design_binary(to_design='size',
              prob_a=retention,
              method='theory',
              effects=effects)

[15]:

Errors ($\alpha$, $\beta$)	(0.05; 0.2)
Effect
1.0%	1419062
3.0%	159059
5.0%	57756

Design MDE values

[16]:

design_binary(to_design='effect',
              prob_a=retention,
              method='theory',
              sizes=group_sizes)

[16]:

Errors ($\alpha$, $\beta$)	(0.05; 0.2)
Group sizes
20000	8.6%
50000	5.4%
100000	3.8%

Design test power

[17]:

design_binary(to_design='power',
              prob_a=retention,
              method='theory',
              effects=effects,
              sizes=group_sizes)

[17]:

	Group sizes	20000	50000	100000
$\alpha$	Effect
0.05	1.0%	6.3%	8.2%	11.5%
	3.0%	16.8%	34.9%	60.3%
	5.0%	37.8%	74.1%	95.8%

Learn more¶

You can learn more information about how you can do A/B test design using Ambrosia

Check:

Binary design tools documentation
Main example of an experiment design using Designer class