Overview of Ambrosia Designer class Spark data support

This example shows the functionality of the Designer class on Spark DataFrames. Synthetic data on LTV and user retention rate is used.

The functionality of the Designer class on Spark data is currently limited compared to what is available for pandas data.
To learn about the full functionality of the Designer, why A/B test parameters need to be designed, and how this can be done, see the main tutorial on the Designer class.
[2]:
import os

import pandas as pd
import pyspark

from ambrosia.designer import Designer

Build a local Spark session

[3]:
os.environ['SPARK_LOCAL_IP'] = '127.0.0.1'
spark = pyspark.sql.SparkSession.builder.master("local[1]").getOrCreate()
spark.sparkContext.setLogLevel('ERROR')
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/04/21 17:39:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Create Spark DataFrame

[4]:
ltv_and_retention_dataset = pd.read_csv(
    "./../tests/test_data/ltv_retention.csv")
sdf = spark.createDataFrame(ltv_and_retention_dataset)
[5]:
sdf.printSchema()
root
 |-- LTV: double (nullable = true)
 |-- retention: double (nullable = true)

Spark A/B test parameters theoretical design

First, we will use a theoretical approach to find the missing parameters of a hypothetical experiment.
We will obtain theoretical estimates of the group size, the MDE, and the power of the test, in each case given the other known parameters.

Create a class instance and set the grid of parameters. The type I and type II error rates keep their default values.

[6]:
designer = Designer(dataframe=sdf,
                    effects=[1.05, 1.2],
                    sizes=[100, 1000],
                    metrics='LTV')

Design group size

[7]:
designer.run('size', 'theory')

[7]:
Errors ($\alpha$, $\beta$) (0.05; 0.2)
Effect
5.0% 6206
20.0% 389
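
To make the relationship between effect and group size concrete, here is a minimal sketch of the textbook normal-approximation formula for a two-sided, two-sample comparison of means. It is not taken from Ambrosia's source, and the Designer's own computation may differ. The function name `required_group_size` and the `mean` and `std` values are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_group_size(mean, std, effect, alpha=0.05, beta=0.2):
    """Per-group size for a two-sided two-sample z-test (normal approximation).

    `effect` is a relative multiplier: 1.05 means a 5% uplift of the mean.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for type II error
    delta = mean * (effect - 1)                    # absolute difference to detect
    return ceil(2 * (z_alpha + z_beta) ** 2 * std ** 2 / delta ** 2)


# Hypothetical metric with mean 100 and standard deviation 100 (assumed values)
for effect in (1.05, 1.2):
    print(effect, required_group_size(mean=100, std=100, effect=effect))
```

Under this approximation the required size scales with the inverse square of the effect, so a 4 times smaller uplift needs roughly 16 times more observations per group. The table above shows the same pattern.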

Design minimal detectable effect

[8]:
designer.run('effect', 'theory')
[8]:
Errors ($\alpha$, $\beta$) (0.05; 0.2)
Group sizes
100 39.6%
1000 12.5%
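
The minimal detectable effect can be sketched by inverting the same textbook normal approximation: fix the group size and solve for the smallest relative uplift that is detectable at the given error rates. This is an illustrative assumption about the underlying idea, not Ambrosia's implementation. `minimal_detectable_effect`, `mean` and `std` are hypothetical names and values.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_effect(mean, std, group_size, alpha=0.05, beta=0.2):
    """Relative MDE for a two-sided two-sample z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    # Smallest absolute difference of means detectable with the given errors
    delta = (z_alpha + z_beta) * std * sqrt(2 / group_size)
    return delta / mean  # expressed as a fraction of the baseline mean


# Hypothetical metric with mean 100 and standard deviation 100 (assumed values)
for size in (100, 1000):
    print(size, f"{minimal_detectable_effect(100, 100, size):.1%}")
```

The MDE shrinks with the square root of the group size, so ten times more users per group reduces it by a factor of about 3.2.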

Design test power

[9]:
designer.run('power', 'theory')
[9]:
Group sizes          100      1000
$\alpha$   Effect
0.05       5.0%      6.4%     20.3%
           20.0%     29.4%    99.4%
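
As a rough illustration of how power depends on both effect and group size, the sketch below evaluates the power of a two-sided two-sample z-test under the normal approximation. It is a textbook formula, not Ambrosia's code, and it uses normal rather than t quantiles. `two_sample_power`, `mean` and `std` are hypothetical names and values.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(mean, std, effect, group_size, alpha=0.05):
    """Power of a two-sided two-sample z-test for a relative `effect`."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative
    shift = mean * (effect - 1) / (std * sqrt(2 / group_size))
    # Probability of landing in either rejection region
    return norm.cdf(shift - z_alpha) + norm.cdf(-shift - z_alpha)


# Hypothetical metric with mean 100 and standard deviation 100 (assumed values)
for effect in (1.05, 1.2):
    for size in (100, 1000):
        print(effect, size, f"{two_sample_power(100, 100, effect, size):.1%}")
```

When the effect multiplier is exactly 1, the function returns `alpha`, which is a quick sanity check that the rejection regions are set up correctly.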

Spark A/B test parameters empirical design

Now let’s calculate the parameters by repeatedly sampling groups from the supplied data and simulating a hypothetical effect.
With a high value of the bootstrap_size parameter (the number of sampled groups per step), this approach estimates the parameters more accurately, but it requires far more computational resources than the theoretical one.
[10]:
designer = Designer(dataframe=sdf,
                    second_type_errors=0.1,
                    effects=[1.1, 1.2],
                    sizes=[500, 2000],
                    metrics='LTV')

The criterion parameter, which selects among different statistical criteria in the empirical design for pandas data, is not yet available for Spark data. The t-test criterion is always used here.

[11]:
designer.run('size', 'empiric', bootstrap_size=20)
[11]:
errors (0.1, 0.05)
effect
20.0% 247
10.0% 1198
[12]:
designer.run('effect', 'empiric', bootstrap_size=20)
[12]:
errors (0.1, 0.05)
group_sizes
500 9.6%
2000 5.9%
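
The general idea behind the empirical approach can be sketched in plain Python: resample two groups from the data, apply the hypothetical effect to one of them, run a test, and report the fraction of rejections as the estimated power. This is a simplified illustration under several assumptions, not Ambrosia's implementation: the effect is applied multiplicatively, a z-test with normal critical values stands in for the t-test, and `empirical_power` and the synthetic exponential `population` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def empirical_power(population, group_size, effect, alpha=0.05,
                    bootstrap_size=200, seed=0):
    """Estimate power as the share of resampled group pairs where H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(bootstrap_size):
        # Sample both groups with replacement; apply the uplift to group B only
        group_a = rng.choices(population, k=group_size)
        group_b = [x * effect for x in rng.choices(population, k=group_size)]
        std_err = sqrt(variance(group_a) / group_size
                       + variance(group_b) / group_size)
        z_stat = (fmean(group_b) - fmean(group_a)) / std_err
        rejections += abs(z_stat) > z_crit
    return rejections / bootstrap_size


# Synthetic stand-in for a skewed metric such as LTV (assumed distribution)
gen = random.Random(42)
population = [gen.expovariate(1 / 100) for _ in range(5000)]
print(empirical_power(population, group_size=500, effect=1.2))
```

The estimate is itself random, and its noise decreases as `bootstrap_size` grows, which is why larger values give more accurate results at a higher computational cost.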

Spark design for binary metrics

For binary metrics, either the "theory" or the "binary" approach can be used.
The former relies on approximations for binary data, while the latter calculates the experimental parameters from confidence intervals, which can be constructed in several ways (selected with interval_type).
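
As one example of a confidence interval for binary data, the sketch below computes the Newcombe hybrid score interval for a difference of two proportions, built from two Wilson intervals. It is written from the standard published formulas and is not taken from Ambrosia's source; how the Designer uses such intervals internally is not shown here. `wilson_interval` and `newcombe_interval` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, alpha=0.05):
    """Wilson score interval for a single proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def newcombe_interval(successes_a, n_a, successes_b, n_b, alpha=0.05):
    """Newcombe hybrid score interval for the difference p_b - p_a."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    low_a, up_a = wilson_interval(successes_a, n_a, alpha)
    low_b, up_b = wilson_interval(successes_b, n_b, alpha)
    diff = p_b - p_a
    lower = diff - sqrt((p_b - low_b) ** 2 + (up_a - p_a) ** 2)
    upper = diff + sqrt((up_b - p_b) ** 2 + (p_a - low_a) ** 2)
    return lower, upper


# Hypothetical counts: 60/150 retained in group A, 72/150 in group B
print(newcombe_interval(60, 150, 72, 150))
```

One way to use such an interval in a design is to treat an effect as detected when the interval excludes zero, and then count how often that happens across simulated experiments.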
[14]:
designer = Designer(dataframe=sdf,
                    second_type_errors=0.5,
                    sizes=150,
                    effects=1.2,
                    metrics='retention')

Group size:

[15]:
designer.run('size', 'theory')
[15]:
Errors ($\alpha$, $\beta$) (0.05; 0.5)
Effect
20.0% 289
[16]:
designer.run('size', 'binary', interval_type='newcombe', amount=100000)
[16]:
Errors ($\alpha$, $\beta$) (0.05; 0.5)
Effect
20.0% 280

Power:

[17]:
designer.run('power', 'theory')
[17]:
Group sizes 150
$\alpha$ Effect
0.05 20.0% 29.2%
[18]:
designer.run('power', 'binary', interval_type='newcombe', amount=100000)
[18]:
Group sizes 150
$\alpha$ Effect
0.05 20.0% 30.3%
[19]:
spark.stop()