Effect Measurement
Tools for assessing the statistical significance of completed experiments and calculating the experimental uplift value with corresponding confidence intervals.
Multiple testing correction
Currently, when multiple hypotheses are tested (number of variant combinations * number of metrics passed), the groups are compared in pairs and the Bonferroni correction is applied to all p-values and confidence intervals.
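The correction described above can be sketched in a few lines of plain Python. This is only an illustration of the Bonferroni rule, not the library's internal code: the helper name `bonferroni`, the group and metric names, and the raw p-values are all made up for the example.

```python
from itertools import combinations
from statistics import NormalDist


def bonferroni(pvalues, alpha=0.05):
    """Return Bonferroni-adjusted p-values and the per-test alpha (illustrative helper)."""
    m = len(pvalues)
    adjusted = [min(1.0, p * m) for p in pvalues]  # adjusted p-values are capped at 1
    return adjusted, alpha / m


# Hypothetical setup: three variants compared in pairs on two metrics.
groups = ["A", "B", "C"]
metrics = ["ltv", "retention"]
pairs = list(combinations(groups, 2))      # (A, B), (A, C), (B, C)
n_hypotheses = len(pairs) * len(metrics)   # 3 pairs * 2 metrics = 6

raw_pvalues = [0.004, 0.03, 0.2, 0.01, 0.5, 0.04]  # made-up values, one per hypothesis
adjusted, alpha_per_test = bonferroni(raw_pvalues, alpha=0.05)

# Each confidence interval is then built at level 1 - alpha / m instead of 1 - alpha,
# which widens it: the normal quantile grows from about 1.96 to the value below.
z_corrected = NormalDist().inv_cdf(1 - alpha_per_test / 2)
```

With six hypotheses, a raw p-value has to be below roughly 0.0083 to remain significant at the 0.05 level after correction.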
Tester | Unit for evaluating the results of experiments.
test | Function wrapper around the Tester class.
- class ambrosia.tester.Tester(dataframe=None, df_mapping=None, experiment_results=None, column_groups=None, group_labels=None, id_column=None, first_type_errors=0.05, metrics=None, metric_funcs=None)
Unit for evaluating the results of experiments.
- The experiment evaluation result contains:
Pvalue for the selected criterion
Point effect estimation
Corresponding confidence interval for the effect
Boolean result - presence / absence of the effect
- Parameters:
- dataframe : PassedDataType, optional
Dataframe with the metrics of the experiment results.
- df_mapping : GroupsInfoType, optional
Dataframe which contains the group labels of objects.
- experiment_results : ExperimentResults, optional
Dict with separate experiment results for each group. Dict keys are used as group labels; values must be either pandas or Spark dataframes.
- column_groups : ColumnNameType, optional
Column which contains the group labels of objects.
- group_labels : GroupLabelsType, optional
Labels for experimental groups. If column_groups contains at least two values, they will be used as labels.
- id_column : ColumnNameType, optional
Name of the column with object ids in the df_mapping dataframe.
- first_type_errors : StatErrorType, default: 0.05
Type I error values. Bound the probability of detecting a difference between equal groups to be below this threshold. Also used to construct confidence intervals.
- metrics : MetricNameType, optional
Metrics (columns of the dataframe) which are used to calculate the experiment result.
- metric_funcs : Dict[str, Callable], optional
Dictionary mapping metric names to callable functions. Each function receives a pd.DataFrame (group data) and must return an array-like of numeric values. When provided, the function is used instead of column lookup for the corresponding metric name. Only supported for pandas DataFrames.
- Attributes:
- dataframe : PassedDataType
Dataframe with the metrics of the experiment results.
- df_mapping : GroupsInfoType
Dataframe which contains the group labels of objects.
- experiment_results : ExperimentResults, optional
Dict with separate experiment results for each group.
- column_groups : ColumnNameType
Column which contains the group labels of objects.
- group_labels : GroupLabelsType
Labels for experimental groups.
- id_column : ColumnNameType
Name of the column with object ids in the df_mapping dataframe.
- first_type_errors : StatErrorType, default: 0.05
Type I error values.
- metrics : MetricNameType
Columns of the dataframe with experiment results.
Notes
Basic mathematical methods for evaluating experiments:
- Theory:
Absolute: Using the t-test, Mann-Whitney, other built-in and custom criteria
Relative: Using delta method
- Empiric:
Absolute / Relative: Building empirical distribution for T(A, B)
- Binary:
Absolute: Using special binary intervals and finding pvalue = inf {a : 0 not in interval(a)}, i.e. the smallest level a whose interval excludes zero
Relative: Not implemented yet :(
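To make the "Empiric" approach listed above more concrete, here is a minimal bootstrap sketch for the absolute effect, taking T(A, B) to be the difference of group means. It assumes a percentile interval and a simple two-sided p-value; the function name, the resampling settings and the synthetic data are illustrative assumptions, and the library's own resampling scheme may differ.

```python
import random
from statistics import fmean


def bootstrap_absolute_effect(a, b, alpha=0.05, n_boot=2000, seed=0):
    """Percentile bootstrap for mean(b) - mean(a) (illustrative, not the library's code)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement and recompute the statistic.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(fmean(resample_b) - fmean(resample_a))
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    # Two-sided p-value: twice the smaller share of bootstrap mass on either side of zero.
    share_le_zero = sum(d <= 0 for d in diffs) / n_boot
    share_ge_zero = sum(d >= 0 for d in diffs) / n_boot
    pvalue = min(1.0, 2 * min(share_le_zero, share_ge_zero))
    return fmean(b) - fmean(a), (lower, upper), pvalue


# Synthetic data: group B is shifted by about 2 units relative to group A.
rng = random.Random(42)
group_a = [rng.gauss(10.0, 2.0) for _ in range(200)]
group_b = [rng.gauss(12.0, 2.0) for _ in range(200)]
effect, (lower, upper), pvalue = bootstrap_absolute_effect(group_a, group_b)
```

The same loop works for a relative effect if the statistic is replaced by `fmean(resample_b) / fmean(resample_a) - 1`.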
Constructors:
>>> # Empty constructor
>>> tester = Tester()
>>> # You can pass an Iterable or a single object for some parameters
>>> tester = Tester(
...     dataframe=df,
...     column_groups='groups',
...     metrics=['ltv', 'retention']
... )
>>> tester = Tester(metrics='retention', first_type_errors=[0.01, 0.05])
>>> # You can set a separate table containing information about
>>> # the partitioning in the experiment
>>> tester = Tester(
...     dataframe=df,           # main dataframe with metrics
...     df_mapping=groups,      # table with information about groups
...     metrics='metric',       # metric to be tested
...     column_groups='group',  # column in df_mapping with labels
...     id_column='id'          # column with ids in df and df_mapping (for join)
... )
Setters:
>>> tester.set_metrics(['ltv', 'retention'])
>>> tester.set_dataframe(dataframe=dataframe, column_groups='groups')
>>> # You can set separate data of each group packed in special dict form
>>> tester.set_experiment_results(experiment_results=experiment_results)
Run:
>>> # You can choose the effect_type to estimate: relative / absolute
>>> tester.run('absolute')
>>> # You can also choose the method
>>> tester.run('absolute', method='empiric')  # empiric for bootstrap
>>> # One can pass arguments to the run() method and they will have
>>> # higher priority
>>> tester.run(metrics='ltv', data_a_group=df_a)
Use a function instead of a class:
>>> test('absolute', dataframe=df, column_groups='groups', metrics='ltv')
Examples
We’ve experimented with adding onboarding to our mobile app and would like to evaluate the results as an A/B test. Suppose we have a loaded pandas dataframe with a column that holds the group labels and columns with metric values, such as retention. Then you can use the Tester class in the following way:
>>> tester = Tester(
...     dataframe=df,
...     column_groups='groups',
...     metrics='retention'
... )
>>> tester.run()
>>> # Output
>>> [{
...     'first_type_error': 0.05,
...     'pvalue': 0.03,
...     'effect': 1.05,
...     'confidence_interval': (1.01, 1.10),
...     'metric name': 'retention',
...     'group A label': 'A',
...     'group B label': 'B'
... }]
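The metric_funcs parameter described above accepts callables that turn one group's data into an array-like of numbers. Below is a sketch of what such a callable might look like. The column names `revenue` and `sessions` and the metric name `revenue_per_session` are hypothetical, and a plain dict of lists stands in for the group dataframe so that the snippet runs without pandas.

```python
def revenue_per_session(group):
    """Per-object ratio metric built from two columns of the group data (hypothetical columns)."""
    return [
        revenue / sessions if sessions else 0.0  # guard against objects with zero sessions
        for revenue, sessions in zip(group["revenue"], group["sessions"])
    ]


# Stand-in for one group's data; a pandas DataFrame with the same two columns
# can be indexed and iterated in the same way.
group_data = {"revenue": [10.0, 0.0, 6.0], "sessions": [2, 0, 3]}
values = revenue_per_session(group_data)

# Assumed wiring, following the parameter description (not executed here):
# tester = Tester(
#     dataframe=df,
#     column_groups='groups',
#     metrics='revenue_per_session',
#     metric_funcs={'revenue_per_session': revenue_per_session},
# )
```

Because the function returns one value per object, the result can be treated like any other metric column when the groups are compared.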
- run(effect_type='absolute', method='theory', dataframe=None, df_mapping=None, experiment_results=None, id_column=None, column_groups=None, group_labels=None, metrics=None, first_type_errors=None, criterion=None, correction_method='bonferroni', as_table=True, metric_funcs=None, **kwargs)
The main method for testing and evaluating experimental results.
- Parameters:
- effect_type : str, default: "absolute"
Effect type to calculate. Could be "absolute" or "relative".
- method : str, default: "theory"
Type of testing approach. Can take the values "theory", "empiric" or "binary".
- dataframe : PassedDataType, optional
Data used to calculate the results of an experiment.
- df_mapping : GroupsInfoType, optional
Dataframe which contains the group labels of objects.
- experiment_results : ExperimentResults
Dict with separate experiment results for each group. Dict keys are used as group labels; values must be either pandas or Spark dataframes.
- column_groups : ColumnNameType
Column which contains the group labels of objects.
- group_labels : GroupLabelsType
Labels for experimental groups.
- id_column : ColumnNameType
Name of the column with object ids in the df_mapping dataframe.
- first_type_errors : StatErrorType, default: 0.05
Type I error values.
- metrics : MetricNameType
Columns of the dataframe with experiment results.
- criterion : ABStatCriterion, optional
Statistical criterion for hypothesis testing. If method is "theory" and no criterion is provided, the t-test for independent samples will be used.
- correction_method : Union[str, None], default: "bonferroni"
Method for multiple testing correction of p-values and confidence intervals. The total number of hypotheses is equal to the number of variant combinations * number of metrics passed.
- as_table : bool, default: True
Return the test results as a pandas dataframe. If False, a list of dicts with results will be returned.
- metric_funcs : Dict[str, Callable], optional
Dictionary mapping metric names to callable functions. Each function receives a group pd.DataFrame and returns array-like values. Overrides functions set in the constructor for matching metric names. Only pandas DataFrames are supported.
- **kwargs : Dict
Other keyword arguments.
- Returns:
- result : types.TesterResult
Experiment results as pandas table or list of dicts for each metric and first type error.
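For effect_type="relative" with method="theory", the class notes mention the delta method. The sketch below shows one common form of it for the ratio of two independent sample means, with a normal approximation for the interval and p-value. The function name and the sample data are illustrative assumptions, and the library's exact formula may differ in its details.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def relative_effect_delta(a, b, alpha=0.05):
    """Relative uplift mean(b) / mean(a) - 1 with a delta-method interval (illustrative)."""
    mean_a, mean_b = fmean(a), fmean(b)  # mean_a is assumed to be non-zero
    var_mean_a = variance(a) / len(a)    # variance of the sample mean of A
    var_mean_b = variance(b) / len(b)    # variance of the sample mean of B
    effect = mean_b / mean_a - 1
    # First-order Taylor expansion of mean_b / mean_a for independent groups.
    var_ratio = var_mean_b / mean_a**2 + mean_b**2 * var_mean_a / mean_a**4
    se = sqrt(var_ratio)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pvalue = 2 * (1 - NormalDist().cdf(abs(effect) / se))
    return effect, (effect - z * se, effect + z * se), pvalue


# Made-up metric values for two groups.
group_a = [10.0, 12.0, 9.0, 11.0, 10.0, 8.0, 12.0, 10.0]
group_b = [12.0, 13.0, 11.0, 14.0, 12.0, 10.0, 13.0, 11.0]
effect, interval, pvalue = relative_effect_delta(group_a, group_b)
```

The normal approximation is only reasonable for fairly large samples; the tiny lists here are meant to keep the example readable, not to be statistically meaningful.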
- ambrosia.tester.test(effect_type='absolute', method='theory', dataframe=None, df_mapping=None, experiment_results=None, id_column=None, column_groups=None, group_labels=None, metrics=None, first_type_errors=None, criterion=None, correction_method='bonferroni', as_table=True, metric_funcs=None, **kwargs)
Function wrapper around the Tester class.
Apply it to the experimental data to get the results of an experiment.
Creates an instance of the Tester class internally and executes its run method with the corresponding arguments.
- Parameters:
- effect_type : str, default: "absolute"
Effect type to calculate. Could be "absolute" or "relative".
- method : str, default: "theory"
Type of testing approach. Can take the values "theory", "empiric" or "binary".
- dataframe : PassedDataType, optional
Data used to calculate the results of an experiment.
- df_mapping : GroupsInfoType, optional
Dataframe which contains the group labels of objects.
- experiment_results : ExperimentResults
Dict with separate experiment results for each group. Dict keys are used as group labels; values must be either pandas or Spark dataframes.
- column_groups : ColumnNameType
Column which contains the group labels of objects.
- group_labels : GroupLabelsType
Labels for experimental groups.
- id_column : ColumnNameType
Name of the column with object ids in the df_mapping dataframe.
- first_type_errors : StatErrorType, default: 0.05
Type I error values.
- metrics : MetricNameType
Columns of the dataframe with experiment results.
- criterion : ABStatCriterion, optional
Statistical criterion for hypothesis testing. If method is "theory" and no criterion is provided, the t-test for independent samples will be used.
- correction_method : Union[str, None], default: "bonferroni"
Method for multiple testing correction of p-values and confidence intervals. The total number of hypotheses is equal to the number of variant combinations * number of metrics passed.
- as_table : bool, default: True
Return the test results as a pandas dataframe. If False, a list of dicts with results will be returned.
- metric_funcs : Dict[str, Callable], optional
Dictionary mapping metric names to callable functions. Each function receives a group pd.DataFrame and returns array-like values. Only pandas DataFrames are supported.
- **kwargs : Dict
Other keyword arguments.
- Returns:
- result : types.TesterResult
Experiment results as pandas table or list of dicts for each metric and first type error.
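When the results come back as a list of dicts (as_table=False), they can be post-processed with ordinary Python. The sketch below assumes the key names shown in the Tester example output; the actual keys may vary between versions, and the second row is made up for illustration.

```python
# Rows shaped like the example output of Tester.run(); values are illustrative.
results = [
    {
        "first_type_error": 0.05, "pvalue": 0.03, "effect": 1.05,
        "confidence_interval": (1.01, 1.10), "metric name": "retention",
        "group A label": "A", "group B label": "B",
    },
    {
        "first_type_error": 0.05, "pvalue": 0.40, "effect": 0.20,
        "confidence_interval": (-0.30, 0.70), "metric name": "ltv",
        "group A label": "A", "group B label": "B",
    },
]


def significant(rows):
    """Keep rows whose p-value is below their own type I error level."""
    return [row for row in rows if row["pvalue"] < row["first_type_error"]]


significant_metrics = [row["metric name"] for row in significant(results)]
```

Here only the retention row passes the check, since its p-value of 0.03 is below the 0.05 level.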