Aggregation

AggregatePreprocessor

class ambrosia.preprocessing.AggregatePreprocessor(categorial_method='mode', real_method='sum')

Preprocessing class for data aggregation.

Groups data by one or more columns and aggregates it, using separate methods for real and categorical features.

Parameters:
categorial_method : types.MethodType, default: "mode"

Aggregation method used by default for categorical variables.

real_method : types.MethodType, default: "sum"

Aggregation method used by default for real variables.

Attributes:
categorial_method : types.MethodType

Default aggregation method for categorical variables.

real_method : types.MethodType

Default aggregation method for real variables.

groupby_columns : types.ColumnNamesType

Columns used for grouping in the last aggregation. Set after the class instance is fitted.

agg_params : Dict

Dictionary of aggregation rules used in the last aggregation. Set after the class instance is fitted.
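To make the default behavior concrete, the following is a minimal plain-pandas sketch of grouping by several columns while summing a real column and taking the mode of a categorical one. It is not taken from the ambrosia source: it assumes that "sum" and "mode" correspond to the usual pandas operations, and the helper mode_value and the sample columns are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one real metric and one categorical feature.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 2, 2],
        "day": ["mon", "mon", "mon", "tue", "tue", "tue"],
        "spend": [10.0, 5.0, 3.0, 4.0, 6.0, 1.0],
        "device": ["ios", "ios", "web", "web", "ios", "web"],
    }
)


def mode_value(series: pd.Series):
    # Most frequent value in the group. Series.mode() returns the modes
    # sorted, so a tie resolves to the smallest value (an assumption of
    # this sketch, not documented ambrosia behavior).
    return series.mode().iloc[0]


# Group by two columns; sum the real column, take the mode of the categorical one.
agg_table = df.groupby(["user_id", "day"], as_index=False).agg(
    {"spend": "sum", "device": mode_value}
)
print(agg_table)
```

Each (user_id, day) pair becomes one row, with spend summed and device reduced to its most frequent value.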

get_params_dict()

Returns a dictionary with the parameters of the last run() or transform() call.

fit(dataframe, groupby_columns, agg_params=None, real_cols=None, categorial_cols=None)

Fit the preprocessor with aggregation parameters.

Aggregation is defined either by a dictionary that specifies the aggregation for each column of interest, or by lists of columns that receive the default aggregation behavior of the class.

Parameters:
dataframe : pd.DataFrame

Table with selected columns.

groupby_columns : types.ColumnNamesType

Columns to group by.

agg_params : Dict, optional

Dictionary that defines the aggregation for each column of interest.

real_cols : types.ColumnNamesType, optional

Columns with real metrics. Overridden by agg_params; pass this when the default aggregation behavior is expected.

categorial_cols : types.ColumnNamesType, optional

Columns with categorical metrics. Overridden by agg_params; pass this when the default aggregation behavior is expected.

Returns:
self : object

Instance object.
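The precedence between agg_params and the column lists can be sketched as a small helper. This is only an illustration under stated assumptions, not the library's code: resolve_agg_params is a hypothetical name, the sketch assumes that a passed agg_params is used as-is and the column lists are then ignored, and it assumes that a column argument may be a single name or a list of names.

```python
from typing import Dict, List, Optional, Union

ColumnNames = Union[str, List[str]]


def _as_list(columns: Optional[ColumnNames]) -> List[str]:
    # Accept None, a single column name, or a list of names.
    if columns is None:
        return []
    if isinstance(columns, str):
        return [columns]
    return list(columns)


def resolve_agg_params(
    agg_params: Optional[Dict[str, str]] = None,
    real_cols: Optional[ColumnNames] = None,
    categorial_cols: Optional[ColumnNames] = None,
    real_method: str = "sum",
    categorial_method: str = "mode",
) -> Dict[str, str]:
    # Assumption: an explicit dictionary wins and the column lists are ignored.
    if agg_params is not None:
        return dict(agg_params)
    # Otherwise every listed column gets the default method for its kind.
    rules = {col: real_method for col in _as_list(real_cols)}
    rules.update({col: categorial_method for col in _as_list(categorial_cols)})
    return rules


defaults = resolve_agg_params(real_cols=["spend"], categorial_cols="device")
explicit = resolve_agg_params(agg_params={"spend": "max"}, real_cols=["spend", "clicks"])
print(defaults)
print(explicit)
```

In the second call the explicit dictionary is returned unchanged, so the clicks column named in real_cols does not appear in the resulting rules.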

transform(dataframe)

Aggregate the table using the previously fitted parameters.

Parameters:
dataframe : pd.DataFrame

Table to aggregate.

Returns:
agg_table : pd.DataFrame

Aggregated table.
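The fit-then-transform pattern described above can be illustrated with a small stand-in class. SketchAggregator is hypothetical and is not the ambrosia implementation: it assumes that fitting only stores the grouping columns and aggregation rules, and that transforming applies them with a pandas groupby. The sample tables are likewise invented for illustration.

```python
import pandas as pd


class SketchAggregator:
    """Hypothetical stand-in that mirrors the fit/transform interface."""

    def __init__(self):
        self.groupby_columns = None
        self.agg_params = None

    def fit(self, dataframe, groupby_columns, agg_params):
        # Fitting only records the rules; the table is used to validate them.
        missing = set(agg_params) - set(dataframe.columns)
        if missing:
            raise KeyError(f"columns not found in dataframe: {sorted(missing)}")
        self.groupby_columns = groupby_columns
        self.agg_params = dict(agg_params)
        return self

    def transform(self, dataframe):
        # Apply the stored rules to any table with the same columns.
        if self.agg_params is None:
            raise RuntimeError("call fit() before transform()")
        grouped = dataframe.groupby(self.groupby_columns, as_index=False)
        return grouped.agg(self.agg_params)


first = pd.DataFrame({"user_id": [1, 1, 2], "spend": [10.0, 5.0, 3.0]})
second = pd.DataFrame({"user_id": [3, 3], "spend": [1.0, 2.0]})

aggregator = SketchAggregator().fit(
    first, groupby_columns="user_id", agg_params={"spend": "sum"}
)
first_agg = aggregator.transform(first)
second_agg = aggregator.transform(second)  # same rules, different table
print(first_agg)
print(second_agg)
```

Because the rules are stored at fit time, the same aggregation can be reapplied to a second table without restating the grouping columns or methods.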