Outliers removal
- RobustPreprocessor: Unit for simple robust transformation for avoiding outliers in data.
- IQRPreprocessor: Unit for IQR transformation of the data to exclude outliers.
- class ambrosia.preprocessing.RobustPreprocessor(verbose=True)
Unit for simple robust transformation for avoiding outliers in data.
It cuts the alpha share of the distribution from the head, the tail, or both sides for each given metric. The data distribution is assumed to consist of a small alpha part of outliers, followed by the normal part of the data, with another alpha part of outliers at the end of the distribution.
- Parameters:
- verbose : bool, default: True
If True, will show info about the transformation of passed columns.
- Attributes:
- params : Dict
Dictionary with operational parameters of the instance. Updated after calling the fit method.
- verbose : bool
Verbose info flag.
- available_tails : List
List of the available tail type names to preprocess.
- non_serializable_params : List
List of the class parameters that should be converted to lists in order to serialize.
- fitted : bool
Fit flag.
Examples
>>> from ambrosia.preprocessing import RobustPreprocessor
>>> robust = RobustPreprocessor(verbose=True)
>>> robust.fit(dataframe, ['column1', 'column2'], alpha=0.05)
>>> robust.transform(dataframe, inplace=True)
You can pass one or several columns. If several columns are passed, it drops alpha percent of extreme values in total for each column.
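To make the trimming idea concrete, here is a minimal pandas sketch of quantile-based tail removal. It is not ambrosia's implementation: the helper name `trim_tails` is hypothetical, and it assumes that alpha is cut from each requested side of every column, which the text above does not state unambiguously.

```python
import pandas as pd


def trim_tails(df: pd.DataFrame, columns, alpha: float = 0.05, tail: str = "both") -> pd.DataFrame:
    """Illustrative quantile trimming (hypothetical helper, not ambrosia's code)."""
    if tail not in ("left", "right", "both"):
        raise ValueError(f"Unknown tail: {tail!r}")
    keep = pd.Series(True, index=df.index)
    for column in columns:
        # Assumption: alpha is removed from each requested side of the column.
        if tail in ("left", "both"):
            keep &= df[column] >= df[column].quantile(alpha)
        if tail in ("right", "both"):
            keep &= df[column] <= df[column].quantile(1 - alpha)
    # A row survives only if it is inside the bounds of every listed column.
    return df[keep]


data = pd.DataFrame({"metric": list(range(1, 101))})
trimmed = trim_tails(data, ["metric"], alpha=0.05, tail="both")
right_only = trim_tails(data, ["metric"], alpha=0.05, tail="right")
```

In this sketch the quantile bounds are computed on the same dataframe that is filtered; a fit/transform split, as in the class above, would compute them once and reuse them on other data.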
- fit(dataframe, column_names, alpha=0.05, tail='both')
Fit to calculate robust parameters for the selected columns.
- Parameters:
- dataframe : pd.DataFrame
Dataframe to calculate quantiles.
- column_names : ColumnNamesType
One or several columns in the dataframe.
- alpha : Union[float, np.ndarray], default: 0.05
The share of data removed from the head and tail (for example, 0.05 for 5%).
- tail : str, default: "both"
Part of the distribution to be removed. Can be "left", "right" or "both".
- Returns:
- self : object
Instance object.
- transform(dataframe, inplace=False)
Remove objects from the dataframe that fall in the head, tail, or both alpha parts of the chosen metrics' distributions.
- Parameters:
- dataframe : pd.DataFrame
Dataframe to transform.
- inplace : bool, default: False
If True, transforms the given dataframe; otherwise copies it and returns a new one.
- Returns:
- df : Union[pd.DataFrame, None]
Transformed dataframe or None.
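The inplace convention described above can be sketched with plain pandas. The helper `drop_above` is hypothetical and only illustrates the pattern; in particular, returning None when `inplace=True` is an assumption based on the "Transformed dataframe or None" return description.

```python
import pandas as pd


def drop_above(df: pd.DataFrame, column: str, upper: float, inplace: bool = False):
    """Hypothetical helper that shows the inplace convention only."""
    # Work on the caller's object when inplace=True, otherwise on a copy.
    target = df if inplace else df.copy()
    mask = target[column] > upper
    # Drop the offending rows, mutating `target`.
    target.drop(index=target[mask].index, inplace=True)
    # Assumption: nothing is returned when the input itself was modified.
    return None if inplace else target


frame = pd.DataFrame({"metric": [1, 2, 3, 100]})
copy_result = drop_above(frame, "metric", upper=10)  # frame keeps all 4 rows here
rows_before_inplace = len(frame)
inplace_result = drop_above(frame, "metric", upper=10, inplace=True)  # frame is modified
```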
- fit_transform(dataframe, column_names, alpha=0.05, tail='both', inplace=False)
Fit preprocessor parameters using given dataframe and transform it.
- Parameters:
- dataframe : pd.DataFrame
Dataframe to calculate quantiles and for further transformation.
- column_names : ColumnNamesType
One or several columns in the dataframe.
- alpha : Union[float, np.ndarray], default: 0.05
The share of data removed from the head and tail (for example, 0.05 for 5%).
- tail : str, default: "both"
Part of the distribution to be removed. Can be "left", "right" or "both".
- inplace : bool, default: False
If True, transforms the given dataframe; otherwise copies it and returns a new one.
- Returns:
- df : Union[pd.DataFrame, None]
Transformed dataframe or None.
- store_params(store_path)
- Parameters:
- store_path : Path
Path where parameters will be stored in JSON format.
- load_params(load_path)
- Parameters:
- load_path : Path
Path to the JSON file with parameters.
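The attribute non_serializable_params mentions parameters that are converted to lists before serialization. The sketch below shows why such a step is needed when writing NumPy arrays to JSON. It does not call ambrosia: the dictionary contents, the key names, and the helper `to_serializable` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical fitted parameters; the real key names in ambrosia may differ.
params = {"alpha": np.array([0.05, 0.1]), "tail": "both"}
array_keys = ["alpha"]  # plays the role of a "non-serializable" list


def to_serializable(values: dict, keys: list) -> dict:
    """Convert NumPy arrays under the given keys to plain lists for json."""
    return {k: (v.tolist() if k in keys else v) for k, v in values.items()}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "params.json"
    # json.dumps(params) would raise TypeError because of the ndarray.
    path.write_text(json.dumps(to_serializable(params, array_keys)))
    loaded = json.loads(path.read_text())

# Restore arrays after loading so downstream code sees the original types.
restored = {k: (np.array(v) if k in array_keys else v) for k, v in loaded.items()}
```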
- class ambrosia.preprocessing.IQRPreprocessor(verbose=True)
Unit for IQR transformation of the data to exclude outliers.
It cuts the points from the distribution that lie outside the range from (0.25 quantile - 1.5 * IQR) to (0.75 quantile + 1.5 * IQR) for each given metric, where IQR is the interquartile range (0.75 quantile minus 0.25 quantile).
- Parameters:
- verbose : bool, default: True
If True, will show info about the transformation of passed columns.
- Attributes:
- params : Dict
Dictionary with operational parameters of the instance. Updated after calling the fit method.
- verbose : bool
Verbose info flag.
- non_serializable_params : List
List of the class parameters that should be converted to lists in order to serialize.
- fitted : bool
Fit flag.
Examples
>>> from ambrosia.preprocessing import IQRPreprocessor
>>> iqr = IQRPreprocessor(verbose=True)
>>> iqr.fit(dataframe, ['column1', 'column2'])
>>> iqr.transform(dataframe, inplace=True)
You can pass one or several columns. If several columns are passed, extreme values are dropped for each column.
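The IQR rule itself can be written in a few lines of pandas. This is a sketch of the general technique, not ambrosia's implementation: the helper name `iqr_filter` and the parameter `k` are hypothetical, and pandas' default linear interpolation is assumed for the quantiles.

```python
import pandas as pd


def iqr_filter(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Keep rows inside [Q1 - k * IQR, Q3 + k * IQR] for every listed column."""
    keep = pd.Series(True, index=df.index)
    for column in columns:
        q1 = df[column].quantile(0.25)
        q3 = df[column].quantile(0.75)
        iqr = q3 - q1
        # between() is inclusive on both ends by default.
        keep &= df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


data = pd.DataFrame({"metric": [10, 11, 12, 13, 14, 15, 100]})
filtered = iqr_filter(data, ["metric"])
```

For this toy column the quartiles are 11.5 and 14.5, so the kept range is 7 to 19 and only the value 100 is dropped.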
- fit(dataframe, column_names)
Fit to calculate iqr parameters for the selected columns.
- Parameters:
- dataframe : pd.DataFrame
Dataframe to calculate quantiles.
- column_names : ColumnNamesType
One or several columns in the dataframe.
- Returns:
- self : object
Instance object.
- transform(dataframe, inplace=False)
Remove objects from the dataframe that lie beyond the boxplot maximum and minimum values for each metric distribution.
- Parameters:
- dataframe : pd.DataFrame
Dataframe to transform.
- inplace : bool, default: False
If True, transforms the given dataframe; otherwise copies it and returns a new one.
- Returns:
- df : Union[pd.DataFrame, None]
Transformed dataframe or None.
- fit_transform(dataframe, column_names, inplace=False)
Fit preprocessor parameters using given dataframe and transform it.
- Parameters:
- dataframe : pd.DataFrame
Dataframe to calculate quantiles and for further transformation.
- column_names : ColumnNamesType
One or several columns in the dataframe.
- inplace : bool, default: False
If True, transforms the given dataframe; otherwise copies it and returns a new one.
- Returns:
- df : Union[pd.DataFrame, None]
Transformed dataframe or None.
- store_params(store_path)
- Parameters:
- store_path : Path
Path where parameters will be stored in JSON format.
- load_params(load_path)
- Parameters:
- load_path : Path
Path to the JSON file with parameters.