* OutlierEncoder
Compatibility: numerical data
The OutlierEncoder identifies outliers to the left and right of the main data and encodes this information in a new column. It then removes the outliers from the original column, which makes the column easier to use in downstream data science work.

from rdt.transformers.numerical import OutlierEncoder
transformer = OutlierEncoder()
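
To make the idea concrete, here is a minimal conceptual sketch in plain Python. It is not the library's implementation. The helper name `encode_outliers`, the label strings, and the choice to blank outliers out with `None` are all assumptions made for illustration; the page does not say how the transformer represents removed outliers.

```python
def encode_outliers(values, lower, upper):
    """Hypothetical helper: label each value and blank out the outliers.

    `lower` and `upper` are the boundaries of the main data. How the real
    transformer picks them and what it writes in place of an outlier are
    not documented here, so both are assumptions.
    """
    labels, cleaned = [], []
    for value in values:
        if value < lower:
            labels.append('left_outlier')
            cleaned.append(None)   # assumption: outliers become missing values
        elif value > upper:
            labels.append('right_outlier')
            cleaned.append(None)
        else:
            labels.append('main')
            cleaned.append(value)
    return labels, cleaned


labels, cleaned = encode_outliers([1, 2, 3, 100], lower=0, upper=10)
print(labels)   # ['main', 'main', 'main', 'right_outlier']
print(cleaned)  # [1, 2, 3, None]
```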
Parameters
distribution: The transformer approximates the shape (also known as the distribution) of the main values and of the outliers. Use this parameter to specify that shape.
(default) 'uniform': Estimate the main values and outliers as uniform distributions.
'truncnorm': Estimate the main values and outliers as truncated Gaussian distributions.
Attributes
After fitting the transformer, you can access the learned values through the attributes.
box_plot_summary: A dictionary that stores the min, max and quartile values for the overall column
>>> transformer.box_plot_summary
{
'min': 0.0,
'Q1': 5.0,
'Q2': 10.50,
'Q3': 25.0,
'max': 10000.0
}
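
As a rough illustration of where such a summary comes from, the sketch below computes the same five values with Python's standard library. The function `summarize` is a hypothetical helper, and the `'inclusive'` quantile method is an assumption: the page does not say which quantile convention the transformer uses, and different conventions can give slightly different quartiles.

```python
import statistics


def summarize(values):
    """Hypothetical helper: five-number summary shaped like box_plot_summary."""
    # Assumption: 'inclusive' interpolation; the transformer's method is unknown.
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'min': float(min(values)),
        'Q1': q1,
        'Q2': q2,
        'Q3': q3,
        'max': float(max(values)),
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = summarize(data)
print(summary)
# {'min': 1.0, 'Q1': 3.0, 'Q2': 5.0, 'Q3': 7.0, 'max': 100.0}
```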
iqr: A float that represents the Interquartile Range
>>> transformer.iqr
20.0
outlier_ranges: A dictionary that maps 'left_outliers' to the left outlier range and 'right_outliers' to the right outlier range. Either value may be None if there are no outliers on that side.
>>> transformer.outlier_ranges
{
'left_outliers': None,
'right_outliers': [55.0, 10000.0]
}
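
The example values on this page are consistent with the common 1.5 × IQR rule: Q3 − Q1 = 25.0 − 5.0 = 20.0, and Q3 + 1.5 × 20.0 = 55.0, which is where the right outlier range starts. The sketch below reproduces those numbers under that assumption. The page does not state that the transformer uses this rule, and `compute_outlier_ranges` and its `k` parameter are hypothetical names.

```python
def compute_outlier_ranges(summary, k=1.5):
    """Hypothetical helper: derive the IQR and outlier ranges from a summary.

    Assumes Tukey-style fences at Q1 - k*IQR and Q3 + k*IQR.
    """
    iqr = summary['Q3'] - summary['Q1']
    lower_fence = summary['Q1'] - k * iqr
    upper_fence = summary['Q3'] + k * iqr
    # A side has outliers only if the data extends past its fence.
    left = [summary['min'], lower_fence] if summary['min'] < lower_fence else None
    right = [upper_fence, summary['max']] if summary['max'] > upper_fence else None
    return iqr, {'left_outliers': left, 'right_outliers': right}


# Values copied from the box_plot_summary example on this page.
example_summary = {'min': 0.0, 'Q1': 5.0, 'Q2': 10.5, 'Q3': 25.0, 'max': 10000.0}
example_iqr, example_ranges = compute_outlier_ranges(example_summary)
print(example_iqr)     # 20.0
print(example_ranges)  # {'left_outliers': None, 'right_outliers': [55.0, 10000.0]}
```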
learned_distributions: A dictionary that maps 'left_outliers', 'main' and 'right_outliers' to the learned distribution for each area. A value may be None if there are no values in that area.
>>> transformer.learned_distributions
{
'LEFT_OUTLIER': None,
'MAIN': {
'distribution': 'uniform',
'learned_parameters': { 'scale': 1.2, 'loc': 25.0 },
},
'RIGHT_OUTLIER': {
'distribution': 'uniform',
'learned_parameters': { 'scale': 1.2, 'loc': 40.0 }
}
}
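
For intuition about the 'loc' and 'scale' entries, the sketch below fits a uniform distribution to each area separately. It assumes the parameters follow the SciPy convention, where a uniform distribution covers the interval from loc to loc + scale. That convention, the helper `fit_uniform`, and the sample data are all assumptions for illustration, not the transformer's confirmed behavior.

```python
def fit_uniform(values):
    """Hypothetical helper: describe a uniform fit, or None for an empty area."""
    if not values:
        return None
    low, high = min(values), max(values)
    return {
        'distribution': 'uniform',
        # Assumption: SciPy-style parameters, covering [loc, loc + scale].
        'learned_parameters': {'loc': float(low), 'scale': float(high - low)},
    }


# Illustrative data that has already been split into the three areas.
areas = {
    'left_outliers': [],
    'main': [1, 2, 3, 4, 5, 6, 7, 8],
    'right_outliers': [100, 140],
}
learned = {name: fit_uniform(values) for name, values in areas.items()}
print(learned['left_outliers'])  # None
print(learned['main'])
# {'distribution': 'uniform', 'learned_parameters': {'loc': 1.0, 'scale': 7.0}}
```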