* OutlierEncoder
*SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, get in touch with us.
Compatibility: numerical data
The OutlierEncoder identifies outliers to the left and right of the main data and encodes this information in a new column. It then removes the outliers from the original column so that the remaining values are easier to use in downstream data science work.
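As a rough mental model, the encoding step splits one column into two: a flag column that records where each value falls, and a cleaned column without the outliers. The sketch below is hypothetical and is not the Enterprise implementation. The function name `encode_outliers`, the fixed thresholds passed in as `low` and `high`, the flag labels `'left'`, `'main'` and `'right'`, and the choice to replace removed outliers with `None` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def encode_outliers(
    values: List[float], low: float, high: float
) -> Tuple[List[Optional[float]], List[str]]:
    """Hypothetical sketch: split a column into cleaned values and outlier flags.

    Values below `low` are flagged 'left', values above `high` are flagged
    'right', and everything else is 'main'. Flagged values are replaced with
    None in the cleaned column (an assumption; the real encoder may differ).
    """
    cleaned: List[Optional[float]] = []
    flags: List[str] = []
    for value in values:
        if value < low:
            flags.append('left')
            cleaned.append(None)
        elif value > high:
            flags.append('right')
            cleaned.append(None)
        else:
            flags.append('main')
            cleaned.append(value)
    return cleaned, flags


cleaned, flags = encode_outliers([0.0, 12.0, 30.0, 10000.0], low=-25.0, high=55.0)
print(cleaned)  # [0.0, 12.0, 30.0, None]
print(flags)    # ['main', 'main', 'main', 'right']
```

Keeping the flag in a separate column means the information about which values were extreme is not lost, even though the cleaned column no longer contains them.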
from rdt.transformers.numerical import OutlierEncoder
transformer = OutlierEncoder()
distribution
: The transformer approximates the shape (aka distribution) of the main values as well as the outliers. Use this parameter to specify the shape.
| (default) 'uniform' | Estimate the main values and outliers as uniform distributions |
| 'truncnorm' | Estimate the main values and outliers as truncated Gaussian distributions |
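To make the difference between the two options concrete, here is a hypothetical estimator for a single non-empty region (the main values, or one side's outliers). The function name `estimate_region`, the scipy.stats-style `loc`/`scale` (and `a`/`b`) parameter names, and the moment-based shortcut for the truncated Gaussian are assumptions; the Enterprise encoder's actual fitting procedure is not described on this page.

```python
import statistics
from typing import Dict, List


def estimate_region(values: List[float], distribution: str = 'uniform') -> Dict:
    """Hypothetical sketch of what each `distribution` option could learn.

    Parameters follow the scipy.stats loc/scale convention (an assumption).
    Expects a non-empty list of values.
    """
    low, high = min(values), max(values)
    if distribution == 'uniform':
        # Maximum-likelihood uniform fit: the support is [min, max].
        params = {'loc': low, 'scale': high - low}
    elif distribution == 'truncnorm':
        # Rough moment-based approximation, NOT a maximum-likelihood fit:
        # centre and spread from the data, truncated at the observed bounds.
        loc = statistics.fmean(values)
        scale = statistics.pstdev(values)
        if scale == 0:
            raise ValueError('truncnorm needs at least two distinct values')
        params = {
            'a': (low - loc) / scale,   # lower bound in standard-deviation units
            'b': (high - loc) / scale,  # upper bound in standard-deviation units
            'loc': loc,
            'scale': scale,
        }
    else:
        raise ValueError(f'unknown distribution: {distribution!r}')
    return {'distribution': distribution, 'learned_parameters': params}


print(estimate_region([10.0, 20.0, 30.0], 'uniform'))
# {'distribution': 'uniform', 'learned_parameters': {'loc': 10.0, 'scale': 20.0}}
```

The uniform option only needs each region's bounds, while the truncated Gaussian also captures where values cluster inside those bounds, at the cost of estimating a centre and a spread.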
After fitting the transformer, you can access the values it learned through the following attributes.
box_plot_summary
: A dictionary that stores the min, max and quartile values for the overall column.
>>> transformer.box_plot_summary
{
    'min': 0.0,
    'Q1': 5.0,
    'Q2': 10.5,
    'Q3': 25.0,
    'max': 10000.0
}
iqr
: The interquartile range of the column, Q3 - Q1.
>>> transformer.iqr
20.0
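The summary and IQR can be reproduced for your own data with the standard library. This is a sketch, assuming the quartiles are linearly interpolated between sorted values (the rule `statistics.quantiles` applies with `method='inclusive'`); the encoder may compute them differently, and the function name `summarize` is hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def summarize(values: List[float]) -> Tuple[Dict[str, float], float]:
    """Return a box-plot summary (min, quartiles, max) and the IQR of `values`.

    Assumption: quartiles are linearly interpolated between sorted values,
    which is what method='inclusive' does. Needs at least two values.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method='inclusive')
    summary = {
        'min': float(min(values)),
        'Q1': q1,
        'Q2': q2,
        'Q3': q3,
        'max': float(max(values)),
    }
    return summary, q3 - q1  # IQR = Q3 - Q1


summary, iqr = summarize([0.0, 5.0, 10.0, 25.0, 10000.0])
print(summary)  # {'min': 0.0, 'Q1': 5.0, 'Q2': 10.0, 'Q3': 25.0, 'max': 10000.0}
print(iqr)      # 20.0
```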
outlier_ranges
: A dictionary that maps 'left_outliers' to the left outlier range and 'right_outliers' to the right outlier range. A value is None if there are no outliers on that side.
>>> transformer.outlier_ranges
{
    'left_outliers': None,
    'right_outliers': [55.0, 10000.0]
}
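The example values are consistent with Tukey's familiar 1.5 × IQR rule: the upper fence is Q3 + 1.5 × IQR = 25.0 + 1.5 × 20.0 = 55.0, and the lower fence, Q1 - 1.5 × IQR = -25.0, lies below the minimum of 0.0, so there are no left outliers. The sketch below assumes that rule; the multiplier `k=1.5` and the function name `outlier_ranges_from_summary` are assumptions, not documented parameters of the encoder.

```python
from typing import Dict, List, Optional


def outlier_ranges_from_summary(
    summary: Dict[str, float], k: float = 1.5
) -> Dict[str, Optional[List[float]]]:
    """Derive left/right outlier ranges from a box-plot summary.

    Assumption: Tukey's fences, so values below Q1 - k * IQR or above
    Q3 + k * IQR count as outliers.
    """
    iqr = summary['Q3'] - summary['Q1']
    lower_fence = summary['Q1'] - k * iqr
    upper_fence = summary['Q3'] + k * iqr
    # A side has a range only if the observed extreme lies beyond its fence.
    left = [summary['min'], lower_fence] if summary['min'] < lower_fence else None
    right = [upper_fence, summary['max']] if summary['max'] > upper_fence else None
    return {'left_outliers': left, 'right_outliers': right}


summary = {'min': 0.0, 'Q1': 5.0, 'Q2': 10.5, 'Q3': 25.0, 'max': 10000.0}
print(outlier_ranges_from_summary(summary))
# {'left_outliers': None, 'right_outliers': [55.0, 10000.0]}
```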
learned_distributions
: A dictionary that maps 'left_outliers', 'main' and 'right_outliers' to the learned distribution for each region. A value is None if there are no values in that region.
>>> transformer.learned_distributions
{
    'left_outliers': None,
    'main': {
        'distribution': 'uniform',
        'learned_parameters': {'scale': 1.2, 'loc': 25.0},
    },
    'right_outliers': {
        'distribution': 'uniform',
        'learned_parameters': {'scale': 1.2, 'loc': 40.0}
    }
}
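The shape of this attribute can be mimicked by splitting a column at the outlier fences and fitting each region separately, returning None for an empty region. This hypothetical sketch covers the uniform option only and assumes scipy.stats-style parameters, where `loc` is the lower bound and `scale` is the width; the function name `learn_distributions` and its fence arguments are illustrative, and the result is not expected to reproduce the exact numbers shown above.

```python
from typing import Dict, List, Optional


def learn_distributions(
    values: List[float], lower_fence: float, upper_fence: float
) -> Dict[str, Optional[Dict]]:
    """Hypothetical sketch: fit a uniform distribution to each region."""
    regions = {
        'left_outliers': [v for v in values if v < lower_fence],
        'main': [v for v in values if lower_fence <= v <= upper_fence],
        'right_outliers': [v for v in values if v > upper_fence],
    }
    learned: Dict[str, Optional[Dict]] = {}
    for name, region in regions.items():
        if not region:
            learned[name] = None  # no values in this region
            continue
        low = min(region)
        learned[name] = {
            'distribution': 'uniform',
            # Uniform fit: support starts at the region's minimum and spans its width.
            'learned_parameters': {'loc': low, 'scale': max(region) - low},
        }
    return learned


result = learn_distributions([0.0, 5.0, 10.0, 25.0, 9000.0, 10000.0], -25.0, 55.0)
print(result['left_outliers'])  # None
print(result['right_outliers'])
# {'distribution': 'uniform', 'learned_parameters': {'loc': 9000.0, 'scale': 1000.0}}
```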