* OutlierEncoder

*SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, get in touch with us.

Compatibility: numerical data

The OutlierEncoder identifies the outliers to the left and right of the main data, and encodes this information in a new column. Then, it removes the outliers from the original column so that the column is easier to use in downstream data science work.

from rdt.transformers.numerical import OutlierEncoder

transformer = OutlierEncoder()
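To make the idea of "encoding outliers in a new column" more concrete, here is a minimal plain-Python sketch. It is not the transformer's actual implementation: the helper name `label_outliers`, the label strings and the hard-coded fence values are all assumptions made for illustration.

```python
def label_outliers(values, lower_fence, upper_fence):
    """Hypothetical helper: tag each value by the region it falls in."""
    labels = []
    for value in values:
        if value < lower_fence:
            labels.append('left_outlier')
        elif value > upper_fence:
            labels.append('right_outlier')
        else:
            labels.append('main')
    return labels


# Illustrative data and fences (assumed, not learned by a real transformer)
values = [-80.0, 5.0, 10.5, 25.0, 10000.0]
labels = label_outliers(values, lower_fence=-25.0, upper_fence=55.0)
print(labels)
```

The extra list plays the role of the new column: it records which rows held outliers, so the original column can be cleaned without losing that information.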

Parameters

distribution: The transformer approximates the shape (aka distribution) of the main values as well as the outliers. Use this parameter to specify the shape.

(default) 'uniform'

Estimate the main values and outliers as uniform distributions.

'truncnorm'

Estimate the main values and outliers using a truncated Gaussian distribution.

Attributes

After fitting the transformer, you can access the learned values through the attributes.

box_plot_summary: A dictionary that stores the min, max and quartile values for the overall column

>>> transformer.box_plot_summary
{
  'min': 0.0,
  'Q1': 5.0,
  'Q2': 10.50,
  'Q3': 25.0,
  'max': 10000.0
}

iqr: A float that represents the interquartile range (Q3 - Q1)

>>> transformer.iqr
20.0

outlier_ranges: A dictionary that maps 'left_outliers' to the left outlier range and 'right_outliers' to the right outlier range. Either value may be None if there are no outliers on that side.

>>> transformer.outlier_ranges
{
  'left_outliers': None,
  'right_outliers': [55.0, 10000.0]
}
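The example values above are consistent with the common 1.5 x IQR box plot rule: 25.0 + 1.5 * 20.0 = 55.0. Whether the transformer uses exactly this rule is an assumption. The sketch below shows how such ranges could be derived from a box plot summary; the helper name `tukey_outlier_ranges` and the multiplier `k` are hypothetical.

```python
def tukey_outlier_ranges(summary, k=1.5):
    """Hypothetical helper: derive outlier ranges from a box plot summary.

    Assumes the 1.5 x IQR rule; the real transformer may differ.
    """
    iqr = summary['Q3'] - summary['Q1']
    lower_fence = summary['Q1'] - k * iqr
    upper_fence = summary['Q3'] + k * iqr

    # A side only has an outlier range if the data extends past its fence
    left = [summary['min'], lower_fence] if summary['min'] < lower_fence else None
    right = [upper_fence, summary['max']] if summary['max'] > upper_fence else None
    return iqr, {'left_outliers': left, 'right_outliers': right}


summary = {'min': 0.0, 'Q1': 5.0, 'Q2': 10.5, 'Q3': 25.0, 'max': 10000.0}
iqr, ranges = tukey_outlier_ranges(summary)
print(iqr)     # 20.0
print(ranges)  # {'left_outliers': None, 'right_outliers': [55.0, 10000.0]}
```

Here the lower fence is 5.0 - 30.0 = -25.0, which is below the column minimum of 0.0, so there is no left outlier range.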

learned_distributions: A dictionary that maps 'left_outliers', 'main' and 'right_outliers' to the learned distribution for each area. These may be None if there are no values in the area.

>>> transformer.learned_distributions
{
  'LEFT_OUTLIER': None,
  'MAIN': { 
    'distribution': 'uniform',
    'learned_parameters': { 'scale': 1.2, 'loc': 25.0 },
  },
  'RIGHT_OUTLIER': { 
    'distribution': 'uniform',
    'learned_parameters': { 'scale': 1.2, 'loc': 40.0 }
  }
}
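The 'loc' and 'scale' names in the learned parameters resemble the SciPy convention, where a uniform distribution covers the interval from loc to loc + scale. Assuming that convention applies here (it is not confirmed by this page), a uniform fit could be sketched as follows. The helper name `fit_uniform` is hypothetical.

```python
def fit_uniform(values):
    """Hypothetical helper: describe values as a uniform distribution.

    Assumes the SciPy-style convention: support is [loc, loc + scale].
    """
    loc = min(values)
    scale = max(values) - loc
    return {
        'distribution': 'uniform',
        'learned_parameters': {'scale': scale, 'loc': loc},
    }


# Illustrative main values (made up for this sketch)
main_values = [25.0, 25.5, 26.25]
learned = fit_uniform(main_values)
print(learned)
```

Under this reading, 'loc' is the smallest value in the area and 'scale' is the width of the area.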

FAQs

When should I use this transformer?

This transformer is designed for numerical columns that contain outliers. The outliers may be on the left, right or both sides.

Will I see outliers again when I reverse transform the data?

Yes! If the initial data had outliers, the transformer will recreate outliers when reverse transforming the data. The outlier values it generates are estimates based on the learned parameters.
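As a rough illustration of why the recreated outliers are estimates rather than the original values, the sketch below draws new values from a learned uniform distribution. It assumes the parameters describe the interval from loc to loc + scale; the helper name `sample_uniform` and the parameter values are hypothetical, and this is not the transformer's actual reverse transform.

```python
import random


def sample_uniform(learned_parameters, n, rng):
    """Hypothetical helper: draw n values from a learned uniform distribution.

    Assumes the support is [loc, loc + scale].
    """
    low = learned_parameters['loc']
    high = low + learned_parameters['scale']
    return [rng.uniform(low, high) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
right_outlier_params = {'scale': 1.2, 'loc': 40.0}
samples = sample_uniform(right_outlier_params, n=5, rng=rng)
print(samples)
```

Each sampled value falls inside the learned range, but it will generally not match any specific outlier from the original data.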
