UniformEncoder

Compatibility: categorical or boolean data

The UniformEncoder transforms data that represents categorical values into a uniform distribution in the [0,1] interval. It is highly accurate at preserving the overall frequencies of each category.

from rdt.transformers.categorical import UniformEncoder

transformer = UniformEncoder()

Parameters

order_by: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.

(default) None: Do not apply a particular order.

'numerical_value': If the data is represented by integers or floats, order by those values.

'alphabetical': If the data is represented by strings, order them alphabetically.
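To make the effect of ordering concrete, here is a minimal, library-independent sketch of how an ordering scheme could change where each category lands in [0,1]. The `layout_intervals` helper is hypothetical and is not part of RDT. It assumes first-seen order when no ordering is requested, which may differ from what the library does internally.

```python
from collections import Counter


def layout_intervals(values, order_by=None):
    """Hypothetical helper: lay categories out along [0, 1] in a chosen order."""
    counts = Counter(values)
    # Assumption: with no ordering, keep the order in which categories first appear.
    categories = list(counts)
    if order_by in ("numerical_value", "alphabetical"):
        # Plain sorting covers both numeric and string categories in this sketch.
        categories.sort()
    elif order_by is not None:
        raise ValueError(f"unsupported order_by: {order_by!r}")

    total = len(values)
    intervals = {}
    start = 0.0
    for category in categories:
        end = start + counts[category] / total
        intervals[category] = (start, end)
        start = end
    return intervals


responses = ["response_02", "response_00", "response_01", "response_00"]
unordered = layout_intervals(responses)
ordered = layout_intervals(responses, order_by="alphabetical")
print(list(unordered))  # categories in first-seen order
print(list(ordered))    # categories in sorted order
```

With ordering applied, neighboring categories occupy neighboring subsets of the interval, so nearby numeric values correspond to nearby categories.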

Examples

The transformer assigns each category to a unique, non-overlapping subset of the [0,1] interval. The length of each subset matches the category's frequency. For example, if the category 'CASH' occurs with 60% frequency, its subset has length 0.6, such as [0.2, 0.8].
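The idea can be sketched in plain Python, without RDT. The helpers `fit_intervals`, `transform`, and `reverse_transform` below are hypothetical names used only for illustration. The sketch assumes subsets are laid out in first-seen order and that a value is drawn uniformly from its category's subset. The library's actual placement of subsets may differ.

```python
import random
from collections import Counter


def fit_intervals(values):
    """Give each category a slice of [0, 1] whose length is its frequency."""
    counts = Counter(values)
    total = len(values)
    intervals = {}
    start = 0.0
    for category, count in counts.items():
        end = start + count / total
        intervals[category] = (start, end)
        start = end
    return intervals


def transform(values, intervals, rng):
    """Replace each category with a uniform draw from its slice."""
    return [rng.uniform(*intervals[value]) for value in values]


def reverse_transform(numbers, intervals):
    """Map each number back to the category whose slice contains it."""
    categories = list(intervals)
    result = []
    for number in numbers:
        # Fall back to the last category to absorb floating-point rounding at 1.0.
        match = categories[-1]
        for category in categories:
            if number < intervals[category][1]:
                match = category
                break
        result.append(match)
    return result


data = ["CASH"] * 6 + ["CREDIT"] * 3 + ["CHECK"]
intervals = fit_intervals(data)
encoded = transform(data, intervals, random.Random(0))
decoded = reverse_transform(encoded, intervals)
print(intervals["CASH"])
print(decoded == data)
```

Because each subset's length equals the category's frequency, the encoded column as a whole is spread uniformly over [0,1], and reversing the mapping recovers the original categories.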

Attributes

After fitting the transformer, you can access the learned values through the attributes.

frequencies: A dictionary that maps each category value to the observed frequency, as a float between 0 and 1

intervals: A dictionary that maps each category value to its interval within [0,1]. This allows you to determine the exact rules used for transforming and reverse transforming.

FAQs

When should I use this transformer?

The UniformEncoder is shown to preserve the frequency of each category value with high accuracy. This is especially useful if you have a data imbalance, for example if True occurs only 1% of the time while False occurs 99% of the time.

When should I use the order_by parameter?

Use this parameter when the categorical data is ordinal (has a specific order) and the order can easily be discovered through sorting. For example, you might store survey responses as 'response_00', 'response_01', 'response_02', etc.

Don't add this parameter if it isn't necessary. Ordering increases the time it takes for transformation.

What if I'd like to sort the values by a custom order?

In some cases, your categories may not have an alphanumeric ordering scheme. Use the OrderedUniformEncoder to add your own, custom sorting order.

What happens to missing values?

This transformer treats missing values as if they are a new category of data.
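As a rough illustration of that behavior, the sketch below counts missing entries as one extra category when computing frequencies. It does not use RDT. The `MISSING` sentinel and the `frequencies_with_missing` helper are hypothetical, and treating both `None` and NaN as missing is an assumption of this sketch.

```python
import math
from collections import Counter

# Hypothetical sentinel standing in for the "missing" category.
MISSING = "<missing>"


def normalize(value):
    """Collapse None and NaN into a single missing-value marker."""
    if value is None:
        return MISSING
    if isinstance(value, float) and math.isnan(value):
        return MISSING
    return value


def frequencies_with_missing(values):
    """Compute category frequencies, counting missing values as a category."""
    counts = Counter(normalize(value) for value in values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


column = [True, None, False, float("nan"), True]
freqs = frequencies_with_missing(column)
print(freqs)
```

Since the missing marker has its own frequency, it would also receive its own subset of [0,1], so the share of missing values is preserved like that of any other category.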
