UniformEncoder
Compatibility: categorical or boolean data
The UniformEncoder transforms categorical data into numerical values that are uniformly distributed over the [0,1] interval. It preserves the overall frequency of each category with high accuracy.

from rdt.transformers.categorical import UniformEncoder
transformer = UniformEncoder()

Parameters
order_by: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.
(default) None: Do not apply a particular order.
'numerical_value': If the data is represented by integers or floats, order by those values.
'alphabetical': If the data is represented by strings, order them alphabetically.
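To make the effect of order_by concrete, here is a minimal pure-Python sketch of how the three options could arrange categories before they are laid out on [0,1]. The helper order_categories is hypothetical and is not part of RDT; in particular, keeping first-seen order when order_by is None is an assumption made for illustration. With the library itself, you would pass the option to the constructor, for example UniformEncoder(order_by='alphabetical').

```python
def order_categories(categories, order_by=None):
    """Hypothetical helper: return the unique categories in layout order."""
    # Drop duplicates while keeping first-seen order (an assumption for None).
    unique = list(dict.fromkeys(categories))
    if order_by is None:
        return unique
    if order_by == 'numerical_value':
        if not all(isinstance(c, (int, float)) for c in unique):
            raise TypeError("'numerical_value' expects integers or floats")
        return sorted(unique)
    if order_by == 'alphabetical':
        if not all(isinstance(c, str) for c in unique):
            raise TypeError("'alphabetical' expects strings")
        return sorted(unique)
    raise ValueError(f'Unknown order_by option: {order_by!r}')


responses = ['response_02', 'response_00', 'response_01', 'response_00']
unordered = order_categories(responses)
ordered = order_categories(responses, order_by='alphabetical')
```

With an ordering applied, neighboring categories end up in neighboring parts of [0,1], so nearby numerical values correspond to nearby categories.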
Examples
The transformer assigns each category to a unique, non-overlapping subset of the [0,1] interval. The length of each subset is based on the category's frequency. For example, if category 'CASH' occurs with 60% frequency, its subset will have length 0.6, such as [0.2, 0.8].
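The idea described above can be sketched in a few lines of plain Python. This is an illustration of the technique under stated assumptions, not RDT's implementation: the helper names build_intervals and encode are hypothetical, categories are laid out in first-seen order, and each value is replaced by a uniform draw from its category's subset.

```python
import random
from collections import Counter


def build_intervals(values):
    """Give each category a slice of [0, 1] whose length is its frequency."""
    counts = Counter(values)
    total = len(values)
    intervals = {}
    start = 0.0
    # First-seen order is an assumption; see the order_by parameter.
    for category, count in counts.items():
        end = start + count / total
        intervals[category] = (start, end)
        start = end
    return intervals


def encode(values, intervals, seed=0):
    """Replace each category with a uniform draw from its own slice."""
    rng = random.Random(seed)
    return [rng.uniform(*intervals[value]) for value in values]


payments = ['CASH'] * 6 + ['CREDIT'] * 3 + ['CHECK'] * 1
intervals = build_intervals(payments)
encoded = encode(payments, intervals)
```

In this sketch 'CASH' makes up 60% of the rows, so its slice has length 0.6, and every encoded 'CASH' value falls inside that slice.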
Attributes
After fitting the transformer, you can access the learned values through the following attributes.
frequencies: A dictionary that maps each category value to the observed frequency, as a float between 0 and 1
intervals: A dictionary that maps each category value to its sub-interval of [0,1]. This allows you to determine the exact rules used for transforming and reverse transforming.
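As an illustration of how an intervals mapping determines the reverse transform, the sketch below looks up the category whose sub-interval contains a given value. The dictionary contents, the (start, end) tuple layout, and the helper lookup_category are assumptions made for this example; inspect the attribute on your own fitted transformer to see its actual structure.

```python
# Hypothetical learned intervals, written as (start, end) pairs.
intervals = {
    'CASH': (0.0, 0.6),
    'CREDIT': (0.6, 0.9),
    'CHECK': (0.9, 1.0),
}


def lookup_category(value, intervals):
    """Return the category whose sub-interval contains the value."""
    if not 0.0 <= value <= 1.0:
        raise ValueError('value must lie in [0, 1]')
    # Treat each sub-interval as half-open so that boundaries are unambiguous.
    for category, (start, end) in intervals.items():
        if start <= value < end:
            return category
    # Only reached when value == 1.0: use the sub-interval that ends last.
    return max(intervals, key=lambda category: intervals[category][1])


category = lookup_category(0.25, intervals)
```

Treating the sub-intervals as half-open is a design choice of this sketch so that a boundary value such as 0.6 belongs to exactly one category.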
FAQs
When should I use this transformer?
The UniformEncoder preserves the frequency of each category value with high accuracy. This is especially useful if your data is imbalanced, for example if True occurs only 1% of the time while False occurs 99% of the time.
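One way to see why frequency-sized intervals help with imbalanced data is the small simulation below. It assumes, purely for illustration, that whatever generates new numerical values produces roughly uniform output on [0,1], and that True was assigned the sub-interval [0.99, 1.0]. Neither assumption comes from the library.

```python
import random

# Assumed layout for a 99% / 1% boolean column.
intervals = {False: (0.0, 0.99), True: (0.99, 1.0)}

rng = random.Random(42)
# Stand-in for synthetic numerical values that are roughly uniform on [0, 1).
samples = [rng.random() for _ in range(100_000)]

# Decode: a value at or above the start of True's sub-interval maps to True.
decoded = [value >= intervals[True][0] for value in samples]
true_rate = sum(decoded) / len(decoded)
```

Because the rare category owns a slice of the right length, the decoded share of True stays close to 1% instead of being washed out or exaggerated.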
When should I use the order_by parameter?
Use this parameter when the categorical data is ordinal (has a specific order) and the order can easily be discovered through sorting. For example, you might be storing survey responses as 'response_00', 'response_01', 'response_02', etc.
Don't add this parameter if it isn't necessary. Ordering increases the time it takes for transformation.
What if I'd like to sort the values by a custom order?
In some cases, your categories may not follow an alphabetical or numerical ordering scheme. Use the OrderedUniformEncoder to supply your own custom sorting order.
