UniformEncoder
Last updated
Compatibility: categorical or boolean data
The UniformEncoder transforms data that represents categorical values into a uniform distribution in the [0,1] interval. It is highly accurate at preserving the overall frequencies of each category.
order_by: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.
(default) None: Do not apply a particular order.
'numerical_value': If the data is represented by integers or floats, order by those values.
'alphabetical': If the data is represented by strings, order them alphabetically.
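The effect of the three order_by options can be sketched in plain Python. The helper order_categories below is hypothetical and not part of the library. It assumes that None keeps the categories in order of first appearance, which may differ from what the transformer actually does by default.

```python
def order_categories(values, order_by=None):
    """Return the distinct categories in the order implied by order_by.

    Hypothetical helper for illustration only; not a library function.
    """
    # Distinct categories, kept in order of first appearance (an assumption)
    distinct = list(dict.fromkeys(values))
    if order_by is None:
        return distinct
    if order_by == 'numerical_value':
        # Integers or floats: order by their numeric value
        return sorted(distinct, key=float)
    if order_by == 'alphabetical':
        # Strings: order alphabetically
        return sorted(distinct, key=str)
    raise ValueError(f"Unknown order_by value: {order_by!r}")


print(order_categories([3, 1, 2, 1], order_by='numerical_value'))  # [1, 2, 3]
print(order_categories(['b', 'a', 'c'], order_by='alphabetical'))  # ['a', 'b', 'c']
```

The resulting order determines which category gets the lowest part of the [0,1] interval, which the next category follows, and so on.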
The transformer assigns each category to a unique, non-overlapping subset of the [0,1] interval. The length of each subset is based on the category's frequency. For example, if category 'CASH' occurs with 60% frequency, its subset will have length 0.6, such as [0.2, 0.8].
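The interval assignment described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's implementation. The names compute_intervals and transform_value are hypothetical. The sketch also assumes that a transformed value is drawn uniformly at random from inside the category's subset.

```python
import random
from collections import Counter


def compute_intervals(values, order=None):
    """Map each category to a [start, end] slice of [0, 1] sized by its frequency."""
    counts = Counter(values)
    total = len(values)
    categories = list(order) if order is not None else list(counts)
    intervals = {}
    seen = 0  # number of rows covered by the categories placed so far
    for category in categories:
        start = seen / total
        seen += counts[category]
        intervals[category] = [start, seen / total]
    return intervals


def transform_value(category, intervals, rng):
    """Pick a number inside the category's interval (assumed uniform sampling)."""
    start, end = intervals[category]
    return rng.uniform(start, end)


# 'CASH' makes up 60% of the rows, so its interval has length 0.6
data = ['CARD'] * 2 + ['CASH'] * 6 + ['CHECK'] * 2
intervals = compute_intervals(data, order=['CARD', 'CASH', 'CHECK'])
print(intervals['CASH'])  # [0.2, 0.8]

rng = random.Random(0)
print(transform_value('CASH', intervals, rng))  # some value between 0.2 and 0.8
```

Because each subset's length equals the category's frequency, sampling inside the subsets spreads the transformed column evenly over [0,1].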
After fitting the transformer, you can access the learned values through the following attributes.
frequencies: A dictionary that maps each category value to its observed frequency, as a float between 0 and 1.
intervals: A dictionary that maps each category value to its interval within [0,1]. This allows you to determine the exact rules used for transforming and reverse transforming.
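As an illustration of how these two dictionaries determine the reverse transform, here is a minimal sketch. The dictionary contents are made-up example values, and reverse_transform_value is a hypothetical helper, not a library function. Treating each interval as half-open is also an assumption made here so that boundary values map to exactly one category.

```python
# Made-up learned values, shaped like the dictionaries described above
frequencies = {'CARD': 0.2, 'CASH': 0.6, 'CHECK': 0.2}
intervals = {'CARD': [0.0, 0.2], 'CASH': [0.2, 0.8], 'CHECK': [0.8, 1.0]}


def reverse_transform_value(number, intervals):
    """Return the category whose interval contains the given number."""
    for category, (start, end) in intervals.items():
        # Half-open [start, end) keeps boundaries unambiguous;
        # the interval ending at 1.0 also accepts 1.0 itself.
        if start <= number < end or (number == end == 1.0):
            return category
    raise ValueError(f"{number} is outside the [0, 1] interval")


# Each interval's length matches the category's frequency
for category, (start, end) in intervals.items():
    assert abs((end - start) - frequencies[category]) < 1e-9

print(reverse_transform_value(0.5, intervals))   # CASH
print(reverse_transform_value(0.05, intervals))  # CARD
```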
In some cases, your categories may not have an alphanumeric ordering scheme. Use the to add your own, custom sorting order.