UniformEncoder
Compatibility: categorical or boolean data
The UniformEncoder transforms data that represents categorical values into a uniform distribution in the [0,1] interval. It is highly accurate at preserving the overall frequencies of each category.
order_by: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.
The transformer assigns each category to a unique, non-overlapping subset of the [0,1] interval. The length of each subset is based on the category's frequency. For example, if category 'CASH' occurs with 60% frequency, its subset will have a length of 0.6, such as [0.2, 0.8].
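The interval assignment described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function names `learn_intervals` and `transform`, the first-seen ordering of categories, and the sample payment data are all assumptions made for the example.

```python
import random
from collections import Counter


def learn_intervals(values):
    """Give each category a slice of [0, 1] whose length equals its frequency."""
    counts = Counter(values)  # Counter keeps categories in first-seen order
    total = len(values)
    intervals = {}
    seen = 0
    for category, count in counts.items():
        start = seen / total
        seen += count
        # Using cumulative counts avoids drift from repeated float addition.
        intervals[category] = (start, seen / total)
    return intervals


def transform(values, intervals, seed=0):
    """Replace each category with a uniform draw from its interval."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    return [rng.uniform(*intervals[value]) for value in values]


# Hypothetical data: 'CASH' occurs with 60% frequency.
payments = ['CASH'] * 6 + ['CREDIT'] * 3 + ['CHECK']
intervals = learn_intervals(payments)
encoded = transform(payments, intervals)
```

Because each interval's length matches its category's frequency and the draws within an interval are uniform, the encoded column as a whole is spread approximately uniformly over [0,1].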
After fitting the transformer, you can access the learned values through the following attributes:
frequencies: A dictionary that maps each category value to the observed frequency, as a float between 0 and 1
intervals: A dictionary that maps each category value to an interval within [0,1]. This allows you to determine the exact rules used for transforming and reverse transforming.
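To illustrate how an intervals dictionary determines the reverse transform, here is a small sketch that looks up which interval a number falls into. The dictionary contents, the function name `reverse_value`, and the treatment of boundary values (half-open intervals, with 1.0 assigned to the last interval) are assumptions for the example and may differ from what the library actually does.

```python
# Hypothetical learned intervals, shaped like the `intervals` attribute.
learned_intervals = {
    'CASH': (0.0, 0.6),
    'CREDIT': (0.6, 0.9),
    'CHECK': (0.9, 1.0),
}


def reverse_value(x, intervals):
    """Return the category whose interval contains x (assumes 0 <= x <= 1)."""
    for category, (start, end) in intervals.items():
        if start <= x < end:
            return category
    # x == 1.0 sits on the closing boundary; give it to the interval ending last.
    return max(intervals, key=lambda category: intervals[category][1])


decoded = [reverse_value(x, learned_intervals) for x in (0.25, 0.6, 0.95, 1.0)]
```

Any value inside a category's interval maps back to that category, which is why the exact number drawn during the forward transform does not need to be stored.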
Possible values for order_by:
(default) None: Do not apply a particular order.
'numerical_value': If the data is represented by integers or floats, order by those values.
'alphabetical': If the data is represented by strings, order them alphabetically.
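The effect of an ordering scheme can be sketched as follows: categories are sorted before intervals are laid out, so a larger category always receives a higher interval. This is an illustrative sketch only. The function `ordered_intervals` is hypothetical, and using first-seen order when `order_by` is `None` is an assumption rather than documented behavior.

```python
from collections import Counter


def ordered_intervals(values, order_by=None):
    """Lay out frequency-sized intervals following the chosen ordering scheme."""
    counts = Counter(values)
    if order_by is None:
        categories = list(counts)  # assumption: keep first-seen order
    elif order_by in ('numerical_value', 'alphabetical'):
        # Python's sorted() orders numbers by value and strings alphabetically.
        categories = sorted(counts)
    else:
        raise ValueError(f'Unknown order_by value: {order_by!r}')

    total = len(values)
    intervals = {}
    seen = 0
    for category in categories:
        start = seen / total
        seen += counts[category]
        intervals[category] = (start, seen / total)
    return intervals


# Hypothetical ratings column where the integers have a natural order.
ratings = [3, 1, 2, 3, 3, 1]
by_value = ordered_intervals(ratings, order_by='numerical_value')
unordered = ordered_intervals(ratings)
```

With `order_by='numerical_value'`, rating 1 occupies the lowest part of [0,1] and rating 3 the highest, so the ordering of the original values carries over to the transformed numbers.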