UniformEncoder
Compatibility: categorical or boolean data

The UniformEncoder transforms data that represents categorical values into a uniform distribution in the [0,1] interval. It is highly accurate at preserving the overall frequencies of each category.
from rdt.transformers.categorical import UniformEncoder
transformer = UniformEncoder()
order_by: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.
(default) None | Do not apply a particular order
'numerical_value' | If the data is represented by integers or floats, order by those values
'alphabetical' | If the data is represented by strings, order them alphabetically
from rdt.transformers.categorical import UniformEncoder
transformer = UniformEncoder(
order_by='alphabetical'
)
The transformer assigns each category to a unique, non-overlapping subset of the [0,1] interval. The length of each subset is based on the category's frequency. For example, if category 'CASH' occurs with 60% frequency, its subset will have length 0.6, such as [0.2, 0.8].

After fitting the transformer, you can access the learned values through its attributes.
frequencies: A dictionary that maps each category value to the observed frequency, as a float between 0 and 1
>>> transformer.frequencies
{
'CREDIT': 0.2,
'CASH': 0.6,
'DEBIT': 0.2
}
intervals: A dictionary that maps each category value to an interval within [0,1]. This allows you to determine the exact rules used for transforming and reverse transforming.
>>> transformer.intervals
{
'CREDIT': [0, 0.2],
'CASH': [0.2, 0.8],
'DEBIT': [0.8, 1.0]
}
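
The interval assignment described above can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation: the helper names fit_intervals, transform and reverse_transform are hypothetical, and the sketch assumes intervals are laid out in the order the categories first appear.

```python
import random
from collections import Counter


def fit_intervals(values):
    """Give each category a slice of [0, 1] whose width equals its frequency."""
    counts = Counter(values)  # keeps the first-seen order of the categories
    total = len(values)
    intervals = {}
    seen = 0
    for category, count in counts.items():
        start = seen / total
        seen += count
        intervals[category] = [start, seen / total]
    return intervals


def transform(values, intervals, rng):
    """Replace each category with a random float drawn from its slice."""
    return [rng.uniform(*intervals[value]) for value in values]


def reverse_transform(numbers, intervals):
    """Map each float back to the category whose slice contains it."""
    categories = list(intervals)
    result = []
    for number in numbers:
        match = categories[-1]  # fallback covers the upper bound 1.0
        for category in categories:
            if number < intervals[category][1]:
                match = category
                break
        result.append(match)
    return result


# Same proportions as the example above: 20% CREDIT, 60% CASH, 20% DEBIT
payments = ['CREDIT'] * 2 + ['CASH'] * 6 + ['DEBIT'] * 2
intervals = fit_intervals(payments)
rng = random.Random(0)  # seeded only to make the sketch repeatable
encoded = transform(payments, intervals, rng)
decoded = reverse_transform(encoded, intervals)
print(intervals)
print(decoded == payments)
```

Because every category owns its own slice of [0,1], each encoded float maps back to exactly one category, which is why the original frequencies are preserved.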
Use the order_by parameter when the categorical data is ordinal (has a specific order) and the order can easily be discovered through sorting. For example, you might be storing survey responses as 'response_00', 'response_01', 'response_02', etc.
Don't add this parameter if it isn't necessary. Ordering increases the time it takes for transformation.
In some cases, your categories may not have an alphanumeric ordering scheme. Use the OrderedUniformEncoder to add your own custom sorting order.
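
To see what ordering changes, the following plain-Python sketch lays out intervals with and without sorting the categories first. It is an assumption-laden illustration rather than the library's code: build_intervals and its sort_categories flag are hypothetical, and the unsorted case simply uses first-seen order, which may differ from what the transformer does when order_by is None.

```python
from collections import Counter


def build_intervals(values, sort_categories=False):
    """Lay out frequency-sized slices of [0, 1], optionally in sorted order."""
    counts = Counter(values)
    # Sorting strings gives alphabetical order; sorting numbers gives numerical order
    categories = sorted(counts) if sort_categories else list(counts)
    total = len(values)
    intervals = {}
    seen = 0
    for category in categories:
        start = seen / total
        seen += counts[category]
        intervals[category] = [start, seen / total]
    return intervals


responses = ['response_02', 'response_00', 'response_01', 'response_00']
unordered = build_intervals(responses)
ordered = build_intervals(responses, sort_categories=True)
print(list(unordered))
print(list(ordered))
```

With sorting, lower-ranked categories occupy the lower part of [0,1], so the encoded numbers reflect the ordinal structure. The extra sort is also the kind of work that makes ordering slower than the default.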