OrderedUniformEncoder
Compatibility: categorical or boolean data

The OrderedUniformEncoder transforms data that represents ordered categorical values into a uniform distribution in the [0,1] interval. It preserves the frequencies of each category with high accuracy.
from rdt.transformers.categorical import OrderedUniformEncoder
transformer = OrderedUniformEncoder(order=['STRONGLY DISAGREE', 'DISAGREE', 'NEUTRAL',
'AGREE', 'STRONGLY AGREE'])
order (required): Specify an order for the category values.
[list <value>]: An ordered list of the categories that appear in the real data
The transformer assigns each category to a unique, non-overlapping subset of the [0,1] interval. The intervals follow your custom order, and the length of each interval equals the category's frequency. For example, if the category 'AGREE' occurs with 20% frequency, its subset has length 0.2, such as [0.5, 0.7].

After fitting the transformer, you can access the learned values through the following attributes.
frequencies: A dictionary that maps each category value to its observed frequency, as a float between 0 and 1.
>>> transformer.frequencies
{
'STRONGLY DISAGREE': 0.1,
'DISAGREE': 0.2,
'NEUTRAL': 0.2,
'AGREE': 0.2,
'STRONGLY AGREE': 0.3
}
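To make the frequencies attribute concrete, here is a plain-Python sketch of how such a dictionary could be computed from a column. It is not RDT's implementation: the function name compute_frequencies and the 10-row sample column are hypothetical, and rejecting values that do not appear in the order is an assumption made only for this sketch.

```python
from collections import Counter


def compute_frequencies(data, order):
    """Map each category in `order` to its observed frequency in `data`.

    Hypothetical helper for illustration; not part of RDT.
    """
    unknown = set(data) - set(order)
    if unknown:
        # Assumption: this sketch rejects values that are missing from the order.
        raise ValueError(f"Values not in order: {unknown}")
    counts = Counter(data)
    total = len(data)
    # Keys follow the user-specified order, not descending frequency.
    # Counter returns 0 for categories that never occur in the data.
    return {category: counts[category] / total for category in order}


order = ['STRONGLY DISAGREE', 'DISAGREE', 'NEUTRAL', 'AGREE', 'STRONGLY AGREE']

# Hypothetical 10-row column chosen to reproduce the frequencies shown above.
data = (['STRONGLY DISAGREE'] * 1 + ['DISAGREE'] * 2 + ['NEUTRAL'] * 2
        + ['AGREE'] * 2 + ['STRONGLY AGREE'] * 3)

frequencies = compute_frequencies(data, order)
print(frequencies)
```

Keeping the dictionary keys in the custom order matters, because that order is what later determines where each category's interval sits on [0,1].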
intervals: A dictionary that maps each category value to an interval within [0,1]. This allows you to determine the exact rules used for transforming and reverse transforming.
>>> transformer.intervals
{
'STRONGLY DISAGREE': [0, 0.1],
'DISAGREE': [0.1, 0.3],
'NEUTRAL': [0.3, 0.5],
'AGREE': [0.5, 0.7],
'STRONGLY AGREE': [0.7, 1.0]
}
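The intervals above are the running totals of the frequencies, laid end to end in your custom order. The plain-Python sketch below shows that construction, plus one simple way a transform and reverse transform could use the intervals: each category is replaced by a uniform random draw from its interval, and each number is mapped back to the category whose interval contains it. The helper names build_intervals, transform, and reverse_transform are hypothetical, and sampling uniformly inside each interval is an assumption about the mechanism, not a description of RDT's code.

```python
import bisect
import random

# Frequencies from the example above, keyed in the user-specified order.
frequencies = {
    'STRONGLY DISAGREE': 0.1,
    'DISAGREE': 0.2,
    'NEUTRAL': 0.2,
    'AGREE': 0.2,
    'STRONGLY AGREE': 0.3,
}


def build_intervals(frequencies):
    """Lay categories end to end on [0, 1]; each length equals its frequency."""
    intervals = {}
    start = 0.0
    for category, freq in frequencies.items():
        intervals[category] = [start, start + freq]
        start += freq
    return intervals


def transform(values, intervals, rng=random):
    """Assumed rule: replace each category with a uniform draw from its interval."""
    return [rng.uniform(*intervals[v]) for v in values]


def reverse_transform(numbers, intervals):
    """Map each number back to the category whose interval contains it."""
    categories = list(intervals)
    upper_bounds = [high for _, high in intervals.values()]
    result = []
    for x in numbers:
        # First interval whose upper bound is >= x; a value exactly on a
        # shared boundary therefore maps to the lower category.
        idx = bisect.bisect_left(upper_bounds, x)
        # Clamp values that float rounding pushes past the last upper bound.
        result.append(categories[min(idx, len(categories) - 1)])
    return result


intervals = build_intervals(frequencies)
encoded = transform(['AGREE', 'STRONGLY DISAGREE'], intervals, random.Random(0))
print(intervals)
print(reverse_transform(encoded, intervals))
```

Because the endpoints come from floating-point addition, they can print as values such as 0.30000000000000004 rather than exactly 0.3; the round trip is unaffected for values strictly inside an interval.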
This transformer is only defined for ordinal categorical data. If your data has no order, it is nominal; use the UniformEncoder instead.
If there are missing values in your data, they should be defined as part of your order. Use the None keyword to denote a missing value. In the example below, the missing value is added as the last item.
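from rdt.transformers.categorical import OrderedUniformEncoder
# None (the last item) denotes a missing value
transformer = OrderedUniformEncoder(order=['STRONGLY DISAGREE', 'DISAGREE',
'NEUTRAL', 'AGREE', 'STRONGLY AGREE',
None])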
Add the missing value to whatever ordering position makes sense for your data. If you are unsure, consider adding it to the beginning or the end of the list.
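As a rough illustration of why the position of None matters, the plain-Python sketch below builds the intervals twice from the same hypothetical frequencies: once with None as the last item and once as the first. It is not RDT's code, and the helper build_intervals and the frequencies (including 10% missing values) are invented for illustration. With None last, missing values occupy the top of [0,1] next to 'AGREE'; with None first, they occupy the bottom and every other interval shifts up.

```python
def build_intervals(order, frequencies):
    """Place each category's interval on [0, 1] following `order`.

    Hypothetical helper; endpoints are rounded only to keep the output readable.
    """
    intervals = {}
    start = 0.0
    for category in order:
        end = start + frequencies[category]
        intervals[category] = (round(start, 10), round(end, 10))
        start = end
    return intervals


# Hypothetical frequencies, including 10% missing values (None).
frequencies = {
    None: 0.1,
    'DISAGREE': 0.3,
    'NEUTRAL': 0.2,
    'AGREE': 0.4,
}
base = ['DISAGREE', 'NEUTRAL', 'AGREE']

missing_last = build_intervals(base + [None], frequencies)
missing_first = build_intervals([None] + base, frequencies)
print(missing_last[None])   # (0.9, 1.0)
print(missing_first[None])  # (0.0, 0.1)
```

Either placement preserves the frequencies; what changes is which real category the missing values sit next to in the transformed [0,1] space.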