OrderedLabelEncoder

Compatibility: categorical data (ordinal)

The OrderedLabelEncoder transforms data that represents ordered categorical values into integers 0, 1, 2, etc. corresponding to each category in the correct order.

from rdt.transformers.categorical import OrderedLabelEncoder
ole = OrderedLabelEncoder(order=['strongly_disagree', 'disagree', 'neutral',
                                 'agree', 'strongly_agree'])

Parameters

(required) order: Apply a specific order to the values before assigning the labels

[list <value>]

An ordered list of the categories that appear in the real data. The first category in the list will be assigned a label of 0, the second will be assigned 1, etc. All possible categories must be defined in this list.

add_noise: Add noise to the label values

(default) False

Do not add noise. Each time a category appears, it will always be transformed to the same label value.

True

Add noise. A category will be transformed to its label with some random noise added. For example, instead of the label 1, values might be noised to 1.001, 1.456, 1.999, etc.
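The effect of the two parameters can be illustrated with a minimal pure-Python sketch. It does not call RDT: the `encode` helper is hypothetical, and drawing the noise uniformly from [0, 1) is an assumption chosen to match the examples in this guide, not a statement about the library's internals.

```python
import random

order = ['strongly_disagree', 'disagree', 'neutral', 'agree', 'strongly_agree']

# The position in the list determines the label:
# first category -> 0, second category -> 1, and so on.
labels = {category: label for label, category in enumerate(order)}


def encode(value, add_noise=False, rng=random):
    """Hypothetical helper that mimics the documented behavior."""
    label = labels[value]  # raises KeyError for a category missing from `order`
    if add_noise:
        # Assumption: noise is drawn uniformly from [0, 1)
        return label + rng.random()
    return label


print(encode('disagree'))                  # always the same label
print(encode('disagree', add_noise=True))  # the label plus random noise
```

The `KeyError` in the sketch reflects the requirement that all possible categories be defined in `order`.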

Examples

from rdt.transformers.categorical import OrderedLabelEncoder

# order the categories before assigning label values
# and then add noise to the labels
ole = OrderedLabelEncoder(order=['strongly_disagree', 'disagree', 'neutral',
                                 'agree', 'strongly_agree'],
                          add_noise=True)

FAQs

What if my categorical column does not have an order?

This transformer is only defined for ordinal categorical data. If there is no order, your data is nominal. Use the LabelEncoder instead.

What happens to missing values?

If there are missing values in your data, they should be defined as part of your order. Use the None keyword to denote a missing value.

In the example below, the missing value is added as the last item.

ole = OrderedLabelEncoder(order=['strongly_disagree', 'disagree',
                                 'neutral', 'agree', 'strongly_agree', None])

Add the missing value to whatever ordering position makes sense for your data. If you are unsure, consider adding it to the beginning or the end of the list.
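To see how the position of the missing value affects the labels, here is a small pure-Python sketch. It uses plain lists and `None` only, does not call RDT, and the variable names are illustrative.

```python
categories = ['strongly_disagree', 'disagree', 'neutral', 'agree', 'strongly_agree']
column = ['agree', None, 'neutral']

# Missing value placed at the end of the order
order_last = categories + [None]
labels_last = {category: label for label, category in enumerate(order_last)}
encoded_last = [labels_last[value] for value in column]
print(encoded_last)  # [3, 5, 2]

# Missing value placed at the beginning: every other label shifts up by one
order_first = [None] + categories
labels_first = {category: label for label, category in enumerate(order_first)}
encoded_first = [labels_first[value] for value in column]
print(encoded_first)  # [4, 0, 3]
```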

When should I add noise?

If you do not add noise, the transformer converts each category to a single, fixed label. For example, 'disagree' is always converted to the label 1. If you add noise, the transformer adds random variation, so the same category maps to a range of values rather than one number. For example, 'disagree' may sometimes be 1.001 and other times 1.999, but always in the interval [1, 2).

Adding noise creates a continuous distribution. Whether to add noise depends on how you plan to use the data. If you are using the data for machine learning (ML), consider whether the techniques you plan to use work better on continuous distributions.
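Because a noised value stays inside the interval that starts at its label, the original category can still be recovered by rounding down. The sketch below illustrates that property; the `decode` helper is hypothetical and is not necessarily how RDT reverses the transformation.

```python
import math

order = ['strongly_disagree', 'disagree', 'neutral', 'agree', 'strongly_agree']


def decode(value):
    """Hypothetical helper: map a (possibly noised) label back to its category."""
    # Rounding down recovers the integer label, e.g. 1.999 -> 1
    return order[math.floor(value)]


print(decode(1))      # disagree
print(decode(1.001))  # disagree
print(decode(1.999))  # disagree
```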
