
FrequencyEncoder

Compatibility: categorical data
The FrequencyEncoder transforms data that represents unordered, categorical values into decimals in the range [0, 1]. The transformer divides this range into one interval per category, and more frequent categories receive larger intervals.
from rdt.transformers.categorical import FrequencyEncoder
fre = FrequencyEncoder()
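
To make the interval idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper name `build_intervals`, the sample data, and the choice to lay out intervals from most to least frequent category are all assumptions made for illustration.

```python
from collections import Counter

def build_intervals(values):
    """Map each category to a (start, end) slice of [0, 1] sized by its relative frequency."""
    counts = Counter(values)
    total = len(values)
    intervals = {}
    start = 0.0
    # most_common() yields categories from most to least frequent
    # (this ordering is an assumption of the sketch).
    for category, count in counts.most_common():
        end = start + count / total
        intervals[category] = (start, end)
        start = end
    return intervals

# Hypothetical sample column: VISA appears 4 times, AMEX 3 times, DISCOVER once.
cards = ["VISA", "AMEX", "VISA", "VISA", "AMEX", "DISCOVER", "VISA", "AMEX"]
intervals = build_intervals(cards)
for category, (start, end) in intervals.items():
    print(category, start, end)
# VISA 0.0 0.5
# AMEX 0.5 0.875
# DISCOVER 0.875 1.0
```

In this sketch VISA makes up half of the rows, so it takes up half of the [0, 1] range, while DISCOVER, which appears once in eight rows, gets an interval of width 0.125.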

Parameters

add_noise: Add noise when transforming a category into the [0, 1] interval.
(default) False
Do not add noise. Each time a category appears, it will always be transformed to the same value.
True
Add noise. A category may be transformed to different values every time it appears (but it will always stay within the interval).
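
The difference between the two settings can be sketched as follows. This is an illustration under stated assumptions, not the transformer's actual code: the `INTERVALS` table and the `encode` helper are hypothetical, and using the interval midpoint (without noise) and uniform sampling (with noise) are choices made for this sketch.

```python
import random

# Hypothetical intervals for three categories; the boundaries are made up for illustration.
INTERVALS = {"VISA": (0.0, 0.5), "AMEX": (0.5, 0.875), "DISCOVER": (0.875, 1.0)}

def encode(category, add_noise=False, rng=random):
    """Turn a category into a number inside its interval."""
    start, end = INTERVALS[category]
    if add_noise:
        # With noise: some point inside the interval, so repeated calls can differ.
        # Uniform sampling is an assumption of this sketch.
        return rng.uniform(start, end)
    # Without noise: always the same representative value (here, the midpoint).
    return (start + end) / 2

fixed = [encode("VISA") for _ in range(3)]
noisy = [encode("VISA", add_noise=True) for _ in range(3)]
print(fixed)  # the same value three times
print(noisy)  # values that vary but stay within VISA's interval
```

Either way, every encoded value stays inside the interval that belongs to its category, so the category can still be identified from the number.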

Examples

from rdt.transformers.categorical import FrequencyEncoder
# add some noise to the chosen intervals
fe = FrequencyEncoder(add_noise=True)

FAQs

This transformer treats missing values as if they are a new category of data.
If you do not add noise, the transformer converts each category to a single, fixed number. For example, VISA is always converted to the value 0.2. If you add noise, the transformer introduces random variation, so the same category can map to different numbers. For example, VISA may sometimes be 0.19 and other times 0.21. Adding noise produces a smoother, continuous distribution.
Whether to add noise depends on how you plan to use the data. If you are using the data for machine learning (ML), consider whether the techniques you plan to use work better on continuous distributions. If so, consider adding noise.
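
As an illustration of the first point above (missing values counted as their own category), here is a rough plain-Python sketch. It does not reproduce the transformer's internals: the `MISSING` sentinel, the `normalize` helper, and the sample data are hypothetical.

```python
import math
from collections import Counter

MISSING = "<missing>"  # hypothetical sentinel label, chosen for this sketch

def normalize(value):
    """Replace None and float NaN with one sentinel so they count as a single category."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING
    return value

# Hypothetical sample column with three missing entries out of eight.
cards = ["VISA", None, "AMEX", float("nan"), "VISA", None, "VISA", "VISA"]
counts = Counter(normalize(v) for v in cards)

# Each category's share of the rows, which is also the width of its interval in [0, 1].
shares = {category: count / len(cards) for category, count in counts.items()}
print(shares)
```

Here the missing entries account for three of the eight rows, so they would receive an interval of width 0.375, just like any other category with that frequency.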