Categorical
Categorical columns contain distinct categories. The defining aspect of a categorical column is that there is a set number of predefined categories.
The categories may be ordered or unordered. Ordered categories are known as ordinal while unordered categories are known as nominal.
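To make the distinction concrete, here is a minimal sketch in plain Python. The size and color values are made-up illustration data, and the variable names are hypothetical. The point is only that each category in an ordinal column has a meaningful rank, while the categories in a nominal column do not.

```python
# Hypothetical ordinal categories, listed from smallest to largest.
SIZE_ORDER = ["small", "medium", "large"]

# Map each category to its rank in the predefined order.
size_rank = {category: rank for rank, category in enumerate(SIZE_ORDER)}

sizes = ["large", "small", "medium", "small"]

# Sorting by rank is meaningful for ordinal data.
sorted_sizes = sorted(sizes, key=size_rank.__getitem__)

# Hypothetical nominal categories: there is no ranking to sort by,
# so the column is treated as a plain set of distinct labels.
colors = ["red", "blue", "red", "green"]
distinct_colors = set(colors)
```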
These transformers encode your categorical values as numerical values, ready for data science and machine learning.
These transformers use differential privacy techniques to add noise or reshape your column of categorical data. As a result, your column — and any statistics about it — can be shared with differential privacy guarantees.
Encode categories as numerical labels.
Encode categories as numerical labels using a pre-determined order.
Encode categories as a smooth distribution. Useful for imbalanced values.
Encode categories as a smooth distribution using a pre-determined order. Useful for imbalanced values.
Encode categories into a multi-modal distribution by using a frequency-based analysis.
Encode categories into multiple, binary columns using one hot encoding.
Encode boolean data as binary 0/1 labels.
Normalize the data by computing the cumulative distribution function (CDF) and adding noise.
Privatize the data by adding Laplacian noise randomly to each category.
Privatize the data by adding Laplacian noise in a weighted way, based on category frequencies.
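As a rough illustration of the idea behind adding Laplacian noise, the sketch below perturbs per-category counts with noise drawn from a Laplace distribution. This is the textbook Laplace mechanism applied to a histogram, not the implementation of the transformers described above. The function names, the `epsilon` parameter, and the sample data are all assumptions made for the example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential samples, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_category_counts(values, epsilon, rng=None):
    """Return per-category counts with Laplace noise added (hypothetical helper).

    Assumption: neighboring datasets differ by adding or removing one
    record, so each count changes by at most 1 and the noise scale is
    1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    counts = Counter(values)
    scale = 1.0 / epsilon
    return {cat: n + laplace_noise(scale, rng) for cat, n in counts.items()}


# Made-up sample column for illustration.
cards = ["VISA"] * 50 + ["AMEX"] * 30 + ["DISCOVER"] * 20
noisy = noisy_category_counts(cards, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives a larger noise scale, which means stronger privacy and less accurate counts.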
Example: "VISA", "AMEX" or "DISCOVER". This is nominal because the categories don't have any order.
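Taking card types such as "VISA", "AMEX" and "DISCOVER" as nominal categories, the following sketch shows label encoding and one hot encoding in plain Python. It is a hand-rolled illustration of the two techniques, assuming categories are numbered in order of first appearance. The helper names are hypothetical and this is not the transformers' own code.

```python
def label_encode(values):
    """Assign each distinct category an integer label, in order of first appearance."""
    labels = {}
    for value in values:
        # setdefault only inserts when the category has not been seen yet.
        labels.setdefault(value, len(labels))
    return [labels[value] for value in values], labels


def one_hot_encode(values):
    """Turn each value into a list of 0/1 flags, one flag per category."""
    encoded, labels = label_encode(values)
    width = len(labels)
    return [[1 if i == code else 0 for i in range(width)] for code in encoded]


# Made-up sample column for illustration.
cards = ["VISA", "AMEX", "VISA", "DISCOVER"]
codes, mapping = label_encode(cards)
one_hot = one_hot_encode(cards)
```

Because the card types are nominal, the integer labels carry no ranking. One hot encoding avoids implying one, at the cost of one extra column per category.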