Categorical

Last updated 22 days ago

Categorical columns contain distinct categories. The defining aspect of a categorical column is that there is a set number of predefined categories.

The categories may be ordered or unordered. Ordered categories are known as ordinal, while unordered categories are known as nominal.

For example, you may be recording the credit card company of your users. This column can only take on specific values like "VISA", "AMEX" or "DISCOVER". It is nominal because the categories don't have any order.

General Categorical Transformers

These transformers encode your categorical values as numerical values, ready for data science and machine learning.

  • LabelEncoder: Encode categories as numerical labels.
  • OrderedLabelEncoder: Encode categories as numerical labels using a pre-determined order.
  • UniformEncoder: Encode categories as a smooth distribution. Useful for imbalanced values.
  • OrderedUniformEncoder: Encode categories as a smooth distribution using a pre-determined order. Useful for imbalanced values.
  • FrequencyEncoder: Encode categories into a multi-modal distribution by using a frequency-based analysis.
  • OneHotEncoder: Encode categories into multiple binary columns using one-hot encoding.
  • BinaryEncoder: Encode boolean data as binary 0/1 labels.

Differential Privacy Transformers

These transformers use differential privacy techniques to add noise to, or reshape, your column of categorical data. As a result, your column, and any statistics about it, can be shared with differential privacy guarantees.

  • ❖ DPDiscreteECDFNormalizer: Normalize the data by computing the cumulative distribution function (CDF) and adding noise.
  • ❖ DPResponseRandomizer: Privatize the data by adding Laplacian noise randomly to each category.
  • ❖ DPWeightedResponseRandomizer: Privatize the data by adding Laplacian noise in a weighted way, based on category frequencies.
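
To give a rough intuition for what adding Laplacian noise to categorical data can look like, here is a minimal sketch of the classic Laplace mechanism applied to per-category counts. It is an illustration only, not the implementation of any RDT transformer. The function names, the `epsilon` parameter and the sensitivity of 1 (adding or removing one row changes one count by 1) are all assumptions made for this example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_category_counts(values, categories, epsilon, rng=None):
    """Return {category: count + Laplace noise} (hypothetical helper)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    counts = Counter(values)
    # Assumption: sensitivity 1, so the noise scale is 1 / epsilon.
    scale = 1.0 / epsilon
    return {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in categories}


cards = ["VISA", "AMEX", "VISA", "DISCOVER", "VISA"]
noisy = noisy_category_counts(
    cards, ["VISA", "AMEX", "DISCOVER"], epsilon=1.0, rng=random.Random(0)
)
```

A smaller `epsilon` means more noise and stronger privacy, while a larger `epsilon` keeps the noisy counts closer to the true ones.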

❖ SDV Enterprise Bundle. This feature is available as part of the Differential Privacy Bundle, an optional add-on to SDV Enterprise. For more information, please visit the Differential Privacy Bundle page. Coming soon!
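
To make the general categorical encodings described earlier more concrete, the following plain-Python sketch shows label encoding and one-hot encoding, each with its reverse step. It does not use or reproduce the RDT API. The helper names are hypothetical, and assigning labels in order of first appearance is an assumption chosen for simplicity.

```python
def fit_label_mapping(values):
    """Assign an integer label to each distinct category, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def label_encode(values, mapping):
    return [mapping[value] for value in values]


def label_decode(labels, mapping):
    inverse = {label: category for category, label in mapping.items()}
    return [inverse[label] for label in labels]


def one_hot_encode(values, mapping):
    # One binary column per category; exactly one column is 1 in each row.
    rows = []
    for value in values:
        row = [0] * len(mapping)
        row[mapping[value]] = 1
        rows.append(row)
    return rows


def one_hot_decode(rows, mapping):
    inverse = {label: category for category, label in mapping.items()}
    return [inverse[row.index(1)] for row in rows]


cards = ["VISA", "AMEX", "VISA", "DISCOVER"]
mapping = fit_label_mapping(cards)
labels = label_encode(cards, mapping)
one_hot = one_hot_encode(cards, mapping)
```

Both encodings are reversible: decoding `labels` or `one_hot` with the same mapping recovers the original categories. An ordered variant could build the mapping from a user-supplied list of categories instead of first appearance.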