RDT: Reversible Data Transforms



How much effort do you spend cleaning and processing your data?

RDT (Reversible Data Transforms) is a Python library that translates between real-world data and cleaned, numerical data that's ready for data science.

More than a formatting library

Cleaning and formatting raw data is a foundational element of RDT. But you can use the library to do much more.

Statistical Processing

Normalize your data using statistical processes. This is especially useful for data science and machine learning projects.
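As a rough illustration of reversible statistical normalization, the sketch below standardizes a numeric column to zero mean and unit variance and then undoes the transform. The `ZScoreNormalizer` class and the sample data are hypothetical and use only the Python standard library. This is not RDT's API, and z-score standardization is just one simple stand-in for the statistical processes RDT may apply.

```python
from statistics import mean, stdev


class ZScoreNormalizer:
    """Hypothetical reversible normalizer for illustration; not part of RDT."""

    def fit(self, values):
        # Learn the parameters needed both to transform and to undo the transform.
        self.mean_ = mean(values)
        self.std_ = stdev(values)  # needs at least two values
        if self.std_ == 0:
            raise ValueError("cannot normalize a constant column")
        return self

    def transform(self, values):
        # Shift and scale so the fitted data has mean 0 and standard deviation 1.
        return [(v - self.mean_) / self.std_ for v in values]

    def reverse_transform(self, values):
        # Invert the shift and scale to return to the original units.
        return [v * self.std_ + self.mean_ for v in values]


ages = [23.0, 35.0, 41.0, 29.0, 52.0]
normalizer = ZScoreNormalizer().fit(ages)
normalized = normalizer.transform(ages)
restored = normalizer.reverse_transform(normalized)
```

Because the fitted mean and standard deviation are stored on the object, the same instance can later map model output back into the original units.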

Anonymizing Sensitive Data

Protect sensitive data while preserving the overall data format. Using RDT, you can remove and anonymize Personally Identifiable Information (PII), or generate random, fake values that look like the original ones.
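To make "fake values that look like the original ones" concrete, here is a minimal sketch that replaces every ASCII letter and digit with a random one of the same kind while leaving separators in place. The `fake_like` helper and the sample phone numbers are hypothetical; this is not how RDT's anonymizing transformers are implemented, only an illustration of format-preserving replacement.

```python
import random
import string


def fake_like(value, rng):
    """Return a random string with the same character layout as `value`.

    Hypothetical helper for illustration; not part of RDT.
    """
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as '-' or '@' unchanged
    return "".join(out)


rng = random.Random(0)  # seeded only so the sketch is repeatable
phone_numbers = ["617-555-0142", "212-555-0198"]
fake_numbers = [fake_like(p, rng) for p in phone_numbers]
```

The fake values carry no information about the originals beyond their layout, so this kind of replacement cannot be reversed to recover the real data.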

Extracting Deeper Meaning

Licensed users can extract deeper concepts that are embedded inside the data. This is particularly useful for complex data types that have a rich, real-world meaning.

Use Cases

We created RDT with the goal of generating synthetic data. The RDT library transforms the raw data for machine learning, and then reverse transforms machine-generated data to match the original format. Synthetic data remains a top use case for RDT today.
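The transform and reverse-transform round trip described above can be sketched with a toy label encoder. The `LabelEncoderSketch` class and the `generated` codes are hypothetical stand-ins written for this illustration. They are not RDT's API, and the "machine-generated" values are hard-coded rather than produced by a real model.

```python
class LabelEncoderSketch:
    """Hypothetical reversible encoder for illustration; not part of RDT."""

    def fit(self, values):
        # Give each distinct category an integer code, in order of first appearance.
        self.categories_ = list(dict.fromkeys(values))
        self.codes_ = {cat: i for i, cat in enumerate(self.categories_)}
        return self

    def transform(self, values):
        # Real-world categories -> numerical codes a model can consume.
        return [self.codes_[v] for v in values]

    def reverse_transform(self, codes):
        # Numerical codes -> categories in the original format.
        return [self.categories_[c] for c in codes]


real = ["gold", "silver", "gold", "bronze"]
encoder = LabelEncoderSketch().fit(real)
numeric = encoder.transform(real)  # [0, 1, 0, 2]

# Stand-in for machine-generated output: codes a model might produce.
generated = [2, 0, 1, 1]
synthetic = encoder.reverse_transform(generated)
# ['bronze', 'gold', 'silver', 'silver']
```

The fitted object holds the mapping learned from the real data, which is what lets the generated codes come back as values that match the original column.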

If you'd like to use RDT for synthetic data, we recommend installing the sdv library. It will automatically download RDT, along with other libraries that support synthetic data generation & evaluation.

RDT can be useful beyond the synthetic data space. You can use RDT for statistical preprocessing, contextual anonymization, or adding differential privacy as standalone projects. For more information, see Use Cases.

Owned & Maintained by DataCebo

The RDT library is a part of the Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project.

Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.
