RDT: Reversible Data Transforms
How much effort are you spending on cleaning and processing your data?
RDT (Reversible Data Transforms) is an open-source Python library that converts real-world data into clean, numerical data that's ready for data science, and converts it back again.
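To make the idea of a reversible transform concrete, here is a minimal sketch in plain Python. It is not RDT's API: the `CategoryEncoder` class and its method names are hypothetical, chosen for illustration. It maps category labels to integer codes and back, so nothing is lost in the round trip.

```python
class CategoryEncoder:
    """Hypothetical reversible transform: category labels <-> integer codes."""

    def fit(self, values):
        # Assign each distinct label a code, in order of first appearance.
        self.to_code = {}
        for value in values:
            if value not in self.to_code:
                self.to_code[value] = len(self.to_code)
        self.to_label = {code: label for label, code in self.to_code.items()}
        return self

    def transform(self, values):
        # Real-world labels -> numbers a model can consume.
        return [self.to_code[value] for value in values]

    def reverse_transform(self, codes):
        # Numbers -> the original labels.
        return [self.to_label[code] for code in codes]


colors = ["red", "blue", "red", "green"]
encoder = CategoryEncoder().fit(colors)
codes = encoder.transform(colors)
print(codes)                             # [0, 1, 0, 2]
print(encoder.reverse_transform(codes))  # ['red', 'blue', 'red', 'green']
```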

More than a formatting library

Cleaning and formatting raw data is a foundational element of RDT. But you can use the library to do much more.

Statistical Processing

Normalize your data using statistical methods. This is especially useful for data science and machine learning projects.
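As one example of a statistical process, the sketch below standardizes a numeric column to zero mean and unit variance (a z-score), and keeps the mean and standard deviation so the original values can be restored. It is a generic technique written with Python's standard library; the `ZScoreTransform` class is hypothetical, and RDT's own numerical transformers may use different methods.

```python
import statistics


class ZScoreTransform:
    """Hypothetical reversible transform: x -> (x - mean) / std."""

    def fit(self, values):
        # Remember the statistics needed to undo the transform later.
        self.mean = statistics.fmean(values)
        # Population standard deviation; use 1.0 for a constant column to avoid dividing by zero.
        self.std = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(x - self.mean) / self.std for x in values]

    def reverse_transform(self, values):
        return [z * self.std + self.mean for z in values]


ages = [20.0, 30.0, 40.0, 50.0]
scaler = ZScoreTransform().fit(ages)
scaled = scaler.transform(ages)              # mean 0, standard deviation 1
restored = scaler.reverse_transform(scaled)
print([round(x, 6) for x in restored])       # [20.0, 30.0, 40.0, 50.0]
```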

Anonymizing Sensitive Data

Protect sensitive data while preserving its overall format. With RDT, you can remove and anonymize Personally Identifiable Information (PII), replacing it with random, fake values that look like the original ones.
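Format-preserving anonymization can be sketched in a few lines of plain Python. This is an illustration, not RDT's own PII transformer: the `anonymize_emails` helper is hypothetical, and it keeps only the `user@domain` shape of each address while replacing the content with random letters at the reserved `example.com` domain.

```python
import random
import string


def anonymize_emails(emails, seed=None):
    """Hypothetical helper: replace each real address with a random fake one.

    Repeated inputs map to the same fake value, so counts and joins on the
    column still line up after anonymization.
    """
    rng = random.Random(seed)
    mapping = {}
    used = set()
    for email in emails:
        if email not in mapping:
            candidate = None
            # Draw until the fake address is unique, so distinct people stay distinct.
            while candidate is None or candidate in used:
                user = "".join(rng.choices(string.ascii_lowercase, k=8))
                candidate = f"{user}@example.com"
            used.add(candidate)
            mapping[email] = candidate
    return [mapping[email] for email in emails]


real = ["ana@corp.io", "bo@mail.net", "ana@corp.io"]
fake = anonymize_emails(real, seed=0)
print(fake)  # three fake addresses; the first and last are identical
```

Keeping a consistent mapping is a design choice: it preserves how often each value appears, at the cost of also revealing that frequency information.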

Extracting Deeper Meaning

With the RDT Premium Add-Ons, you can extract deeper concepts embedded in your data. This is particularly useful for complex data types that have a rich, real-world meaning.

Use Cases

We first created RDTs with the goal of generating synthetic data. The RDT library transforms raw data into a form suitable for machine learning, then reverse transforms the machine-generated data so that it matches the format of the original. Synthetic data remains a top use case for RDT today.
If you'd like to use RDT for synthetic data, we recommend installing the SDV library. It automatically installs RDT, along with other libraries that support synthetic data generation and evaluation.
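The reverse step is what lets machine-generated data match the original. Model output is usually continuous and can fall outside the valid range, so a reverse transform has to snap it back into shape. The sketch below shows one simple way to do that in plain Python; it is not RDT's implementation, and the helper names, label list, and clipping bounds are hypothetical.

```python
def reverse_categorical(raw_outputs, labels):
    """Hypothetical reverse step: map raw model floats to the nearest valid label."""
    restored = []
    for value in raw_outputs:
        # Round to the nearest integer code, then clip into [0, len(labels) - 1].
        code = min(max(round(value), 0), len(labels) - 1)
        restored.append(labels[code])
    return restored


def reverse_integer_column(raw_outputs, low, high):
    """Hypothetical reverse step: clip to the observed range, then round to whole numbers."""
    return [round(min(max(value, low), high)) for value in raw_outputs]


labels = ["blue", "green", "red"]      # learned when the real column was transformed
model_colors = [1.7, -0.3, 2.9, 0.49]  # machine output rarely lands exactly on valid codes
model_ages = [34.6, 17.2, 91.0]

print(reverse_categorical(model_colors, labels))   # ['red', 'blue', 'red', 'blue']
print(reverse_integer_column(model_ages, 18, 65))  # [35, 18, 65]
```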
We open sourced the RDT library because the transformers are useful beyond the synthetic data space. You can use RDT to:
  • Preprocess your data for data science and analytics projects
  • Sanitize datasets before publishing them broadly for research
  • Translate machine output to human-readable data

Owned & Maintained by DataCebo

The RDT library is part of the Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After four years of research and growing traction with enterprises, we founded DataCebo in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.