RDT: Reversible Data Transforms
How much effort are you spending on cleaning and processing your data?
RDT (Reversible Data Transforms) is a Python library that translates between real-world data and cleaned, numerical data that's ready for data science.
Cleaning and formatting raw data is a foundational element of RDT. But you can use the library to do much more.
Normalize your data using statistical processes. This is especially useful for data science and machine learning projects.
Protect sensitive data while preserving the overall data format. Using RDT, you can remove or anonymize Personally Identifiable Information (PII), or generate random, fake values that look like the original ones.
Licensed users can extract deeper concepts that are embedded inside the data. This is particularly useful for complex data types that have a rich, real-world meaning.
We first created RDT with the goal of generating synthetic data. The RDT library transforms the raw data into a numerical form for machine learning, and then reverse transforms machine-generated data back into the original format. Synthetic data remains a top use case for RDT today.
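The transform and reverse transform round trip can be sketched in plain Python. This is only an illustration of the idea, not RDT's API: the `ReversibleLabelEncoder` class, its method bodies, and the sample values are all hypothetical, and RDT's own transformers are more sophisticated.

```python
class ReversibleLabelEncoder:
    """Hypothetical illustration of a reversible transform (not part of RDT)."""

    def fit(self, values):
        # Learn the distinct categories and give each one an integer code.
        self.categories_ = sorted(set(values))
        self.codes_ = {cat: i for i, cat in enumerate(self.categories_)}
        return self

    def transform(self, values):
        # Raw categories -> numerical codes a model can consume.
        return [self.codes_[v] for v in values]

    def reverse_transform(self, codes):
        # Numerical codes -> categories in the original format.
        return [self.categories_[c] for c in codes]


raw = ["card", "cash", "card", "wire"]
encoder = ReversibleLabelEncoder().fit(raw)

numeric = encoder.transform(raw)               # [0, 1, 0, 2]
restored = encoder.reverse_transform(numeric)  # identical to raw

# Codes produced by a model can be mapped back the same way.
machine_generated = [2, 0, 1]
synthetic = encoder.reverse_transform(machine_generated)
```

The key property is that the mapping learned during `fit` is kept, so any output in the numerical space, whether transformed from real data or machine-generated, can be translated back into the original format.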
If you'd like to use RDT for synthetic data, we recommend installing the sdv library. It will automatically download RDT, along with other libraries to support synthetic data generation & evaluation.
We open sourced the RDT library because the transformers are useful beyond the synthetic data space. You can use RDT to:
- Preprocess your data for data science and analytics projects
- Sanitize datasets before publishing them broadly for research
- Translate machine output to human-readable data
The RDT library is part of the Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprises, we created DataCebo in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.