RDT: Reversible Data Transforms
How much effort are you spending on cleaning and processing your data?
RDT (Reversible Data Transforms) is a library that translates between real-world data and cleaned, numerical data that's ready for data science.
Cleaning and formatting raw data is a foundational element of RDT. But you can use the library to do much more.
Normalize your data using statistical processes. This is especially useful for data science and machine learning projects.
Protect sensitive data while preserving the overall data format. Using RDT, you can remove and anonymize Personally Identifiable Information (PII), or generate random, fake values that look like the original ones.
Licensed users can extract deeper concepts that are embedded inside the data. This is particularly useful for complex data types that have a rich, real-world meaning.
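To make the normalization and anonymization ideas above concrete, here is a minimal, library-free sketch in plain Python. It does not use RDT's API: `ZScoreTransformer` and `mask_email` are hypothetical names invented for this illustration. The first standardizes a numeric column and can undo the operation; the second replaces an email address with a random value of the same shape.

```python
import random
import statistics
import string


class ZScoreTransformer:
    """Hypothetical reversible standardizer (illustration only, not part of RDT)."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        # Fall back to 1.0 so a constant column does not divide by zero.
        self.std = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def reverse_transform(self, values):
        return [v * self.std + self.mean for v in values]


def mask_email(email, rng):
    """Hypothetical helper: replace every letter and digit with a random one.

    Punctuation such as '@' and '.' is kept, so the overall format survives.
    """
    out = []
    for ch in email:
        if ch.isalpha():
            out.append(rng.choice(string.ascii_lowercase))
        elif ch.isdigit():
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)
    return "".join(out)


ages = [23.0, 35.0, 41.0, 58.0]
scaler = ZScoreTransformer().fit(ages)
scaled = scaler.transform(ages)                 # zero mean, unit variance
restored = scaler.reverse_transform(scaled)     # back to the original scale

rng = random.Random(0)  # seeded only to make the sketch repeatable
fake = mask_email("jane.doe42@example.com", rng)
```

Note the asymmetry in this sketch: the standardizer is reversible, while the masking step is deliberately one-way, since the point of anonymization is that the original value cannot be recovered.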
We created RDT with the goal of generating synthetic data. The RDT library transforms the raw data into a numerical form for machine learning, and then reverse transforms machine-generated data to match the original format. Synthetic data remains a top use case for RDT today.
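The round trip described above can be sketched in a few lines of plain Python. This illustrates the idea only and is not RDT's implementation or API: `LabelTransformer` is a hypothetical class, and the "generated" values are made-up numbers standing in for a model's output.

```python
class LabelTransformer:
    """Hypothetical reversible categorical encoder (illustration only, not part of RDT)."""

    def fit(self, values):
        # Assign each distinct category an integer code, in sorted order.
        self.categories = sorted(set(values))
        self.codes = {cat: i for i, cat in enumerate(self.categories)}
        return self

    def transform(self, values):
        return [float(self.codes[v]) for v in values]

    def reverse_transform(self, values):
        # Generated values may be fractional or out of range, so round
        # and clip to the nearest known code before looking it up.
        last = len(self.categories) - 1
        return [self.categories[min(max(round(v), 0), last)] for v in values]


real = ["gold", "silver", "gold", "bronze"]
encoder = LabelTransformer().fit(real)

numeric = encoder.transform(real)        # numerical data a model can consume
generated = [0.2, 1.9, -0.4, 1.1]        # stand-in for machine-generated output
synthetic = encoder.reverse_transform(generated)  # back to the original categories
```

The rounding and clipping in `reverse_transform` is one simple design choice for handling values the encoder never produced itself; other strategies are possible.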
If you'd like to use RDT for synthetic data, we recommend installing the SDV library. It will automatically download RDT, along with other libraries to support synthetic data generation & evaluation.
RDT can be useful beyond the synthetic data space. You can use RDT for statistical preprocessing, contextual anonymization, or adding differential privacy as standalone projects. For more information, see .
The RDT library is a part of the Synthetic Data Vault (SDV) Project, first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.