Basic Concepts

The RDT library is a collection of objects that understand your raw data and convert it into clean, numerical data.


Transformers are the basic building blocks. Each transformer is designed to modify a single column of your dataset.
Different transformers work on specific types of data and use different techniques. You can choose which strategies to apply to your data, including how missing values are handled.
The Transformers Glossary contains a full list of available transformers and their settings.
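To make the idea of a single-column transformer concrete, here is a toy sketch in plain Python. The class name `ToyNumericalTransformer` and its methods are hypothetical and are not RDT's API; the sketch only mirrors the general pattern of learning from a column (fit), producing clean numbers (transform) and undoing the change (reverse transform), with one possible missing-value strategy: fill with the mean and keep a 0/1 indicator of which rows were missing.

```python
from typing import List, Optional, Tuple


class ToyNumericalTransformer:
    """Hypothetical single-column transformer (not part of RDT).

    Replaces missing values with the column mean and records which rows
    were missing, so the original column can be recovered later.
    """

    def __init__(self) -> None:
        self.fill_value: Optional[float] = None

    def fit(self, column: List[Optional[float]]) -> None:
        # Learn the replacement value from the non-missing entries only.
        present = [v for v in column if v is not None]
        self.fill_value = sum(present) / len(present) if present else 0.0

    def transform(self, column: List[Optional[float]]) -> Tuple[List[float], List[int]]:
        # Output: cleaned numbers plus a 0/1 "was missing" indicator.
        values = [self.fill_value if v is None else float(v) for v in column]
        is_null = [1 if v is None else 0 for v in column]
        return values, is_null

    def reverse_transform(self, values: List[float], is_null: List[int]) -> List[Optional[float]]:
        # Put the missing values back wherever the indicator says they were.
        return [None if flag else v for v, flag in zip(values, is_null)]


ages = [30.0, None, 50.0]
t = ToyNumericalTransformer()
t.fit(ages)
values, is_null = t.transform(ages)
print(values, is_null)                       # [30.0, 40.0, 50.0] [0, 1, 0]
print(t.reverse_transform(values, is_null))  # [30.0, None, 50.0]
```

Keeping the indicator is what makes the step reversible: without it, the filled-in mean would be indistinguishable from a real value.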


The HyperTransformer manages all the transformers you need for an entire multi-column dataset. You can mix and match your favorite transformers across different columns of your data.
You can also reverse the process to recover the original data format.
Read the HyperTransformer usage guide to learn more.
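The following toy sketch illustrates the HyperTransformer idea: one object that holds a transformer per column, fits and applies them all, and reverses the whole table. Every class here (`ToyLabelEncoder`, `ToyBooleanEncoder`, `ToyHyperTransformer`) is hypothetical, and tables are modeled as plain dicts of column name to values; RDT's real HyperTransformer has its own API, described in the usage guide.

```python
from typing import Any, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values


class ToyLabelEncoder:
    """Hypothetical categorical transformer: category -> integer code."""

    def fit(self, column: List[Any]) -> None:
        # Assign codes in order of first appearance.
        self.codes = {cat: i for i, cat in enumerate(dict.fromkeys(column))}
        self.categories = {i: cat for cat, i in self.codes.items()}

    def transform(self, column: List[Any]) -> List[int]:
        return [self.codes[v] for v in column]

    def reverse_transform(self, column: List[int]) -> List[Any]:
        return [self.categories[v] for v in column]


class ToyBooleanEncoder:
    """Hypothetical boolean transformer: True/False -> 1.0/0.0."""

    def fit(self, column: List[bool]) -> None:
        pass  # nothing to learn

    def transform(self, column: List[bool]) -> List[float]:
        return [1.0 if v else 0.0 for v in column]

    def reverse_transform(self, column: List[float]) -> List[bool]:
        return [v >= 0.5 for v in column]


class ToyHyperTransformer:
    """Hypothetical multi-column manager: one transformer per column."""

    def __init__(self, column_to_transformer: Dict[str, Any]) -> None:
        self.column_to_transformer = column_to_transformer

    def fit(self, data: Table) -> None:
        for name, transformer in self.column_to_transformer.items():
            transformer.fit(data[name])

    def transform(self, data: Table) -> Table:
        return {name: t.transform(data[name]) for name, t in self.column_to_transformer.items()}

    def reverse_transform(self, data: Table) -> Table:
        return {name: t.reverse_transform(data[name]) for name, t in self.column_to_transformer.items()}


data = {"color": ["red", "blue", "red"], "subscribed": [True, False, True]}
ht = ToyHyperTransformer({"color": ToyLabelEncoder(), "subscribed": ToyBooleanEncoder()})
ht.fit(data)
numeric = ht.transform(data)
print(numeric)                               # {'color': [0, 1, 0], 'subscribed': [1.0, 0.0, 1.0]}
print(ht.reverse_transform(numeric) == data)  # True
```

Swapping a different transformer into the mapping for one column changes only that column's treatment, which is the "mix and match" property described above.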


The RDT library uses sdtypes to keep track of what the data represents. You can think of an sdtype as the semantic (or statistical) meaning of a column's data, rather than the way it is stored.
The valid sdtypes in the open-source RDT are 'boolean', 'categorical', 'datetime', 'numerical', 'pii' and 'text'. More are available as Premium Add-Ons.
Read the Sdtypes usage guide to learn more about the differences between each type.
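As a small illustration of why meaning matters more than storage, the sketch below assigns sdtypes to a few hypothetical columns: `age` and `product_code` both hold integers, yet only one of them is meaningfully numerical. The column names and the `check_sdtypes` helper are hypothetical; the allowed set is simply the open-source list above.

```python
from typing import Dict

# The sdtypes listed above for the open-source RDT.
OPEN_SOURCE_SDTYPES = {"boolean", "categorical", "datetime", "numerical", "pii", "text"}

# Hypothetical columns: "age" and "product_code" are both stored as integers.
data = {
    "age": [34, 51, 27],
    "product_code": [1001, 1002, 1001],
    "signup_date": ["2023-01-05", "2023-02-11", "2023-03-20"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
}

column_sdtypes: Dict[str, str] = {
    "age": "numerical",             # arithmetic on ages is meaningful
    "product_code": "categorical",  # codes are labels; averaging them is not
    "signup_date": "datetime",
    "email": "pii",                 # personally identifiable information
}


def check_sdtypes(sdtypes: Dict[str, str]) -> None:
    """Raise ValueError if any column uses an sdtype outside the open-source set."""
    for column, sdtype in sdtypes.items():
        if sdtype not in OPEN_SOURCE_SDTYPES:
            raise ValueError(f"{column!r}: unknown sdtype {sdtype!r}")


check_sdtypes(column_sdtypes)  # passes silently
```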


The config describes the plan for transforming all the columns in a dataset: it lists the columns in your dataset and the transformer that will be applied to each one.
Read the Config guide to learn more.
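A minimal sketch of what such a plan might look like, assuming it is modeled as a plain dict with one section for sdtypes and one for transformers, keyed by column. The column names are hypothetical, and transformer names are written as plain strings for illustration only; they are not necessarily what RDT would choose for these columns. The hypothetical `describe_plan` helper checks that both sections cover the same columns and summarizes the plan.

```python
from typing import Dict, List

# Hypothetical config: one sdtype and one transformer per column.
config: Dict[str, Dict[str, str]] = {
    "sdtypes": {
        "age": "numerical",
        "signup_date": "datetime",
        "plan": "categorical",
    },
    "transformers": {
        "age": "FloatFormatter",
        "signup_date": "UnixTimestampEncoder",
        "plan": "LabelEncoder",
    },
}


def describe_plan(config: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one line per column after checking both sections cover the same columns."""
    sdtypes, transformers = config["sdtypes"], config["transformers"]
    if sdtypes.keys() != transformers.keys():
        mismatched = sdtypes.keys() ^ transformers.keys()
        raise ValueError(f"columns missing from one section: {sorted(mismatched)}")
    return [f"{col}: {sdtypes[col]} -> {transformers[col]}" for col in sdtypes]


for line in describe_plan(config):
    print(line)
# age: numerical -> FloatFormatter
# signup_date: datetime -> UnixTimestampEncoder
# plan: categorical -> LabelEncoder
```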