The RDT library is a collection of objects that can understand your raw data and convert it into cleaned, numerical data.
Transformers are the basic building blocks. They are designed to modify a single column of your dataset.
Each transformer works on a specific type of data and applies a particular technique. You choose which strategies to use for your data, including how missing values are handled.
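To make the idea of a single-column transformer concrete, here is a minimal sketch in plain Python. It is a toy illustration, not RDT's implementation or API: the class name `ToyNumericalTransformer`, its methods, and the mean-fill strategy are all assumptions chosen for the example. It fills missing values with the column mean and keeps a flag so the missing entries can be restored later.

```python
from statistics import mean


class ToyNumericalTransformer:
    """Hypothetical single-column transformer (illustration only, not part of RDT)."""

    def fit(self, column):
        # Learn the fill value from the values that are actually present.
        observed = [value for value in column if value is not None]
        self.fill_value = float(mean(observed))
        return self

    def transform(self, column):
        # Replace missing entries with the learned mean and record where they were.
        values = [self.fill_value if v is None else float(v) for v in column]
        was_missing = [1.0 if v is None else 0.0 for v in column]
        return values, was_missing

    def reverse_transform(self, values, was_missing):
        # Put the missing entries back using the recorded flags.
        return [None if flag >= 0.5 else v for v, flag in zip(values, was_missing)]


ages = [25, None, 31, 40]
transformer = ToyNumericalTransformer().fit(ages)
values, was_missing = transformer.transform(ages)
restored = transformer.reverse_transform(values, was_missing)
```

After `transform`, every entry is a number, which is the "cleaned, numerical" form described above. Filling with the mean is only one possible strategy for missing values.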
The HyperTransformer manages all the transformers you need for an entire, multi-column dataset. You can mix and match your favorite transformers on different columns of your data.
You can also reverse the process to recover the original data format.
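The sketch below illustrates the general pattern of one object managing a transformer per column and reversing the result. It is a toy written in plain Python under stated assumptions: `ToyHyperTransformer`, `ToyLabelEncoder`, and `ToyBooleanEncoder` are hypothetical names, the table is a simple dictionary of column lists, and none of this is RDT's actual code.

```python
class ToyLabelEncoder:
    """Hypothetical transformer: maps each category to an integer code."""

    def fit(self, column):
        self.categories = sorted(set(column))
        self.codes = {category: i for i, category in enumerate(self.categories)}

    def transform(self, column):
        return [self.codes[value] for value in column]

    def reverse_transform(self, column):
        return [self.categories[int(code)] for code in column]


class ToyBooleanEncoder:
    """Hypothetical transformer: maps True/False to 1.0/0.0."""

    def fit(self, column):
        pass  # Nothing to learn for this encoding.

    def transform(self, column):
        return [1.0 if value else 0.0 for value in column]

    def reverse_transform(self, column):
        return [value >= 0.5 for value in column]


class ToyHyperTransformer:
    """Hypothetical manager: applies one transformer per column."""

    def __init__(self, transformers):
        self.transformers = transformers  # column name -> transformer

    def fit(self, table):
        for name, transformer in self.transformers.items():
            transformer.fit(table[name])

    def transform(self, table):
        return {name: t.transform(table[name]) for name, t in self.transformers.items()}

    def reverse_transform(self, table):
        return {name: t.reverse_transform(table[name]) for name, t in self.transformers.items()}


table = {"plan": ["basic", "premium", "basic"], "active": [True, False, True]}
manager = ToyHyperTransformer({"plan": ToyLabelEncoder(), "active": ToyBooleanEncoder()})
manager.fit(table)
transformed = manager.transform(table)
restored = manager.reverse_transform(transformed)
```

Swapping the transformer assigned to a column changes how that column is encoded without touching the others, which is the mix-and-match idea described above.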
The RDT library uses sdtypes to keep track of what the data represents. You can think of an sdtype as representing the semantic (or statistical) meaning of a datatype.
Valid sdtypes in the open source RDT:
'text'. More are available as Premium Add-Ons.
The config is the plan for transforming all the columns in a dataset. It lists the columns in your dataset and the transformers that will be applied to each one.
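As a rough illustration of what such a plan can look like, the sketch below stores it as a dictionary with one section for sdtypes and one for transformers. The exact layout, the column names, and the helper `find_config_gaps` are assumptions made for this example, and the transformers are written as plain strings rather than real transformer objects.

```python
def find_config_gaps(config, column_names):
    """Return the columns that lack an sdtype or a transformer entry.

    Hypothetical helper for illustration; not part of RDT.
    """
    gaps = []
    for name in column_names:
        if name not in config["sdtypes"] or name not in config["transformers"]:
            gaps.append(name)
    return gaps


# Assumed layout: each column name maps to an sdtype and to a transformer.
# The transformer values are placeholder strings, not real objects.
config = {
    "sdtypes": {"age": "numerical", "plan": "categorical"},
    "transformers": {"age": "FloatFormatter", "plan": "LabelEncoder"},
}

gaps = find_config_gaps(config, ["age", "plan", "notes"])
```

Here `gaps` names the one column, `notes`, that the plan does not yet cover, so you would know to add it before transforming the dataset.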