Output (None) This function prints out the status and detected config. The config describes the sdtypes of each column and the transformer objects that will be used for each. For more details, see the Basic Concepts guide.
Examples
ht.detect_initial_config(data=customers)
Detecting a new config from the data ... SUCCESS
Setting the new config ... SUCCESS
Config:
{
    'sdtypes': {
        'last_login': 'datetime',
        'email_optin': 'boolean',
        'credit_card': 'categorical',
        'age': 'numerical',
        'dollars_spent': 'numerical'
    },
    'transformers': {
        'last_login': UnixTimestampEncoder(missing_value_replacement="mean"),
        'email_optin': UniformEncoder(),
        'credit_card': UniformEncoder(),
        'age': FloatFormatter(),
        'dollars_spent': FloatFormatter(missing_value_replacement="mean")
    }
}
set_config()
This method sets the config. Use this as an alternative to detect_initial_config if you want to write and set the config manually.
Parameters
(required) config: A nested dictionary that describes the config. It must follow the format shown below.
The public RDT supports the following sdtypes: 'boolean', 'categorical', 'datetime', 'numerical', 'pii' and 'id'
You can use any transformer object from the RDT (or specify None if you do not want to transform the column). Visit the Transformers Glossary to browse through the available transformers and their settings.
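As a rough sketch of the nested dictionary, the config below mirrors the layout printed by detect_initial_config: one 'sdtypes' dictionary and one 'transformers' dictionary, assumed here to be keyed by the same column names. Every transformer is set to None so the snippet runs without RDT installed; in practice you would substitute transformer objects. The check_config_shape helper is hypothetical and not part of RDT.

```python
# The config has two sub-dictionaries, keyed by column name.
config = {
    'sdtypes': {
        'last_login': 'datetime',
        'email_optin': 'boolean',
        'credit_card': 'categorical',
        'age': 'numerical',
        'dollars_spent': 'numerical',
    },
    'transformers': {
        # None means "do not transform this column". Replace with a
        # transformer object (for example FloatFormatter()) where needed.
        'last_login': None,
        'email_optin': None,
        'credit_card': None,
        'age': None,
        'dollars_spent': None,
    },
}


def check_config_shape(config):
    """Hypothetical helper (not an RDT function): verify the two-dictionary layout."""
    if set(config) != {'sdtypes', 'transformers'}:
        return False
    # Assumption: both dictionaries should describe the same columns.
    return set(config['sdtypes']) == set(config['transformers'])


# With RDT installed and ht = HyperTransformer(), the config would be applied with:
# ht.set_config(config=config)
```

Checking the shape before calling set_config() is optional; it only catches a missing key or a column listed in one dictionary but not the other.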
Customize your HyperTransformer by modifying the config.
update_sdtypes()
This method modifies the sdtypes. It also automatically assigns a new transformer that's compatible with the new sdtype.
Parameters
(required) column_name_to_sdtype: A dictionary that maps a column name to its new sdtype. The public RDT supports 'boolean', 'categorical', 'datetime', 'numerical', 'pii' and 'id' sdtypes. More are available for licensed users.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
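A minimal sketch of the call, assuming ht is a HyperTransformer whose config already contains the credit_card and age columns from the earlier example. The PUBLIC_SDTYPES set and the pre-check are illustrative and not part of RDT; the RDT calls appear as comments so the snippet runs without the library installed.

```python
# sdtypes supported by the public RDT, as listed in the parameter description
PUBLIC_SDTYPES = {'boolean', 'categorical', 'datetime', 'numerical', 'pii', 'id'}

# map each column name to its new sdtype
column_name_to_sdtype = {
    'credit_card': 'pii',
    'age': 'categorical',
}

# illustrative pre-check (not an RDT function): collect any unsupported sdtypes
unsupported = {
    column: sdtype
    for column, sdtype in column_name_to_sdtype.items()
    if sdtype not in PUBLIC_SDTYPES
}

# With RDT installed and ht configured, apply the change and inspect the result:
# ht.update_sdtypes(column_name_to_sdtype=column_name_to_sdtype)
# ht.get_config()
```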
update_transformers()
This method updates the transformers that will be used on specific columns. Use it to customize your HyperTransformer, for example by changing a transformer setting or swapping out one transformer for another.
Parameters
(required) column_name_to_transformer: A dictionary that maps a column name to the new transformer that will be used on it.
You can use any transformer object from the RDT. Visit the Transformers Glossary to browse through the available transformers and their settings.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
To update transformers, you must first create the transformers you want to use and then apply the method.
remove_transformers()
This method removes transformers for specific columns. Use this if you do not want the HyperTransformer to modify certain columns at all. It will skip over those columns and modify the remaining columns that do have transformers.
Parameters
(required) column_names: A list of column names. The transformers for these column names are removed.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
update_transformers_by_sdtype()
This method updates all columns of a given sdtype to use a specific transformer.
Parameters
(required) sdtype: An sdtype. This method will select all columns that match the sdtype.
(required) transformer_name: A string with the name of the transformer to use.
transformer_parameters: A dictionary that maps the name of the transformer parameter (string) to the parameter value. Use this if you want to override the default settings.
Visit the Transformers Glossary to browse through the available transformers and their settings.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
remove_transformers_by_sdtype()
This method removes transformers for all columns of a given sdtype. Use this method if you do not want to transform any columns of a particular sdtype.
Parameters
(required) sdtype: An sdtype. This method will remove the transformer for all columns that match the given sdtype.
Output (None) After using this method, you can use get_config() to verify the changes.
from rdt.transformers.datetime import OptimizedTimestampEncoder
from rdt.transformers.categorical import LabelEncoder
# create new transformer objects
login_transformer = OptimizedTimestampEncoder(missing_value_replacement='random')
credit_transformer = LabelEncoder(add_noise=True)
# update the columns to use the new transformers
ht.update_transformers(column_name_to_transformer={
    'last_login': login_transformer,
    'credit_card': credit_transformer
})
# do not transform the credit_card or age columns
ht.remove_transformers(column_names=['credit_card', 'age'])
# update all numerical columns to use a specific transformer
ht.update_transformers_by_sdtype(
    sdtype='numerical',
    transformer_name='FloatFormatter',
    transformer_parameters={
        'missing_value_generation': 'from_column',
        'enforce_min_max_values': True
    }
)
# do not transform any categorical columns in the dataset
ht.remove_transformers_by_sdtype(sdtype='categorical')