Configuration
In order for your HyperTransformer to work, you'll need to provide it with a configuration that describes:
the columns in your dataset, and
the transformers that should be applied to turn them into numerical data.
Creating the config
To create the config you can either allow the HyperTransformer to automatically detect it from your data or you can write it by hand.
detect_initial_config()
This method automatically detects the config from your data and sets it. It overrides any existing config you may have previously set or detected.
Parameters
(required) data: A pandas DataFrame containing your data.
Output (None) This function prints out the status and detected config. The config describes the sdtypes of each column and the transformer objects that will be used for each. For more details, see the Basic Concepts guide.
Examples
set_config()
This method sets the config. Use this as an alternative to detect_initial_config() if you want to write and set the config manually.
Parameters
(required) config: A nested dictionary that describes the config. It must follow the format shown below.
The public RDT supports the following sdtypes: 'boolean', 'categorical', 'datetime', 'numerical', 'pii' and 'text'.
You can use any transformer object from the RDT (or specify None if you do not want to transform the column). Visit the Transformers Glossary to browse through the available transformers and their settings.
See the Config guide for more details.
Output (None)
Examples
You must provide the full config that describes all the columns in your dataset.
Viewing the config
get_config()
At any point, you can use this method to retrieve the current config.
Parameters (None)
Output A nested dictionary that describes the config. It follows the format shown below.
See the Config guide for more details.
Examples
Modifying the config
Customize your HyperTransformer by modifying the config.
update_sdtypes()
This method modifies the sdtypes. It also automatically assigns a new transformer that's compatible with the new sdtype.
Parameters
(required) column_name_to_sdtype: A dictionary that maps a column name to its new sdtype. The public RDT supports the 'boolean', 'categorical', 'datetime', 'numerical', 'pii' and 'text' types. More are available for licensed users.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
update_transformers()
This method updates the transformers that will be used on specific columns. Use it to customize your HyperTransformer, for example by changing a transformer setting or swapping out one transformer for another.
Parameters
(required) column_name_to_transformer: A dictionary that maps a column name to the new transformer that will be used on it.
You can use any transformer object from the RDT. Visit the Transformers Glossary to browse through the available transformers and their settings.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
To update transformers, you must first create the transformers you want to use and then apply the method.
remove_transformers()
This method removes transformers for specific columns. Use this if you do not want the HyperTransformer to modify certain columns at all. It will skip those columns and transform only the remaining columns that still have transformers.
Parameters
(required) column_names: A list of column names. The transformers for these column names are removed.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
update_transformers_by_sdtype()
This method updates all columns of a given sdtype to use a specific transformer.
Parameters
(required) sdtype: An sdtype. This method will select all columns that match the sdtype.
(required) transformer_name: A string with the name of the transformer to use.
transformer_parameters: A dictionary that maps the name of the transformer parameter (string) to the parameter value. Use this if you want to override the default settings.
Visit the Transformers Glossary to browse through the available transformers and their settings.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples
remove_transformers_by_sdtype()
This method removes transformers for all columns of a given sdtype. Use this method if you do not want to transform any columns of a particular sdtype.
Parameters
(required) sdtype: An sdtype. This method will remove the transformer for all columns that match the given sdtype.
Output (None) After using this method, you can use get_config() to verify the changes.
Examples