Custom Synthesizers
SDGym allows you to benchmark a custom synthesizer. Follow this guide to write your synthesizer in the correct format.
Your synthesizer should work by training a machine learning model using the real data. Then, it should sample synthetic data using the model. You will need to provide both the training and sampling logic using the guidelines below.
Write a function that trains a model using the real data and any information present in the metadata. It outputs a fully trained synthesizer, represented as any kind of object.
Parameters
(required) data: A pandas.DataFrame with the real data
(required) metadata: A dictionary that provides information about the column types in the real data
Output: Any object that represents your fully trained synthesizer
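The training step described above can be sketched as follows. This is a minimal illustration, not SDGym's own code: the function name get_trained_synthesizer, the toy "model" (a dictionary of memorized column values), and the example metadata contents are all assumptions made for this sketch. It also assumes the real data arrives as a pandas DataFrame.

```python
import pandas as pd


def get_trained_synthesizer(data, metadata):
    """Train a toy 'model' that memorizes each column's observed values.

    The metadata dictionary is accepted to match the expected signature,
    but this sketch does not inspect it.
    """
    # Each column is stored independently, so correlations between
    # columns are deliberately ignored by this toy model.
    return {column: data[column].tolist() for column in data.columns}


real_data = pd.DataFrame({"age": [23, 35, 41], "city": ["Oslo", "Lima", "Oslo"]})
# Hypothetical metadata contents; the real structure depends on your dataset.
metadata = {"columns": {"age": {"sdtype": "numerical"}, "city": {"sdtype": "categorical"}}}
synthesizer = get_trained_synthesizer(real_data, metadata)
```

Because the output can be any kind of object, a plain dictionary is enough here. A real synthesizer would typically return a fitted model instance instead.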
Write a function that accepts the trained synthesizer (from the previous step) and uses it to generate synthetic data of a specified length.
Parameters
(required) synthesizer: The synthesizer object from the previous step
(required) n_rows: An integer greater than 0 that represents the number of synthetic data rows to create
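A sampling function with this signature might look like the sketch below. The name sample_from_synthesizer is illustrative, and the synthesizer object is assumed to be a dictionary mapping column names to lists of observed values. That format is a choice made for this example, not something SDGym requires. The sketch returns a pandas DataFrame, which is also an assumption about the expected output type.

```python
import random

import pandas as pd


def sample_from_synthesizer(synthesizer, n_rows):
    """Draw n_rows synthetic rows by sampling each column independently."""
    if n_rows <= 0:
        raise ValueError("n_rows must be an integer greater than 0")
    # Sample with replacement from the memorized values of every column.
    synthetic = {
        column: random.choices(values, k=n_rows)
        for column, values in synthesizer.items()
    }
    return pd.DataFrame(synthetic)


# Stand-in for the object returned by the training step (assumed format).
trained = {"age": [23, 35, 41], "city": ["Oslo", "Lima", "Oslo"]}
synthetic_data = sample_from_synthesizer(trained, n_rows=5)
```

Sampling is random, so the exact rows differ between runs, but the result always has n_rows rows and the same columns as the training data.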
Once you've defined your logic, put it all together to create an SDGym synthesizer. Use the create_single_table_synthesizer function.
Parameters
(required) get_trained_synthesizer_fn: A function that creates a trained synthesizer
(required) sample_from_synthesizer_fn: A function that creates synthetic data
(required) display_name: A string representing the name of the synthesizer. This display name will be used to identify your custom synthesizer in the benchmarking results.
Output: A class object that represents your custom SDGym synthesizer.
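Putting the pieces together could look like the sketch below. The three keyword arguments are the parameters listed above. The top-level import path (from sdgym import ...), the helper build_custom_synthesizer, the two toy functions, and the display name "ColumnShuffler" are assumptions made for illustration.

```python
import random

import pandas as pd


def get_trained_synthesizer(data, metadata):
    # Toy training logic: memorize the observed values of every column.
    return {column: data[column].tolist() for column in data.columns}


def sample_from_synthesizer(synthesizer, n_rows):
    # Toy sampling logic: draw each column independently, with replacement.
    return pd.DataFrame(
        {column: random.choices(values, k=n_rows) for column, values in synthesizer.items()}
    )


def build_custom_synthesizer():
    """Wrap the two functions into an SDGym synthesizer class."""
    # Imported inside the function so the toy logic above can be tested
    # on its own; the import path is assumed, so check your SDGym version.
    from sdgym import create_single_table_synthesizer

    return create_single_table_synthesizer(
        get_trained_synthesizer_fn=get_trained_synthesizer,
        sample_from_synthesizer_fn=sample_from_synthesizer,
        display_name="ColumnShuffler",
    )
```

Calling build_custom_synthesizer() returns the class object, which is what you later hand to a benchmarking run.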
Once you've created your custom SDGym synthesizer, use it in a benchmarking run by providing the custom_synthesizers parameter. Pass in the classes directly.
Results from your custom synthesizer will be labeled by the provided display_name.
The sampling function should output a pandas.DataFrame object with the synthetic data. It should contain the specified number of rows.
See for more details.
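A benchmarking run with a custom synthesizer might be started as sketched below. The custom_synthesizers parameter comes from the text above. The entry point sdgym.benchmark_single_table and the helper run_benchmark are assumptions for this example, so confirm the function name against the SDGym version you have installed.

```python
def run_benchmark(custom_synthesizer_class):
    """Benchmark one custom synthesizer class and return the results."""
    # Imported lazily; benchmark_single_table is the assumed entry point.
    import sdgym

    # Pass the class itself (not an instance) inside a list.
    return sdgym.benchmark_single_table(
        custom_synthesizers=[custom_synthesizer_class],
    )
```

Rows in the returned results that belong to your synthesizer carry the display_name you chose when creating it.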