Custom Synthesizers
The SDGym allows you to benchmark your custom synthesizer. Follow this guide to write your synthesizer in the correct format.
Your synthesizer should work on all single-table datasets. The SDGym does not currently support sequential or multi-table data.
Creating your synthesizer
Your synthesizer should work by training a machine learning model using the real data. Then, it should sample synthetic data using the model. You will need to provide both the training and sampling logic using the guidelines below.
Step 1: Training
Write a function that trains a model using the real data and any information present in the Metadata. The function returns a fully trained synthesizer, which can be any kind of object.
Parameters
(required) data: A pandas.DataFrame with the real data
(required) metadata: A Metadata dictionary that provides information about the column types in the real data
Output: Any object that represents your fully trained synthesizer
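A minimal sketch of a training function with this signature is shown below. It is an illustration only: the "model" is a toy that stores a copy of the real data, the dictionary key real_data is a hypothetical choice, and the sample metadata contents are placeholders, since the exact layout of the Metadata dictionary is not described in this guide.

```python
import pandas as pd


def get_trained_synthesizer(data, metadata):
    """Return a toy trained synthesizer for the given real data."""
    # Toy "model": keep a copy of the real data so its empirical
    # distribution can be resampled later. A real synthesizer would
    # fit a machine learning model here instead.
    # `metadata` is accepted to match the required signature, but this
    # simple stand-in does not use it.
    return {'real_data': data.copy()}


# Made-up input for illustration only (values and metadata are placeholders).
real_data = pd.DataFrame({'age': [23, 35, 41], 'city': ['Oslo', 'Lima', 'Pune']})
metadata = {'columns': {'age': {}, 'city': {}}}

trained = get_trained_synthesizer(real_data, metadata)
```

The returned object can be anything, so a plain dictionary is enough here; a class instance or a fitted model object would work equally well.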
Step 2: Sampling
Write a function that accepts the trained synthesizer (from the previous step) and uses it to generate synthetic data of a specified length.
Parameters
(required) synthesizer: The synthesizer object from the previous step
(required) n_rows: An integer >0 that represents the number of synthetic data rows to create
Output: A pandas.DataFrame object with the synthetic data. It should contain the specified number of rows.
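One way to write a sampling function with this signature is sketched below. It assumes the trained synthesizer is a dictionary holding the real data under a hypothetical real_data key, and it resamples each column independently with replacement. This is a stand-in for real model sampling, not a recommended technique.

```python
import pandas as pd


def sample_from_synthesizer(synthesizer, n_rows):
    """Create n_rows of synthetic data from the toy synthesizer."""
    real_data = synthesizer['real_data']
    synthetic = {}
    for column in real_data.columns:
        # Draw n_rows values from this column, with replacement, and
        # reset the index so all columns line up as rows 0..n_rows-1.
        synthetic[column] = (
            real_data[column]
            .sample(n=n_rows, replace=True)
            .reset_index(drop=True)
        )
    return pd.DataFrame(synthetic)


# Made-up trained synthesizer for illustration only.
trained = {'real_data': pd.DataFrame({'age': [23, 35, 41], 'city': ['Oslo', 'Lima', 'Pune']})}
synthetic_data = sample_from_synthesizer(trained, n_rows=5)
```

Sampling with replacement lets n_rows exceed the number of real rows, which matters because the requested length is independent of the training data size.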
Step 3: Creating your synthesizer
Once you've defined your logic, put it all together to create an SDGym synthesizer. Use the create_single_table_synthesizer function.
Parameters
(required) get_trained_synthesizer_fn: A function that creates a trained synthesizer
(required) sample_from_synthesizer_fn: A function that creates synthetic data
(required) display_name: A string representing the name of the synthesizer. This display name will be used to identify your custom synthesizer in the benchmarking results.
Output: A class object that represents your custom SDGym synthesizer.
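Putting the pieces together might look like the sketch below. The parameter names come from the list above; the import path (from sdgym import ...) is an assumption, as are the toy training and sampling functions, the helper name build_custom_synthesizer, and the display name RowResampler.

```python
def get_trained_synthesizer(data, metadata):
    # Toy training logic: remember the real data (metadata is unused here).
    return {'real_data': data.copy()}


def sample_from_synthesizer(synthesizer, n_rows):
    # Toy sampling logic: draw whole rows with replacement.
    real_data = synthesizer['real_data']
    return real_data.sample(n=n_rows, replace=True).reset_index(drop=True)


def build_custom_synthesizer():
    # Assumed import path; imported here so that SDGym is only needed
    # at the moment the synthesizer class is built.
    from sdgym import create_single_table_synthesizer

    return create_single_table_synthesizer(
        get_trained_synthesizer_fn=get_trained_synthesizer,
        sample_from_synthesizer_fn=sample_from_synthesizer,
        display_name='RowResampler',
    )
```

Calling build_custom_synthesizer() returns the class object, which is what a benchmarking run expects rather than an instance.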
Using your synthesizer
Once you've created your custom SDGym synthesizer, use it in a benchmarking run by providing the custom_synthesizers parameter. Pass in the classes directly.
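A benchmarking run with a custom synthesizer could be sketched as follows. This guide names only the custom_synthesizers parameter, so the entry point sdgym.benchmark_single_table is an assumption here, as are the toy logic, the helper name run_benchmark, and the display name. All other benchmark arguments are left at their defaults.

```python
def get_trained_synthesizer(data, metadata):
    # Toy training logic: remember the real data (metadata is unused here).
    return {'real_data': data.copy()}


def sample_from_synthesizer(synthesizer, n_rows):
    # Toy sampling logic: draw whole rows with replacement.
    real_data = synthesizer['real_data']
    return real_data.sample(n=n_rows, replace=True).reset_index(drop=True)


def run_benchmark():
    # Assumed module layout; imported here so SDGym is only required
    # when the benchmark is actually launched.
    import sdgym

    MyCustomSynthesizer = sdgym.create_single_table_synthesizer(
        get_trained_synthesizer_fn=get_trained_synthesizer,
        sample_from_synthesizer_fn=sample_from_synthesizer,
        display_name='RowResampler',
    )
    # Pass the class itself (not an instance) in the list.
    return sdgym.benchmark_single_table(
        custom_synthesizers=[MyCustomSynthesizer],
    )
```

Note that the list holds the class returned in Step 3, matching the instruction to pass in the classes directly.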
Results
Results from your custom synthesizer will be labeled by the provided display_name.
See Interpreting Results for more details.