Custom Synthesizers
SDGym allows you to benchmark your own custom synthesizer. Follow this guide to write your synthesizer in the required format.
Creating your synthesizer
Your synthesizer should work by training a machine learning model using the real data. Then, it should sample synthetic data using the model. You will need to provide both the training and sampling logic using the guidelines below.
Step 1: Training
Write a function that trains a model using the real data and any information present in the Metadata. The function returns a fully trained synthesizer, which can be any kind of object.
Parameters
(required) data: A pandas.DataFrame with the real data
(required) metadata: A Metadata dictionary that provides information about the column types in the real data
Output: Any object that represents your fully trained synthesizer
def get_trained_synthesizer(data, metadata):
    # create an object to represent your synthesizer
    # train it using the data and metadata
    return synthesizer
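As a minimal sketch of the training step, the function below "trains" by memorizing the observed values of each column, which is enough to resample from later. The dictionary layout (the `columns` and `metadata` keys), the toy data, and the empty metadata dict are all assumptions made for illustration; the only requirement stated in this guide is that the function returns some object representing the trained synthesizer.

```python
import pandas as pd


def get_trained_synthesizer(data, metadata):
    # "Training" here just memorizes each column's observed values.
    # The dict layout is a hypothetical choice, not an SDGym requirement.
    synthesizer = {
        'columns': {name: data[name].tolist() for name in data.columns},
        'metadata': metadata,  # kept as-is in case sampling needs it later
    }
    return synthesizer


# Toy data and a placeholder metadata dict (contents are illustrative only)
real_data = pd.DataFrame({'age': [25, 32, 47], 'city': ['Oslo', 'Lima', 'Pune']})
trained = get_trained_synthesizer(real_data, metadata={})
```

A real synthesizer would fit a statistical or machine learning model at this point; the memorizing approach only keeps the example short.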
Step 2: Sampling
Write a function that accepts the trained synthesizer (from the previous step) and uses it to generate synthetic data of a specified length.
Parameters
(required) synthesizer: The synthesizer object from the previous step
(required) n_rows: An integer >0 that represents the number of synthetic data rows to create
Output: A pandas.DataFrame object with the synthetic data. It should contain the specified number of rows.
def sample_from_synthesizer(synthesizer, n_rows):
    # use the trained synthesizer object to sample
    # n_rows of synthetic data
    return synthetic_data
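The sampling step could be sketched as follows, assuming the trained synthesizer is a dict whose `columns` entry maps each column name to a list of observed values. That layout is hypothetical and chosen only for this example. Each column is drawn independently with replacement, so relationships between columns are not preserved.

```python
import numpy as np
import pandas as pd


def sample_from_synthesizer(synthesizer, n_rows):
    # Assumes synthesizer['columns'] maps column name -> list of observed values
    rng = np.random.default_rng()
    synthetic_data = pd.DataFrame({
        name: rng.choice(values, size=n_rows, replace=True)
        for name, values in synthesizer['columns'].items()
    })
    return synthetic_data


# Hand-built stand-in for a trained synthesizer (illustrative values only)
example_synthesizer = {
    'columns': {'age': [25, 32, 47], 'city': ['Oslo', 'Lima', 'Pune']}
}
synthetic_data = sample_from_synthesizer(example_synthesizer, n_rows=5)
```

Before benchmarking, it may be worth checking that the returned DataFrame has exactly `n_rows` rows and the same column names as the real data.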
Step 3: Creating your synthesizer
Once you've defined your logic, put it all together to create an SDGym synthesizer. Use the create_single_table_synthesizer function.
Parameters
(required) get_trained_synthesizer_fn: A function that creates a trained synthesizer
(required) sample_from_synthesizer_fn: A function that creates synthetic data
(required) display_name: A string representing the name of the synthesizer. This display name will be used to identify your custom synthesizer in the benchmarking results.
Output: A class object that represents your custom SDGym synthesizer.
from sdgym import create_single_table_synthesizer
MyCustomSynthesizerClass = create_single_table_synthesizer(
    get_trained_synthesizer_fn=get_trained_synthesizer,
    sample_from_synthesizer_fn=sample_from_synthesizer,
    display_name='MyCustomSynthesizer'
)
Using your synthesizer
Once you've created your custom SDGym synthesizer, use it in a benchmarking run by providing the custom_synthesizers parameter. Pass in the classes directly.
import sdgym
sdgym.benchmark_single_table(
    custom_synthesizers=[MyCustomSynthesizerClass]
)
Results
Results from your custom synthesizer are labeled with the provided display_name, which appears with a Custom: prefix in the example below.
| Synthesizer | Dataset | Dataset_Size_MB | Model_Time | Peak_Memory_KB | Model_Size_MB | Sample_Time | Evaluate_Time | Quality Score | NewRowSynthesis |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Custom:MyCustomSynthesizer | alarm | 34.5 | 45.45 | 100201 | 0.340 | 2012.2 | 1001.2 | 0.71882 | 0.99901 |
See Interpreting Results for more details.
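To pick out the rows for your custom synthesizers, one option is to filter on the Synthesizer label. The sketch below assumes the results are available as a pandas.DataFrame with the column names shown in the example, and that custom entries carry the `Custom:` prefix seen there. The `results` frame is a hand-built stand-in copied from the example row, not real benchmark output.

```python
import pandas as pd

# Hand-built stand-in for benchmark results, copied from the example row
results = pd.DataFrame([{
    'Synthesizer': 'Custom:MyCustomSynthesizer',
    'Dataset': 'alarm',
    'Model_Time': 45.45,
    'Sample_Time': 2012.2,
    'Quality Score': 0.71882,
}])

# Keep only rows whose label carries the 'Custom:' prefix seen in the example
is_custom = results['Synthesizer'].str.startswith('Custom:')
custom_rows = results[is_custom]
```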