SDV Synthesizers

The SDV library offers a variety of synthesizers for creating synthetic data. To benchmark them, pass their string names to the synthesizers parameter.

import sdgym

sdgym.benchmark_single_table(
    synthesizers=['GaussianCopulaSynthesizer', 'FastMLPreset']
)

The table below contains a full list of SDV Synthesizers.

SDV Synthesizer                        Description
FastMLPreset                           This synthesizer is customized for fast modeling and sampling time while using machine learning
GaussianCopulaSynthesizer              This synthesizer uses classical statistical methods to model the data
CTGANSynthesizer                       This synthesizer uses a GAN to model the data
TVAESynthesizer                        This synthesizer uses a variational autoencoder to model the data
[Experimental!] CopulaGANSynthesizer   This synthesizer combines classical statistical methods and GANs to model the data

The SDGym library uses the default settings of each synthesizer. To change them, create a variant using the steps below.

Create an SDV variant

Many of the SDV synthesizers can be tuned by setting different parameters. You can test these parameters by creating a variant of the synthesizer.

create_sdv_synthesizer_variant

Use this method to create a variant of an SDV synthesizer.

from sdgym import create_sdv_synthesizer_variant

GammaCopulaSynthesizer = create_sdv_synthesizer_variant(
    synthesizer_class='GaussianCopulaSynthesizer',
    synthesizer_parameters={'default_distribution': 'gamma'},
    display_name='GammaCopulaSynthesizer'
)

Parameters

  • (required) synthesizer_class: A string with the name of the synthesizer. This must be one of the predefined synthesizers: 'GaussianCopulaSynthesizer', 'CTGANSynthesizer', 'FastMLPreset', 'TVAESynthesizer', 'CopulaGANSynthesizer'

  • (required) synthesizer_parameters: A dictionary mapping the name of each parameter to the value that you'd like to set for it. The parameters and values may be different for each synthesizer. For more information, see the SDV API.

  • (required) display_name: A string that identifies this variant. The display name will appear in the benchmarking results.

Returns: A synthesizer class that you can use directly in the benchmarking script.
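When defining several variants at once, it can help to check the three required arguments before handing them to create_sdv_synthesizer_variant. The sketch below is one possible way to do that, not part of SDGym: the helper names variant_kwargs and create_variants are hypothetical, and the 'uniform' distribution and the second display name are illustrative assumptions. Only the list of predefined synthesizer names comes from this guide.

```python
# Synthesizer names listed in this guide as valid values for synthesizer_class.
PREDEFINED_SYNTHESIZERS = {
    'GaussianCopulaSynthesizer',
    'CTGANSynthesizer',
    'FastMLPreset',
    'TVAESynthesizer',
    'CopulaGANSynthesizer',
}


def variant_kwargs(synthesizer_class, synthesizer_parameters, display_name):
    """Hypothetical helper: validate the required arguments and return them as kwargs."""
    if synthesizer_class not in PREDEFINED_SYNTHESIZERS:
        raise ValueError(f'Unknown synthesizer_class: {synthesizer_class!r}')
    if not isinstance(synthesizer_parameters, dict):
        raise TypeError('synthesizer_parameters must be a dictionary')
    if not isinstance(display_name, str) or not display_name:
        raise ValueError('display_name must be a non-empty string')
    return {
        'synthesizer_class': synthesizer_class,
        'synthesizer_parameters': synthesizer_parameters,
        'display_name': display_name,
    }


def create_variants(specs):
    """Hypothetical helper: create one variant class per spec (requires sdgym)."""
    # Imported lazily so the validation above can run without sdgym installed.
    from sdgym import create_sdv_synthesizer_variant
    return [create_sdv_synthesizer_variant(**spec) for spec in specs]


# Illustrative specs; the second one is an assumption, not taken from this guide.
specs = [
    variant_kwargs(
        'GaussianCopulaSynthesizer',
        {'default_distribution': 'gamma'},
        'GammaCopulaSynthesizer',
    ),
    variant_kwargs(
        'GaussianCopulaSynthesizer',
        {'default_distribution': 'uniform'},
        'UniformCopulaSynthesizer',
    ),
]
```

With sdgym installed, create_variants(specs) should return a list of classes that can be passed to the custom_synthesizers parameter. Keeping the validation separate from the SDGym call means a misspelled synthesizer name fails immediately rather than partway through a benchmark run.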

Using your synthesizer variant

To use your synthesizer variant for benchmarking, provide the class using the custom_synthesizers parameter. For example:

import sdgym

sdgym.benchmark_single_table(
    custom_synthesizers=[GammaCopulaSynthesizer]
)

Results

Results from your synthesizer variant will be labeled by the provided display_name.

Synthesizer                      Dataset   Dataset_Size_MB   Model_Time   Peak_Memory_KB   Model_Size_MB    Sample_Time    Evaluate_Time   Quality_Score   NewRowSynthesis
Variant:GammaCopulaSynthesizer   alarm     34.5              45.45        100201           0.340            2012.2         1001.2          0.71882         0.99901

See Interpreting Results for more details.
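To pick out a variant's rows from the results, you can match on the label shown in the Synthesizer column. The sketch below assumes the label is the display name prefixed with "Variant:", which is inferred from the sample row above rather than from a documented guarantee. The helper names variant_label and rows_for_variant are hypothetical, and the rows are plain dictionaries holding a few columns of that sample row.

```python
def variant_label(display_name):
    """Label format inferred from the sample results (an assumption)."""
    return f'Variant:{display_name}'


def rows_for_variant(rows, display_name):
    """Return the result rows whose Synthesizer column matches the variant label."""
    label = variant_label(display_name)
    return [row for row in rows if row['Synthesizer'] == label]


# The sample row from the results table, reduced to three columns.
sample_rows = [
    {
        'Synthesizer': 'Variant:GammaCopulaSynthesizer',
        'Dataset': 'alarm',
        'NewRowSynthesis': 0.99901,
    },
]

matches = rows_for_variant(sample_rows, 'GammaCopulaSynthesizer')
print(matches[0]['Dataset'])  # prints "alarm"
```

If the benchmark results are a pandas DataFrame (an assumption here), results.to_dict('records') would produce rows in this shape.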

© Copyright 2023, DataCebo, Inc.