Basic Synthesizers

The SDGym library includes several basic synthesizers that you can use as baselines when benchmarking. Pass their names as strings to the synthesizers parameter.

import sdgym

sdgym.benchmark_single_table(
    synthesizers=['DataIdentity', 'UniformSynthesizer']
)

Use basic synthesizers for comparison purposes only! The basic synthesizers listed below are unlikely to produce usable synthetic data. Use them as baselines for comparison with other synthesizers, such as SDV Synthesizers.

Basic Synthesizer | Description

DataIdentity

This synthesizer* returns the same data that it receives, serving as an identity function. *Technically, this technique does not count as a synthesizer because it does not create new data.

UniformSynthesizer

This synthesizer learns the numerical range or the set of categories of each column. Then, it creates synthetic data by sampling values uniformly at random within those boundaries.

IndependentSynthesizer

This synthesizer learns the marginal distributions of each column independently to generate synthetic data. For numerical columns, it learns a Gaussian Mixture. For categorical columns, it learns the frequencies of each category. This synthesizer does not learn any correlations between the different columns.
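
To make the difference between the uniform and independent techniques concrete, here is a minimal, library-free sketch of the two ideas described above. It is not SDGym's implementation: the function names (is_numerical, sample_uniform, sample_independent) and the toy table are hypothetical, and a single Gaussian is swapped in for the Gaussian Mixture to keep the example short.

```python
import random
import statistics


def is_numerical(values):
    # Hypothetical helper: numerical means every value is an int or float (bools excluded).
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def sample_uniform(table, num_rows, seed=0):
    """Uniform idea: learn each column's range or category set, then draw uniformly."""
    rng = random.Random(seed)
    synthetic = {}
    for name, values in table.items():
        if is_numerical(values):
            low, high = min(values), max(values)
            synthetic[name] = [rng.uniform(low, high) for _ in range(num_rows)]
        else:
            categories = sorted(set(values))
            synthetic[name] = [rng.choice(categories) for _ in range(num_rows)]
    return synthetic


def sample_independent(table, num_rows, seed=0):
    """Independent idea: learn each column's marginal on its own, then sample per column."""
    rng = random.Random(seed)
    synthetic = {}
    for name, values in table.items():
        if is_numerical(values):
            # Simplification (assumption): one Gaussian stands in for the Gaussian Mixture.
            mean = statistics.fmean(values)
            stdev = statistics.pstdev(values)
            synthetic[name] = [rng.gauss(mean, stdev) for _ in range(num_rows)]
        else:
            # Sample categories in proportion to their observed frequencies.
            categories = sorted(set(values))
            weights = [values.count(c) for c in categories]
            synthetic[name] = rng.choices(categories, weights=weights, k=num_rows)
    return synthetic


# Toy table stored as a dict of columns; "free" is four times as common as "pro".
real = {
    "age": [23, 35, 41, 58, 62],
    "plan": ["free", "free", "free", "pro", "free"],
}
uniform_rows = sample_uniform(real, num_rows=1000)
independent_rows = sample_independent(real, num_rows=1000)
```

In this sketch, the uniform sample draws "free" and "pro" about equally often, while the independent sample keeps "free" as the dominant category. Neither approach captures any relationship between age and plan, because every column is sampled on its own.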

FAQs

What if I have an idea for another basic synthesizer?

If there are other basic techniques you'd like to see included in the SDGym library, please create a Feature Request with your ideas.

In the meantime, you can create a Custom Synthesizer to implement these techniques yourself.

© Copyright 2023, DataCebo, Inc.