Basic Synthesizers
The SDGym library includes some basic synthesizers that you can use for benchmarking purposes. Pass the string names into the synthesizers parameter.
import sdgym

sdgym.benchmark_single_table(
    synthesizers=['DataIdentity', 'UniformSynthesizer']
)

Use basic synthesizers for comparison purposes only! The basic synthesizers listed below are unlikely to produce usable synthetic data. Use them as baselines to compare against other synthesizers, such as SDV Synthesizers.
Basic Single-Table Synthesizers
DataIdentity
This synthesizer returns the same data that it receives. It serves as an identity function. Technically, this technique does not count as a synthesizer because it does not create new data.
UniformSynthesizer
This synthesizer learns the numerical ranges or categories of each column. Then, it creates synthetic data by randomly generating values within the boundaries.
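The idea described above can be illustrated with a minimal, standard-library sketch. This is not SDGym's implementation; the helper names `fit_uniform` and `sample_uniform`, and the toy rows, are hypothetical and exist only for this example.

```python
import random


def fit_uniform(rows):
    """Learn per-column boundaries: (min, max) for numbers, the category list otherwise."""
    bounds = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            bounds[column] = ("numerical", min(values), max(values))
        else:
            bounds[column] = ("categorical", sorted(set(values)))
    return bounds


def sample_uniform(bounds, num_rows, seed=None):
    """Draw every value uniformly at random within the learned boundaries."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(num_rows):
        row = {}
        for column, spec in bounds.items():
            if spec[0] == "numerical":
                row[column] = rng.uniform(spec[1], spec[2])
            else:
                row[column] = rng.choice(spec[1])
        synthetic.append(row)
    return synthetic


# Toy data, invented for illustration only.
real_rows = [
    {"age": 23, "plan": "free"},
    {"age": 41, "plan": "pro"},
    {"age": 35, "plan": "free"},
]
bounds = fit_uniform(real_rows)
synthetic_rows = sample_uniform(bounds, num_rows=5, seed=0)
```

Every sampled value stays inside the observed range or category set, but the shape of the original distribution is ignored, which is why this kind of synthesizer is useful mainly as a lower-bound baseline.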
ColumnSynthesizer
This synthesizer learns the marginal distributions of each column independently to generate synthetic data. For numerical columns, it learns a Gaussian Mixture. For categorical columns, it learns the frequencies of each category. This synthesizer does not learn any correlations between the different columns.
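A rough sketch of independent, per-column modeling is shown below. It is not SDGym's implementation, and it makes one deliberate simplification: numerical columns are fit with a single Gaussian (mean and standard deviation) instead of a Gaussian Mixture, to keep the example dependency-free. The helper names `fit_columns` and `sample_columns` are hypothetical.

```python
import random
import statistics
from collections import Counter


def fit_columns(rows):
    """Learn one marginal per column, ignoring all other columns."""
    marginals = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Simplification: a single Gaussian stands in for a Gaussian Mixture.
            mean = statistics.mean(values)
            std = statistics.pstdev(values)
            marginals[column] = ("numerical", mean, std)
        else:
            counts = Counter(values)
            marginals[column] = ("categorical", list(counts.keys()), list(counts.values()))
    return marginals


def sample_columns(marginals, num_rows, seed=None):
    """Sample each column independently, so cross-column correlations are lost."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(num_rows):
        row = {}
        for column, spec in marginals.items():
            if spec[0] == "numerical":
                row[column] = rng.gauss(spec[1], spec[2])
            else:
                # Categories are drawn in proportion to their observed frequencies.
                row[column] = rng.choices(spec[1], weights=spec[2], k=1)[0]
        synthetic.append(row)
    return synthetic


# Toy data, invented for illustration only.
real_rows = [
    {"age": 23, "plan": "free"},
    {"age": 41, "plan": "pro"},
    {"age": 35, "plan": "free"},
]
marginals = fit_columns(real_rows)
synthetic_rows = sample_columns(marginals, num_rows=4, seed=0)
```

Because each column is sampled on its own, a relationship in the real data (for example, older customers choosing a particular plan) would not carry over to the synthetic rows.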
Basic Multi-Table Synthesizers
MultiTableUniformSynthesizer
This synthesizer learns the numerical ranges or categories of each column. Then, it creates synthetic data by randomly generating values within the boundaries. This synthesizer also randomly creates ID columns (primary and foreign key columns). It does not ensure that the connections are valid or that referential integrity is maintained.
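To make the referential integrity caveat concrete, the sketch below generates primary and foreign keys independently at random and then counts the foreign keys that point to no parent row. It is an illustration under assumed conditions, not SDGym's implementation: the ID range, table sizes, and the helper `count_orphaned_keys` are all hypothetical.

```python
import random


def count_orphaned_keys(parent_ids, child_foreign_keys):
    """Count foreign key values that do not match any primary key in the parent table."""
    known_ids = set(parent_ids)
    return sum(1 for key in child_foreign_keys if key not in known_ids)


rng = random.Random(0)

# Primary keys: unique random IDs for a hypothetical parent table.
parent_ids = rng.sample(range(1000), k=10)

# Foreign keys: drawn at random from the same range, with no check against the parent table.
child_foreign_keys = [rng.randrange(1000) for _ in range(20)]

orphaned = count_orphaned_keys(parent_ids, child_foreign_keys)
print(f"{orphaned} of {len(child_foreign_keys)} foreign keys reference no parent row")
```

When keys are generated this way, orphaned child rows are expected, so metrics that depend on valid parent-child relationships may score poorly for this baseline.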
FAQs
What if I have an idea for another basic synthesizer?
If there are other basic techniques you'd like to see included in the SDGym library, please create a Feature Request with your ideas.
In the meantime, you can create a Custom Synthesizer where you can implement the techniques.