
Sampling

Use these sampling methods to create synthetic data from your multi-table model.

Create Realistic Data

Create realistic synthetic data that follows the same format and mathematical properties as the real data.

sample

Use this function to create synthetic data that mimics the real data.
synthetic_data = synthesizer.sample(
scale=1.5
)
Parameters
  • scale: A float greater than 0.0 that describes how much to scale the data by.
      (default) 1: Don't scale the data. The model will create synthetic data that is roughly the same size as the original data.
      >1: Scale the data up by the specified factor. For example, 2.5 will create synthetic data that is roughly 2.5x the size of the original data.
      <1: Shrink the data to the specified fraction. For example, 0.9 will create synthetic data that is roughly 90% of the size of the original data.
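To make the scale rules concrete, here is a minimal sketch of the arithmetic for a single parent table. The helper `expected_parent_rows` is hypothetical (it is not part of SDV), and it assumes the target row count is simply the real row count multiplied by `scale` and rounded to the nearest integer; the synthesizer's exact rounding may differ.

```python
def expected_parent_rows(real_rows: int, scale: float = 1.0) -> int:
    """Hypothetical helper: approximate row count of a synthetic parent table.

    Assumes the count is real_rows * scale rounded to the nearest integer.
    This illustrates the scale parameter; it is not SDV's internal logic.
    """
    if scale <= 0:
        raise ValueError("scale must be a float greater than 0.0")
    return round(real_rows * scale)


# Default scale, 2.5x larger, and 90% of a 1,000-row parent table.
print(expected_parent_rows(1000))       # 1000
print(expected_parent_rows(1000, 2.5))  # 2500
print(expected_parent_rows(1000, 0.9))  # 900
```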
Returns A dictionary that maps each table name (string) to a pandas DataFrame object with synthetic data for that table. The synthetic data mimics the real data.
How large will the synthetic data be? The scale is based on the size of the data you used for training, and it determines the size of every parent table (i.e., a table without any foreign keys).
The synthesizer algorithmically determines the size of each child table, so child-table sizes approximately follow the scale, with some minor deviations.
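Because child-table sizes are chosen by the synthesizer, it can be useful to check how closely each table follows the requested scale. The sketch below assumes `real_data` and `synthetic_data` are both dictionaries mapping table names to pandas DataFrames (the format `sample` returns); `size_ratios` is a hypothetical helper, not an SDV function, and the toy tables are made up for illustration.

```python
import pandas as pd


def size_ratios(real_data: dict, synthetic_data: dict) -> dict:
    """Return {table_name: synthetic rows / real rows} for each table."""
    ratios = {}
    for name, real_table in real_data.items():
        if name not in synthetic_data or len(real_table) == 0:
            continue  # skip tables absent from the synthetic data or empty in the real data
        ratios[name] = len(synthetic_data[name]) / len(real_table)
    return ratios


# Toy data standing in for a sample taken with scale=1.5.
real = {
    "users": pd.DataFrame({"user_id": range(4)}),  # parent table
    "orders": pd.DataFrame({"order_id": range(10), "user_id": [0, 1] * 5}),  # child table
}
synthetic = {
    "users": pd.DataFrame({"user_id": range(6)}),
    "orders": pd.DataFrame({"order_id": range(14), "user_id": [0, 1] * 7}),
}
print(size_ratios(real, synthetic))  # {'users': 1.5, 'orders': 1.4}
```

In this toy example the parent table matches the scale exactly, while the child table deviates slightly, which is the pattern described above.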

Controlling Randomization

Every time you use any of the sampling methods, the synthetic data will be different from previous runs.

reset_sampling

Use this function to reset the randomization. After calling it, the sampling methods generate the same data as they did after the previous reset, which makes results reproducible. For example, in the code below, synthetic_data1 and synthetic_data2 are identical.
synthesizer.reset_sampling()
synthetic_data1 = synthesizer.sample(scale=1.5)
synthesizer.reset_sampling()
synthetic_data2 = synthesizer.sample(scale=1.5)
Parameters None
Returns None. Resets the synthesizer's randomization; the fitted model is not changed.
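The reset behavior resembles a seeded random generator that is re-created from its seed on reset. The toy class below is an analogy only (it is not how SDV implements `reset_sampling`): it uses NumPy's `default_rng` to show that consecutive samples differ while a sample taken right after a reset repeats the earlier one. `ToySampler` and the `same_data` helper, which compares two multi-table results table by table, are both hypothetical.

```python
import numpy as np
import pandas as pd


class ToySampler:
    """Hypothetical stand-in for a fitted synthesizer with resettable randomness."""

    def __init__(self, seed: int = 0):
        self._seed = seed
        self._rng = np.random.default_rng(seed)

    def reset_sampling(self) -> None:
        # Re-create the generator from the original seed.
        self._rng = np.random.default_rng(self._seed)

    def sample(self, scale: float = 1.0) -> dict:
        n_rows = round(4 * scale)  # pretend the real parent table has 4 rows
        ages = self._rng.integers(18, 90, n_rows)
        return {"users": pd.DataFrame({"age": ages})}


def same_data(a: dict, b: dict) -> bool:
    """True if both dicts hold the same tables and every DataFrame is equal."""
    return a.keys() == b.keys() and all(a[name].equals(b[name]) for name in a)


sampler = ToySampler(seed=42)
sampler.reset_sampling()
data1 = sampler.sample(scale=1.5)
data2 = sampler.sample(scale=1.5)  # no reset in between: new random draws
sampler.reset_sampling()
data3 = sampler.sample(scale=1.5)
print(same_data(data1, data3))  # True
```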
Copyright (c) 2023, DataCebo, Inc.