Sampling
Use these sampling methods to create synthetic data from your multi-table model.
Sample Realistic Data
Create realistic synthetic data that follows the same format and mathematical properties as the real data.
sample
Use this function to create synthetic data that mimics the real data.
Parameters
scale: A float >0.0 that describes how much to scale the data by
(default) 1.0 | Don't scale the data. The model will create synthetic data that is roughly the same size as the original data. |
>1.0 | Scale the data up by the specified factor. |
<1.0 | Shrink the data to the specified fraction of its original size. |
Returns A dictionary that maps each table name (string) to a pandas DataFrame object with synthetic data for that table. The synthetic data mimics the real data.
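To make the return value concrete, here is a minimal sketch of how you might inspect the sampled dictionary. The `row_counts` helper and the `StubSynthesizer` class are hypothetical names introduced only for illustration; the stub merely imitates the documented `sample(scale)` call and uses plain lists in place of pandas DataFrames.

```python
def row_counts(synthetic_data):
    """Return the number of rows in each table of a sampled dataset."""
    return {table_name: len(table) for table_name, table in synthetic_data.items()}


class StubSynthesizer:
    """Hypothetical stand-in that imitates the documented sample(scale) call."""

    def sample(self, scale=1.0):
        # Plain lists stand in for pandas DataFrames; len() works on both.
        return {
            'hotels': [{'hotel_id': i} for i in range(round(10 * scale))],
            'guests': [{'guest_id': i} for i in range(round(50 * scale))],
        }


synthetic_data = StubSynthesizer().sample(scale=2.0)
print(row_counts(synthetic_data))
```

With a real fitted synthesizer, the same helper should work unchanged, since `len()` of a DataFrame is its row count.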
How large will the synthetic data be?
The scale is based on the size of the data you used for training. The scale determines the size of every parent table (i.e. a table without any foreign keys).
Note that the synthesizer will algorithmically determine the size of the child tables, so their final sizes will approximately follow the scale, with some minor deviations.
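The arithmetic behind the scale can be sketched as below. This is only an approximation under stated assumptions: the `expected_parent_rows` helper and the table names are hypothetical, and rounding to the nearest integer is an assumption rather than the synthesizer's documented behavior. Child tables are left out because their sizes are determined algorithmically.

```python
def expected_parent_rows(training_rows, scale=1.0):
    """Estimate the sampled size of each parent table for a given scale.

    training_rows maps each parent table name to its row count in the
    training data. Rounding is an assumption made for this sketch.
    """
    if scale <= 0:
        raise ValueError('scale must be a float greater than 0.0')
    return {table: round(rows * scale) for table, rows in training_rows.items()}


# Hypothetical parent tables (tables without any foreign keys).
training_rows = {'hotels': 200, 'airlines': 40}

print(expected_parent_rows(training_rows))             # same size as training
print(expected_parent_rows(training_rows, scale=2.5))  # scaled up
print(expected_parent_rows(training_rows, scale=0.9))  # shrunk
```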
reset_sampling
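Use this function to reset any randomization in sampling. After calling this, your synthesizer will generate the same data as before. For example, if synthetic_data1 is the first sample drawn from a synthesizer and synthetic_data2 is drawn with the same arguments right after calling reset_sampling, then synthetic_data1 and synthetic_data2 are the same.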
Parameters None
Returns None. Resets the synthesizer.
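The sample, reset, sample pattern can be illustrated with a toy model. `ToySynthesizer` below is a hypothetical stand-in built on Python's `random` module, not the library's implementation; it only mirrors the documented `sample` and `reset_sampling` calls to show why the two results match.

```python
import random


class ToySynthesizer:
    """Hypothetical stand-in for a fitted synthesizer (illustration only)."""

    def __init__(self, seed=0):
        self._seed = seed
        self._rng = random.Random(seed)

    def sample(self, scale=1.0):
        # Draw a few random values; a real synthesizer returns DataFrames.
        num_rows = max(1, round(4 * scale))
        return {'users': [self._rng.randint(0, 10**6) for _ in range(num_rows)]}

    def reset_sampling(self):
        # Restore the random generator to its initial state.
        self._rng = random.Random(self._seed)


synthesizer = ToySynthesizer()
synthetic_data1 = synthesizer.sample(scale=1.5)
synthesizer.reset_sampling()
synthetic_data2 = synthesizer.sample(scale=1.5)

print(synthetic_data1 == synthetic_data2)  # True
```

Without the call to `reset_sampling`, the second sample would continue from where the random generator left off and would generally differ from the first.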
Save Your Data
Save your synthetic data back into its original format.
save_csvs
Use this function to save your synthetic data locally into CSV files. Each table will be written to a separate CSV file.
Parameters
(required) data: A dictionary mapping each table name to a pandas DataFrame containing the synthetic data
(required) folder_name: The name of the folder you'd like to write the synthetic data in. All CSV files will be written in this folder.
suffix: An optional string suffix to add to each CSV file name
(default) None: No suffix is added. Each table will be saved as <table_name>.csv
Supply any other string to add a suffix. If a suffix is provided, it will be added before .csv. For example, a suffix of '-synthetic' will create files like '<table_name>-synthetic.csv'.
to_csv_parameters: A dictionary with additional parameters to pass in when saving CSVs. The keys are any of the parameters of pandas.DataFrame.to_csv and the values are the inputs.
Returns None. All the tables will be written as CSVs inside the folder name you specified.
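The file naming rule described above can be sketched as follows. The `csv_paths` helper, the folder name, and the table names are hypothetical and exist only to illustrate where the suffix lands; this is not how the library itself builds its paths, and no files are written.

```python
from pathlib import Path


def csv_paths(table_names, folder_name, suffix=None):
    """Map each table name to the CSV path implied by the naming rule."""
    suffix = suffix or ''  # no suffix means plain <table_name>.csv
    return {name: Path(folder_name) / f'{name}{suffix}.csv' for name in table_names}


paths = csv_paths(['hotels', 'guests'], 'synthetic_output', suffix='-synthetic')
for name, path in paths.items():
    print(name, '->', path.as_posix())
```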