Sample Realistic Data
Create realistic synthetic data that follows the same format and mathematical properties as the real data.
Use this function to create synthetic data that follows the same format and mathematical properties as the real data.
Parameters
(required) num_rows: An integer >0 that specifies the number of rows to synthesize.
batch_size: An integer >0 that specifies the number of rows to sample at a time. If you are sampling a large number of rows, setting a smaller batch size allows you to see and save incremental progress. Defaults to the same value as num_rows.
max_tries_per_batch: An integer >0 that specifies the number of sampling attempts to make per batch. If you have included constraints, it may take multiple batches to create valid data. Defaults to 100.
output_file_path: A string describing a CSV filepath for writing the synthetic data. Set to None to skip writing to a file. Defaults to None.
Returns An object with synthetic data. The synthetic data mimics the real data.
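To make the interaction between batch_size and max_tries_per_batch more concrete, here is a minimal sketch of batched sampling with retries. It is a toy illustration only, not the library's implementation: the helper sample_in_batches, the generate_row and is_valid callables, and the "age" column are all hypothetical names introduced for this example, and the validity check merely stands in for a constraint.

```python
import csv
import random


def sample_in_batches(generate_row, is_valid, num_rows, batch_size=None,
                      max_tries_per_batch=100, output_file_path=None):
    """Toy illustration of batched sampling with retries (hypothetical helper)."""
    if num_rows <= 0:
        raise ValueError("num_rows must be an integer > 0")
    batch_size = batch_size or num_rows  # default: a single batch of num_rows

    rows = []
    writer = None
    out = open(output_file_path, "w", newline="") if output_file_path else None
    try:
        while len(rows) < num_rows:
            target = min(batch_size, num_rows - len(rows))
            batch = []
            for _ in range(max_tries_per_batch):
                # Draw candidates for the rows still missing; keep the valid ones.
                candidates = [generate_row() for _ in range(target - len(batch))]
                batch.extend(row for row in candidates if is_valid(row))
                if len(batch) == target:
                    break
            if not batch:
                break  # nothing valid after all tries; stop rather than loop forever
            rows.extend(batch)
            if out is not None:
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=list(batch[0]))
                    writer.writeheader()
                writer.writerows(batch)  # save incremental progress per batch
                out.flush()
    finally:
        if out is not None:
            out.close()
    return rows


rng = random.Random(0)
synthetic_rows = sample_in_batches(
    generate_row=lambda: {"age": rng.randint(0, 99)},
    is_valid=lambda row: row["age"] >= 18,  # stand-in for a constraint
    num_rows=10,
    batch_size=4,
)
```

In this sketch a smaller batch_size means results are appended to the CSV file more often, while max_tries_per_batch bounds how long the sampler keeps retrying when the validity check rejects many rows.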
Use this function to reset any randomization in sampling. After calling this, your synthesizer will generate the same data as before. For example, if you sample synthetic_data1, reset the sampling, and then sample synthetic_data2 with the same settings, synthetic_data1 and synthetic_data2 are the same.
Parameters None
Returns None. Resets the synthesizer.
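The reset behavior described above can be sketched with a small stand-in class. ToySynthesizer is hypothetical and is not part of any real library, and the method names sample and reset_sampling are assumptions made for this example, since the text does not show them. The sketch only illustrates the idea of restoring the random state so that the next sample repeats the earlier one.

```python
import random


class ToySynthesizer:
    """Hypothetical stand-in for a fitted synthesizer; not a real library class."""

    def __init__(self, seed=0):
        self._seed = seed
        self._rng = random.Random(seed)

    def sample(self, num_rows):
        # Each call advances the random state, so repeated calls give new values.
        return [self._rng.random() for _ in range(num_rows)]

    def reset_sampling(self):
        # Restore the random state to its initial value; returns None.
        self._rng = random.Random(self._seed)


synthesizer = ToySynthesizer()
synthetic_data1 = synthesizer.sample(num_rows=5)

synthesizer.reset_sampling()
synthetic_data2 = synthesizer.sample(num_rows=5)

# After the reset, the second sample repeats the first.
assert synthetic_data1 == synthetic_data2
```

Without the reset call, the second sample would continue from the advanced random state and differ from the first.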