
Sampling

Use these sampling methods to create synthetic data from your sequential model. You can use multiple functions to create synthetic data that is customized for your use case.

Create Realistic Data

Create realistic synthetic data that follows the same format and mathematical properties as the real data.

sample

Use this function to create synthetic data that mimics the real data.
synthetic_data = synthesizer.sample(
num_sequences=100,
sequence_length=None
)
Parameters
  • (required) num_sequences: An integer >0 describing the number of sequences to sample
  • sequence_length: An integer >0 describing the length of each sequence. If you provide None, the synthesizer will determine the lengths algorithmically, and the length may be different for each sequence. Defaults to None.
Returns A pandas DataFrame object with synthetic data. The synthetic data mimics the real data.
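Because the sequence lengths may differ when sequence_length is None, it can help to count the rows in each returned sequence. The sketch below is only an illustration: the column names patient_id and heart_rate are hypothetical, and the small hand-written DataFrame stands in for the output of synthesizer.sample.

```python
import pandas as pd


def sequence_lengths(synthetic_data: pd.DataFrame, sequence_key: str) -> pd.Series:
    """Return the number of rows in each sequence, indexed by the sequence key."""
    return synthetic_data.groupby(sequence_key).size()


# Hand-written stand-in for the DataFrame returned by synthesizer.sample(...).
# 'patient_id' (sequence key) and 'heart_rate' are hypothetical column names.
example = pd.DataFrame({
    'patient_id': ['A', 'A', 'A', 'B', 'B'],
    'heart_rate': [71, 74, 69, 80, 82],
})

lengths = sequence_lengths(example, sequence_key='patient_id')
```

With real output, pass the DataFrame returned by sample and the name of your own sequence key column.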

Reference Context Data

Synthesize sequential data using known context columns as a reference.

sample_sequential_columns

Use this function to sample the sequential columns based on known context columns that do not change over time.
Parameters
  • (required) context_columns: A pandas DataFrame that contains the sequence key and all the context columns of your data that do not vary with respect to time. Each row corresponds to a sequence that you want to synthesize.
  • sequence_length: An integer >0 describing the length of each sequence. If you provide None, the synthesizer will determine the lengths algorithmically, and the length may be different for each sequence. Defaults to None.
Returns A pandas DataFrame object with synthetic data. The synthetic data is based on the context columns you provided.
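As a minimal sketch of the call described above, the snippet below builds a context DataFrame with one row per sequence and passes it to sample_sequential_columns. The column names (patient_id, age, clinic) and the helper sample_for_context are hypothetical and used only for illustration; synthesizer is assumed to be a sequential synthesizer that has already been fit on data with matching columns.

```python
import pandas as pd

# One row per sequence to synthesize.
# 'patient_id' is a hypothetical sequence key; 'age' and 'clinic' are
# hypothetical context columns that stay constant within a sequence.
context = pd.DataFrame({
    'patient_id': ['P-001', 'P-002', 'P-003'],
    'age': [34, 58, 41],
    'clinic': ['north', 'south', 'north'],
})


def sample_for_context(synthesizer, context_columns, sequence_length=None):
    """Sample the sequential columns for every row of context_columns.

    `synthesizer` is assumed to be an already fitted sequential synthesizer.
    """
    return synthesizer.sample_sequential_columns(
        context_columns=context_columns,
        sequence_length=sequence_length,
    )
```

A call such as sample_for_context(synthesizer, context, sequence_length=20) would then request one sequence of 20 rows for each of the three rows in context.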

Controlling Randomization

Every time you use any of the sampling methods, the synthetic data will be different from the data produced in previous runs.

reset_sampling

Use this function to reset the randomization. After you call it, the sampling methods generate the same data as they did before. For example, in the code below, synthetic_data1 and synthetic_data2 are the same.
synthesizer.reset_sampling()
synthetic_data1 = synthesizer.sample(num_sequences=10)
synthesizer.reset_sampling()
synthetic_data2 = synthesizer.sample(num_sequences=10)
Parameters None
Returns None. Resets the synthesizer.
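To confirm the reset behavior on your own model, you could compare two samples drawn after separate resets. This is a sketch only: the helper name is_reproducible is hypothetical, and synthesizer is assumed to be a fitted sequential synthesizer that exposes the reset_sampling and sample methods described above.

```python
def is_reproducible(synthesizer, num_sequences=10):
    """Return True if two samples drawn after separate resets are identical."""
    synthesizer.reset_sampling()
    first = synthesizer.sample(num_sequences=num_sequences)

    synthesizer.reset_sampling()
    second = synthesizer.sample(num_sequences=num_sequences)

    # DataFrame.equals compares shape, column dtypes, and every value.
    return first.equals(second)
```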
Copyright (c) 2023, DataCebo, Inc.