PARSynthesizer uses deep learning methods to train a model and generate synthetic data.
from sdv.sequential import PARSynthesizer

synthesizer = PARSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_sequences=100)
Is the PARSynthesizer suited for your dataset? The PARSynthesizer is designed to work on multi-sequence data, meaning that multiple sequences (usually belonging to different entities) are present within the same dataset. Your metadata should therefore include a sequence_key. Using this information, the PARSynthesizer creates brand new entities and brand new sequences for each one.
If your dataset contains only a single sequence of data, then the PARSynthesizer is not suited for your dataset.
synthesizer = PARSynthesizer(
    metadata # required
)
enforce_min_max_values: Control whether the synthetic data should adhere to the same min/max boundaries set by the real data.
enforce_rounding: Control whether the synthetic data should have the same number of decimal digits as the real data.
locales: A list of locale strings. Any PII columns will correspond to the locales that you provide.
context_columns: Provide a list of strings that represent the names of the context columns. Context columns do not vary inside of a sequence. For example, a user's 'Address' may not vary within a sequence while other columns such as 'Heart Rate' would. Defaults to an empty list.
epochs: Number of times to train the neural network. Each new epoch can improve the model.
verbose: Control whether to print out the results of each epoch. You can use this to track the training time as well as the improvements per epoch.
Looking for more customizations? Other settings are available to fine-tune the architecture of the underlying neural network used to model the data. Click the section below to expand.
These settings are specific to the neural network. Use these settings if you want to optimize the technical architecture and modeling.
sample_size: The number of times to sample (before choosing and returning the sample which maximizes the likelihood).
segment_size: Cut each training sequence into several segments by using this parameter. For example, if segment_size=10, then each segment contains 10 data points. Defaults to None, which means the sequences are not cut into any segments.
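To illustrate what segmenting means, here is a minimal pure-Python sketch of cutting a sequence into fixed-size segments. This mirrors the idea behind segment_size; it is not SDV's internal implementation:

```python
def segment(sequence, segment_size):
    """Cut a training sequence into consecutive segments of the given size."""
    return [sequence[i:i + segment_size] for i in range(0, len(sequence), segment_size)]

# A 30-point sequence with segment_size=10 yields three 10-point segments.
segments = segment(list(range(30)), 10)
print(len(segments))     # → 3
print(len(segments[0]))  # → 10
```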
Use this function to access the custom parameters you have included for the synthesizer.
parameters = synthesizer.get_parameters()
Output A dictionary with the parameter names and the values, for example:
'context_columns': ['Address', 'Smoker']
The returned parameters are a copy. Changing them will not affect the synthesizer.
Use this function to access the metadata object that you have included for the synthesizer.
metadata = synthesizer.get_metadata()
The returned metadata is a copy. Changing it will not affect the synthesizer.
To learn a machine learning model based on your real data, use the fit function.
Technical Details: PAR is a Probabilistic Auto-Regressive model that is based on neural networks. It learns how to create brand new sequences of multi-dimensional data by conditioning on the unchanging, context values.
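To give intuition for the auto-regressive idea, here is a toy sketch (illustrative only, not the PAR model itself): each new value depends on the previous value plus noise, while an unchanging context value shifts the whole sequence.

```python
import random

def toy_autoregressive_sequence(context_offset, length, phi=0.8):
    """Generate a toy AR(1) sequence conditioned on a fixed context value."""
    value = 0.0
    sequence = []
    for _ in range(length):
        value = phi * value + random.gauss(0.0, 1.0)  # depends on previous value
        sequence.append(context_offset + value)       # context shifts every point
    return sequence

random.seed(0)
sequence = toy_autoregressive_sequence(context_offset=100.0, length=5)
```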
Save your trained synthesizer for future use.
Use this function to save your trained synthesizer as a Python pickle file.
filepath: A string describing the filepath where you want to save your synthesizer. Make sure this ends in .pkl.
Output (None) The file will be saved at the desired location
Use this function to load a trained synthesizer from a Python pickle file.
filepath: A string describing the filepath of your saved synthesizer
Output Your synthesizer, as a PARSynthesizer object
from sdv.sequential import PARSynthesizer

synthesizer = PARSynthesizer.load(
    filepath='my_synthesizer.pkl'
)
This synthesizer models non-numerical columns, including columns with missing values.
Although the PAR algorithm is designed for only numerical data, this synthesizer converts other data types using Reversible Data Transforms (RDTs). To access and modify the transformations, see Advanced Features.
Yes, even if you've previously fit data, you should be able to call the fit function again.
If you do this, the synthesizer will start over from scratch and fit the new data that you provide it. This is the equivalent of creating a new synthesizer and fitting it with new data.