❖ BootstrapSynthesizer
The BootstrapSynthesizer is designed for cases where you have only a few rows of data, or where your data is "short and wide" (more columns than rows). It internally bootstraps your real data and then uses the bootstrapped data to build a model. The modeling step is compatible with any other single-table synthesizer.
from sdv.single_table import BootstrapSynthesizer
synthesizer = BootstrapSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_rows=10)
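To decide whether your table fits the "short and wide" description, you can compare its row and column counts. The sketch below is a plain-Python illustration; the helper name `is_short_and_wide` is hypothetical and not part of SDV.

```python
def is_short_and_wide(rows):
    """Return True when a table has more columns than rows.

    `rows` is a list of dictionaries, one per row, keyed by column name.
    """
    if not rows:
        return False
    columns = set()
    for row in rows:
        columns.update(row)
    return len(columns) > len(rows)


# Two rows, four columns: more columns than rows.
table = [
    {'age': 34, 'height': 170.2, 'weight': 68.0, 'score': 0.71},
    {'age': 51, 'height': 158.9, 'weight': 72.5, 'score': 0.43},
]
print(is_short_and_wide(table))
```

With a pandas DataFrame, the same check is `data.shape[1] > data.shape[0]`.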
Creating a synthesizer
When creating your synthesizer, you are required to pass in a Metadata object as the first argument. All other parameters are optional. You can include them to customize the synthesizer.
synthesizer = BootstrapSynthesizer(
    metadata,  # required
    num_rows_bootstrap=1000,
    bootstrap_noise_amt=1.5,
    data_synthesizer='GaussianCopulaSynthesizer',
    enforce_min_max_values=True,
    synthesize_missing_values=False
)
Parameter Reference
num_rows_bootstrap
: Specify the number of additional rows to bootstrap before modeling the data.
(default) 1000
Bootstrap the original data by creating 1000 rows of additional data
<integer>
Create the desired number of bootstrapped rows before building the model
bootstrap_noise_amt
: The amount of noise to add when bootstrapping the data. Some noise is necessary to provide a greater diversity of data points for modeling.
(default) 1.5
When bootstrapping the data, add noise that is equal to 1.5x the standard deviation of each row.
<float>
Add the desired amount of noise to the bootstrapped data. This is the multiplier to the standard deviation, so 1.5 means 1.5x the standard deviation, 2 means 2x the standard deviation, etc.
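To make the idea of bootstrapping with noise concrete, here is a minimal standard-library sketch. It is not the synthesizer's actual implementation. It assumes that rows are resampled with replacement and that each numerical value receives Gaussian noise whose standard deviation is the multiplier times the spread of that value's column. The function name `bootstrap_with_noise` and its arguments are hypothetical.

```python
import random
import statistics


def bootstrap_with_noise(rows, num_rows, noise_multiplier, seed=None):
    """Resample rows with replacement and perturb each numeric value.

    Assumption: noise is Gaussian with a standard deviation equal to
    noise_multiplier times the standard deviation of that value's column.
    """
    rng = random.Random(seed)
    columns = list(rows[0])
    # Per-column spread of the real data (0.0 when there is only one row).
    spread = {
        col: statistics.stdev([row[col] for row in rows]) if len(rows) > 1 else 0.0
        for col in columns
    }
    bootstrapped = []
    for _ in range(num_rows):
        source = rng.choice(rows)  # sampling with replacement
        bootstrapped.append({
            col: source[col] + rng.gauss(0.0, noise_multiplier * spread[col])
            for col in columns
        })
    return bootstrapped


real_rows = [
    {'height': 170.2, 'weight': 68.0},
    {'height': 158.9, 'weight': 72.5},
    {'height': 181.4, 'weight': 80.1},
]
extra_rows = bootstrap_with_noise(real_rows, num_rows=1000, noise_multiplier=1.5, seed=0)
print(len(extra_rows))
```

A multiplier of 0 would reproduce exact copies of the real rows, which is why some noise is needed to give the downstream model a more diverse set of points.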
data_synthesizer
: The single-table synthesizer to use when modeling the bootstrapped data.
(default) 'GaussianCopulaSynthesizer'
Use the GaussianCopulaSynthesizer to build a model of the bootstrapped data
<synthesizer_name>
Supply a synthesizer name from the list of single-table synthesizers, for example 'XGCSynthesizer' or 'CTGANSynthesizer'.
data_synthesizer_params
: A dictionary of parameters to use for the synthesizer
(default) None
Use the default parameters for the synthesizer
<dictionary>
Update the default parameters for the synthesizer you've chosen by providing a dictionary of key/value pairs, one for each parameter. Refer to the docs for your synthesizer for possible parameters. For example, for GaussianCopulaSynthesizer you can supply: {'default_distribution': 'norm'}.
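As an illustration, the choice of synthesizer and its parameters can be collected in one dictionary of keyword arguments. The sketch uses only parameter names and example values that appear on this page; the variable name `bootstrap_kwargs` is hypothetical, and the final call is left as a comment because it needs SDV and a Metadata object.

```python
# Keyword arguments for BootstrapSynthesizer, using only parameter names
# and example values documented on this page.
bootstrap_kwargs = {
    'num_rows_bootstrap': 1000,
    'data_synthesizer': 'GaussianCopulaSynthesizer',
    'data_synthesizer_params': {'default_distribution': 'norm'},
}

# With SDV installed and a Metadata object available, these would be passed as:
# synthesizer = BootstrapSynthesizer(metadata, **bootstrap_kwargs)
print(bootstrap_kwargs['data_synthesizer_params'])
```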
enforce_min_max_values
: Control whether the synthetic data should adhere to the same min/max boundaries set by the real data
(default) True
The synthetic data will contain numerical values that are within the ranges of the real data.
False
The synthetic data may contain numerical values that are less than or greater than the real data.
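The effect of enforcing min/max values can be pictured as keeping every synthetic value inside the range seen in the real data. The sketch below does this by clipping. That is an assumption made for illustration, since the synthesizer may enforce the bounds in a different way, and `clip_to_real_range` is a hypothetical helper.

```python
def clip_to_real_range(real_values, synthetic_values):
    """Clip each synthetic value to the min/max observed in the real data."""
    low, high = min(real_values), max(real_values)
    return [min(max(value, low), high) for value in synthetic_values]


real_ages = [23, 35, 41, 58]
raw_synthetic_ages = [19.4, 30.2, 61.7, 44.0]

# 19.4 is raised to the real minimum and 61.7 is lowered to the real maximum.
bounded_ages = clip_to_real_range(real_ages, raw_synthetic_ages)
print(bounded_ages)
```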
synthesize_missing_values
: Control whether the synthetic data should include missing values.
(default) True
The synthetic data will contain missing values in roughly the same proportion as the original data
False
The synthetic data should not contain any missing values for numerical and datetime columns.
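One way to check this setting after sampling is to compare the fraction of missing values in a real column with the fraction in the matching synthetic column. The helper below is a hypothetical standard-library sketch that treats `None` and NaN as missing.

```python
import math


def missing_fraction(values):
    """Return the fraction of values that are None or NaN."""
    if not values:
        return 0.0
    missing = sum(
        1 for value in values
        if value is None or (isinstance(value, float) and math.isnan(value))
    )
    return missing / len(values)


real_column = [1.0, None, float('nan'), 4.0]
synthetic_column = [1.2, 3.9, 2.5, 1.1]

print(missing_fraction(real_column), missing_fraction(synthetic_column))
```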
get_parameters
Use this function to access all the parameters your synthesizer uses -- those you have provided as well as the default ones.
Parameters None
Output A dictionary with the parameter names and the values
synthesizer.get_parameters()
{
    'num_rows_bootstrap': 1000,
    'bootstrap_noise_amt': 1.5,
    'data_synthesizer': 'GaussianCopulaSynthesizer',
    'enforce_min_max_values': True,
    'synthesize_missing_values': False
}
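Because the returned dictionary mixes provided and default values, it can help to list only the entries that differ from the defaults stated in the Parameter Reference. The sketch below uses a hand-written dictionary as a stand-in for the output of `get_parameters()`; the helper `customized_parameters` and the `DOCUMENTED_DEFAULTS` table are hypothetical.

```python
# Defaults as stated in the Parameter Reference on this page.
DOCUMENTED_DEFAULTS = {
    'num_rows_bootstrap': 1000,
    'bootstrap_noise_amt': 1.5,
    'data_synthesizer': 'GaussianCopulaSynthesizer',
    'data_synthesizer_params': None,
    'enforce_min_max_values': True,
    'synthesize_missing_values': True,
}


def customized_parameters(parameters, defaults):
    """Return only the parameters whose values differ from the defaults."""
    return {
        name: value
        for name, value in parameters.items()
        if name in defaults and value != defaults[name]
    }


# Stand-in for the dictionary returned by synthesizer.get_parameters().
parameters = {
    'num_rows_bootstrap': 1000,
    'bootstrap_noise_amt': 1.5,
    'data_synthesizer': 'GaussianCopulaSynthesizer',
    'enforce_min_max_values': True,
    'synthesize_missing_values': False,
}
changed = customized_parameters(parameters, DOCUMENTED_DEFAULTS)
print(changed)
```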
get_metadata
Use this function to access the metadata object that you have included for the synthesizer
Parameters None
Output A Metadata object
metadata = synthesizer.get_metadata()
Learning from your data
To learn a machine learning model based on your real data, use the fit method.
fit
Parameters
(required)
data
: A pandas DataFrame object containing the real data that the machine learning model will learn from
Output (None)
synthesizer.fit(data)
Saving your synthesizer
Save your trained synthesizer for future use.
save
Use this function to save your trained synthesizer as a Python pickle file.
Parameters
(required)
filepath
: A string describing the filepath where you want to save your synthesizer. Make sure this ends in .pkl
Output (None) The file will be saved at the desired location
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
BootstrapSynthesizer.load
Use this function to load a trained synthesizer from a Python pickle file
Parameters
(required)
filepath
: A string describing the filepath of your saved synthesizer
Output Your synthesizer, as a BootstrapSynthesizer object
from sdv.single_table import BootstrapSynthesizer
synthesizer = BootstrapSynthesizer.load(
    filepath='my_synthesizer.pkl'
)
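Since the saved file is a Python pickle, the save and load round trip can be sketched with the standard `pickle` module. This is an illustration only: a plain dictionary stands in for a trained synthesizer, the helpers `save_pickle` and `load_pickle` are hypothetical, and the `.pkl` suffix check is an assumption added here rather than documented SDV behavior.

```python
import os
import pickle
import tempfile


def save_pickle(obj, filepath):
    """Write obj to filepath as a pickle, insisting on a .pkl suffix."""
    if not filepath.endswith('.pkl'):
        raise ValueError("filepath should end in '.pkl'")
    with open(filepath, 'wb') as handle:
        pickle.dump(obj, handle)


def load_pickle(filepath):
    """Read a pickled object back from filepath."""
    with open(filepath, 'rb') as handle:
        return pickle.load(handle)


# A plain dictionary stands in for a trained synthesizer here.
stand_in = {'num_rows_bootstrap': 1000, 'data_synthesizer': 'GaussianCopulaSynthesizer'}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'my_synthesizer.pkl')
    save_pickle(stand_in, path)
    restored = load_pickle(path)

print(restored == stand_in)
```

Unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.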
What's next?
After training your synthesizer, you can now sample synthetic data. See the Sampling section for more details.