Fast ML Preset

This synthesizer is deprecated. Please use the GaussianCopulaSynthesizer instead. The GaussianCopulaSynthesizer is just as fast as the Fast ML Preset and offers more customization options for producing higher-quality data.

The Fast ML Preset synthesizer is optimized for modeling speed. This is a great choice for first-time SDV users. Use it to quickly get started with synthetic data.

from sdv.lite import SingleTablePreset

synthesizer = SingleTablePreset(metadata, name='FAST_ML')
synthesizer.fit(data)

synthetic_data = synthesizer.sample(num_rows=10)

Creating a synthesizer

When creating your synthesizer, you are required to pass in a Metadata object as the first argument and the 'FAST_ML' preset name as the second. All other parameters are optional. You can include them to customize the synthesizer.

synthesizer = SingleTablePreset(
    metadata, # required
    name='FAST_ML', # required
    locales=['en_US', 'en_CA']
)

Parameter Reference

locales: A list of locale strings. Any PII columns will correspond to the locales that you provide.

(default) ['en_US']: Generate PII values in English corresponding to US-based concepts (e.g. addresses, phone numbers).

<list>: Create data from the list of locales. Each locale string consists of a 2-character code for the language and a 2-character code for the country, separated by an underscore. For example, ["en_US", "fr_CA"].

For all options, see the Faker docs.
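The locale format described above can be checked before the list is passed to the synthesizer. The following is a minimal sketch using only the Python standard library. The helper names `is_locale_string` and `invalid_locales` are hypothetical and not part of SDV, and the lowercase/uppercase convention is an assumption taken from the examples on this page. The check covers only the shape of each string, not whether Faker actually supports the locale.

```python
import re

# Shape described in the text: 2-character language code, underscore,
# 2-character country code. The lowercase/uppercase convention is an
# assumption taken from the examples ("en_US", "fr_CA").
_LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def is_locale_string(value):
    """Return True if value looks like 'en_US' (hypothetical helper, not part of SDV)."""
    return isinstance(value, str) and _LOCALE_PATTERN.fullmatch(value) is not None


def invalid_locales(locales):
    """Return the entries of locales that do not match the expected shape."""
    return [loc for loc in locales if not is_locale_string(loc)]


# Entries returned here would need fixing before being used as locales.
print(invalid_locales(["en_US", "fr_CA", "english", "en-US"]))
```

An empty result means every entry has the expected shape. Whether a given locale is available is still determined by Faker.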

Learning from your data

To train a machine learning model on your real data, use the fit method.

fit

Parameters

(required) data: A pandas DataFrame object containing the real data that the machine learning model will learn from.

Output (None)

synthesizer.fit(data)

Technical Details: This preset uses the GaussianCopulaSynthesizer with fixed settings.

GaussianCopulaSynthesizer(
    enforce_min_max_values=True,
    enforce_rounding=True,
    default_distribution='norm',
)

This allows for a fast modeling time while still using machine learning to learn patterns. For more details about which patterns are learned, see the GitHub Discussion.
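Because the preset is deprecated in favor of the GaussianCopulaSynthesizer, the fixed settings above suggest a direct replacement. The following is a sketch under stated assumptions: the helper name `build_replacement_synthesizer` is hypothetical, and the import path `sdv.single_table` is not stated on this page and is assumed here. The keyword arguments are the ones listed above.

```python
def build_replacement_synthesizer(metadata):
    """Build a GaussianCopulaSynthesizer with the fixed settings listed above.

    Hypothetical helper, not part of SDV. The import is deferred so that
    defining this function does not require SDV to be installed.
    """
    # Assumed import path; it is not given on this page.
    from sdv.single_table import GaussianCopulaSynthesizer

    return GaussianCopulaSynthesizer(
        metadata,
        enforce_min_max_values=True,
        enforce_rounding=True,
        default_distribution='norm',
    )


# Intended usage, mirroring the preset workflow on this page:
# synthesizer = build_replacement_synthesizer(metadata)
# synthesizer.fit(data)
# synthetic_data = synthesizer.sample(num_rows=10)
```

Unlike the preset, a synthesizer built this way can be given different settings, which is where the additional customization mentioned in the deprecation note comes in.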

Saving your synthesizer

Save your trained synthesizer for future use.

save

Use this function to save your trained synthesizer as a Python pickle file.

Parameters

(required) filepath: A string describing the filepath where you want to save your synthesizer. Make sure this ends in .pkl.

Output (None) The file will be saved at the desired location.

synthesizer.save(
    filepath='my_synthesizer.pkl'
)
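Since the filepath is expected to end in .pkl, it can help to check the extension before saving. The following is a minimal sketch: `save_checked` is a hypothetical wrapper, not part of SDV, and it relies only on the `save(filepath=...)` call shown above.

```python
from pathlib import Path


def save_checked(synthesizer, filepath):
    """Save the synthesizer after confirming the path ends in .pkl.

    Hypothetical helper, not part of SDV. It relies only on the
    save(filepath=...) call shown on this page.
    """
    path = Path(filepath)
    if path.suffix != ".pkl":
        raise ValueError(f"Expected a path ending in .pkl, got {filepath!r}")
    synthesizer.save(filepath=str(path))
    return path


# Intended usage with a trained synthesizer:
# save_checked(synthesizer, 'my_synthesizer.pkl')
```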

SingleTablePreset.load

Use this function to load a trained synthesizer from a Python pickle file.

Parameters

(required) filepath: A string describing the filepath of your saved synthesizer.

Output Your synthesizer, as a SingleTablePreset object.

from sdv.lite import SingleTablePreset

synthesizer = SingleTablePreset.load(
    filepath='my_synthesizer.pkl'
)

What's next?

After training your synthesizer, you can sample synthetic data. See the Sampling section for more details.
