Modeling

The SDV creates synthetic data using machine learning. A synthesizer is an object that you can use to accomplish this task.

  1. You'll start by creating a synthesizer based on your metadata.

  2. Next, you'll train the synthesizer using real data. In this phase, the synthesizer will learn patterns from the real data.

  3. Once your synthesizer is trained, you can use it to generate new, synthetic data.

from sdv.single_table import GaussianCopulaSynthesizer

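# Step 1: Create the synthesizer
# (`metadata` is the metadata object describing your table, prepared beforehand)
synthesizer = GaussianCopulaSynthesizer(metadata)

# Step 2: Train the synthesizer
# (`real_data` is the table of real data you loaded beforehand)
synthesizer.fit(real_data)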

# Step 3: Generate synthetic data
synthetic_data = synthesizer.sample(num_rows=100)
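
To build intuition for what the three steps do, here is a toy sketch of the same create, fit, sample pattern in plain Python. It is not how the SDV or the GaussianCopulaSynthesizer works internally: `ToySynthesizer`, its `columns` argument, and the `real_rows` data are all hypothetical, and the "learning" is only a per-column mean and standard deviation, so relationships between columns are ignored.

```python
import random
import statistics


class ToySynthesizer:
    """Hypothetical stand-in that mimics the create / fit / sample workflow."""

    def __init__(self, columns):
        # The "metadata" here is just a list of numeric column names.
        self.columns = list(columns)
        self._params = None

    def fit(self, rows):
        # Learn one mean and one standard deviation per column.
        self._params = {}
        for col in self.columns:
            values = [row[col] for row in rows]
            self._params[col] = (statistics.mean(values), statistics.pstdev(values))

    def sample(self, num_rows, seed=None):
        if self._params is None:
            raise RuntimeError("Call fit() before sample().")
        rng = random.Random(seed)
        # Draw every column independently from its learned normal distribution.
        return [
            {col: rng.gauss(mu, sigma) for col, (mu, sigma) in self._params.items()}
            for _ in range(num_rows)
        ]


# Made-up "real" data, for illustration only.
real_rows = [
    {"age": 34, "income": 52000},
    {"age": 45, "income": 61000},
    {"age": 29, "income": 48000},
    {"age": 52, "income": 75000},
]

toy = ToySynthesizer(columns=["age", "income"])  # Step 1: create
toy.fit(real_rows)                               # Step 2: train
synthetic_rows = toy.sample(num_rows=5, seed=0)  # Step 3: generate
```

A real synthesizer learns much richer patterns than this, but the division of labor is the same: the constructor receives a description of the data, `fit` stores what was learned, and `sample` only uses the learned parameters, never the real rows themselves.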

What's next?

Choose from a variety of synthesizers. Each synthesizer uses a different machine learning technique for training.

Want to improve your synthetic data? You can control the pre- and post-processing steps in your synthesizer and set up custom anonymization controls. You can also enforce logical rules in the form of constraints. See the Advanced Features for more options.
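
To illustrate what enforcing a logical rule can mean, here is a minimal sketch of one generic approach, rejection sampling: generate candidate rows and keep only those that satisfy the rule. This is not the SDV constraints API. The helper `sample_with_rule`, the `draw_booking` generator, and the column names are hypothetical and exist only for this example.

```python
import random


def sample_with_rule(draw_row, rule, num_rows, max_tries=10_000):
    """Draw candidate rows until num_rows of them satisfy the rule."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == num_rows:
            break
        row = draw_row()
        if rule(row):
            accepted.append(row)
    if len(accepted) < num_rows:
        raise RuntimeError("Could not generate enough valid rows.")
    return accepted


rng = random.Random(0)


def draw_booking():
    # Hypothetical generator: both days are drawn independently,
    # so some candidate rows will violate the rule below.
    return {"checkin_day": rng.randint(1, 30), "checkout_day": rng.randint(1, 30)}


def checkout_after_checkin(row):
    # The logical rule: guests must check out after they check in.
    return row["checkout_day"] > row["checkin_day"]


valid_rows = sample_with_rule(draw_booking, checkout_after_checkin, num_rows=20)
```

Rejection is simple but can become slow when few candidates satisfy the rule, which is one reason to rely on a library's built-in constraint support rather than filtering by hand.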


Copyright (c) 2023, DataCebo, Inc.