Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


* IndependentSynthesizer


Last updated 6 months ago

The Independent Synthesizer learns each table's patterns independently. This synthesizer offers fast performance for unlimited tables.

from sdv.multi_table import IndependentSynthesizer

synthesizer = IndependentSynthesizer(metadata)
synthesizer.fit(data)

synthetic_data = synthesizer.sample()

Creating a synthesizer

When creating your synthesizer, you are required to pass in a Metadata object as the first argument.

synthesizer = IndependentSynthesizer(metadata)

All other parameters are optional. You can include them to customize the synthesizer.

Parameter Reference

locales: A list of locale strings. Any PII columns will correspond to the locales that you provide.

(default) ['en_US']

Generate PII values in English corresponding to US-based concepts (e.g. addresses, phone numbers)

<list>

Create data from the list of locales. Each locale string consists of a 2-character code for the language and a 2-character code for the country, separated by an underscore. For example, ["en_US", "fr_CA"]. For all options, see the Faker docs.

synthesizer = IndependentSynthesizer(
    metadata,
    locales=['en_US', 'en_CA', 'fr_CA']
)
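The locale format described above (a 2-character language code, an underscore, and a 2-character country code) can be checked before the list is passed to the synthesizer. The following is a minimal sketch using only the standard library. `is_valid_locale_format` is a hypothetical helper, not part of SDV, and it only checks the shape of the string as described on this page. It does not confirm that Faker actually supports the locale.

```python
import re

# Shape check only: 2-letter lowercase language, underscore, 2-letter uppercase country.
# Hypothetical helper for illustration; it is not part of the SDV API.
LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def is_valid_locale_format(locale: str) -> bool:
    """Return True if the string has the 'en_US' shape."""
    return LOCALE_PATTERN.fullmatch(locale) is not None


locales = ['en_US', 'en_CA', 'fr_CA']

# Collect any entries that do not follow the expected shape.
badly_formatted = [loc for loc in locales if not is_valid_locale_format(loc)]
print(badly_formatted)
```

An empty list means every entry follows the expected shape. A string such as 'en-US' (hyphen instead of underscore) would be reported.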

set_table_parameters

The Independent Synthesizer models each individual table. You can get and set the parameters for each table.

Parameters

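  • (required) table_name: A string describing the name of the table

  • table_synthesizer: The single table synthesizer to use for modeling the table

    (default) 'GaussianCopulaSynthesizer': Use the GaussianCopulaSynthesizer to model the single table

    Available options: 'GaussianCopulaSynthesizer', 'CTGANSynthesizer', 'TVAESynthesizer', 'CopulaGANSynthesizer'. For more information, see Single Table Synthesizers.

  • table_parameters: A dictionary mapping the name of the parameter (string) to the value of the parameter (various). These parameters are different for each synthesizer. For more information, see Single Table Synthesizers.

Output (None)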

synthesizer.set_table_parameters(
    table_name='guests',
    table_synthesizer='GaussianCopulaSynthesizer',
    table_parameters={
        'enforce_min_max_values': True,
        'default_distribution': 'truncnorm',
        'numerical_distributions': { 
            'checkin_date': 'uniform',
            'amenities_fee': 'beta' 
        }
    }
)
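When several tables need their own settings, one option is to keep them in a single dictionary and call set_table_parameters once per table. The sketch below assumes only the keyword arguments shown in the example above. `apply_table_settings` and `RecordingSynthesizer` are hypothetical names introduced for illustration. The stand-in class just records the calls so that the sketch runs without SDV installed.

```python
# Hypothetical helper: apply per-table settings to any object that exposes
# set_table_parameters(table_name=..., table_synthesizer=..., table_parameters=...).
def apply_table_settings(synthesizer, settings):
    for table_name, config in settings.items():
        synthesizer.set_table_parameters(
            table_name=table_name,
            table_synthesizer=config['table_synthesizer'],
            table_parameters=config.get('table_parameters', {}),
        )


class RecordingSynthesizer:
    """Stand-in (not an SDV class) that records each call it receives."""

    def __init__(self):
        self.calls = {}

    def set_table_parameters(self, table_name, table_synthesizer, table_parameters):
        self.calls[table_name] = (table_synthesizer, table_parameters)


# Table names and values reuse the examples shown on this page.
settings = {
    'guests': {
        'table_synthesizer': 'GaussianCopulaSynthesizer',
        'table_parameters': {'default_distribution': 'truncnorm'},
    },
    'users': {
        'table_synthesizer': 'GaussianCopulaSynthesizer',
        'table_parameters': {'default_distribution': 'beta'},
    },
}

stub = RecordingSynthesizer()
apply_table_settings(stub, settings)
print(sorted(stub.calls))
```

With a real synthesizer, pass it in place of the stand-in object.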

get_parameters

Use this function to access all the parameters your synthesizer uses, both those you have provided and the defaults.

Parameters (None)

Output A dictionary with the table names and parameters for each table.

These parameters are only for the multi-table synthesizer. To get individual table-level parameters, use the get_table_parameters function.

The returned parameters are a copy. Changing them will not affect the synthesizer.

synthesizer.get_parameters()
{
    'locales': ['en_US', 'fr_CA'],
    ...
}
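The statement that the returned parameters are a copy can be illustrated with a toy class. This is a sketch of the behavior only: `ToySynthesizer` is a hypothetical stand-in and does not reproduce how SDV stores its parameters internally.

```python
import copy


class ToySynthesizer:
    """Hypothetical stand-in that mimics 'get_parameters returns a copy'."""

    def __init__(self, locales):
        self._parameters = {'locales': list(locales)}

    def get_parameters(self):
        # A deep copy also protects nested values, such as the locales list.
        return copy.deepcopy(self._parameters)


toy = ToySynthesizer(locales=['en_US', 'fr_CA'])

params = toy.get_parameters()
params['locales'].append('de_DE')  # changes only the returned copy

# The stored parameters are unchanged.
print(toy.get_parameters()['locales'])
```

To change a setting, pass it when creating the synthesizer (or use set_table_parameters for table-level settings) rather than editing the returned dictionary.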

get_table_parameters

Use this function to access all the parameters a table synthesizer uses, both those you have provided and the defaults.

Parameters

  • (required) table_name: A string describing the name of the table

Output A dictionary with the parameter names and the values

synthesizer.get_table_parameters(table_name='users')
{
    'synthesizer_name': 'GaussianCopulaSynthesizer',
    'synthesizer_parameters': {
        'default_distribution': 'beta',
        ...
    }
}

get_metadata

Use this function to access the metadata object that you have included for the synthesizer.

Parameters (None)

Output A Metadata object

metadata = synthesizer.get_metadata()

The returned metadata is a copy. Changing it will not affect the synthesizer.

Learning from your data

To learn a machine learning model based on your real data, use the fit method.

fit

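Parameters

  • (required) data: A dictionary mapping each table name to a pandas DataFrame containing the real data that the machine learning model will learn from

Output (None)

Technical Details: The Independent Synthesizer models each table independently, as well as the cardinality between the tables (i.e. the number of children that each parent row has). It can use any single table synthesizer to model the individual tables.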
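Alongside each table's columns, the synthesizer models cardinality: the number of child rows that each parent row has. The following is a minimal sketch of how cardinality can be measured from real data, using only the standard library. The `hotels` table and the `hotel_id` and `guest_id` columns are made up for illustration, and this is not SDV's implementation.

```python
from collections import Counter

# Made-up parent and child tables, written as lists of rows.
hotels = [{'hotel_id': 'H1'}, {'hotel_id': 'H2'}, {'hotel_id': 'H3'}]
guests = [
    {'guest_id': 1, 'hotel_id': 'H1'},
    {'guest_id': 2, 'hotel_id': 'H1'},
    {'guest_id': 3, 'hotel_id': 'H2'},
]

# Count the child rows that reference each parent key.
children_per_parent = Counter(row['hotel_id'] for row in guests)

# Parents without any children get a cardinality of 0.
cardinality = {
    row['hotel_id']: children_per_parent.get(row['hotel_id'], 0)
    for row in hotels
}
print(cardinality)
```

A synthesizer that learns this distribution of counts can decide how many child rows to generate for each synthetic parent row, even though the column values of each table are modeled separately.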

get_learned_distributions

After fitting this synthesizer, you can access the marginal distributions that were learned to estimate the shape of each column.

Parameters

  • (required) table_name: A string with the name of the table

Output A dictionary that maps the name of each learned column to the distribution that estimates its shape

synthesizer.get_learned_distributions(table_name='guests')
{
    'amenities_fee': {
        'distribution': 'beta',
        'learned_parameters': { 'a': 2.22, 'b': 3.17, 'loc': 0.07, 'scale': 48.5 }
    },
    'checkin_date': { 
        ...
    },
    ...
}

For more information about the distributions and their parameters, visit the Copulas library.

Learned parameters are only available for parametric models and distributions. For example, you will not be able to access learned distributions for GAN-based synthesizers (such as CTGAN) or the 'gaussian_kde' technique.

In some cases, the synthesizer may not be able to fit the exact distribution shape you requested, so you may see another distribution shape (e.g. 'truncnorm' instead of 'beta').
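The learned parameters in the example output for 'amenities_fee' can be read as a beta distribution that has been shifted by loc and stretched by scale. The sketch below assumes the parameters follow the SciPy-style loc/scale convention. That is an assumption, since this page does not state the parametrization. It uses only the standard library.

```python
import random

# Learned parameters copied from the example output for 'amenities_fee'.
learned = {'a': 2.22, 'b': 3.17, 'loc': 0.07, 'scale': 48.5}

# Assumption: SciPy-style parametrization, so values lie in [loc, loc + scale].
lower = learned['loc']
upper = learned['loc'] + learned['scale']

# Mean of a standard beta is a / (a + b); shift and stretch it the same way.
mean = learned['loc'] + learned['scale'] * learned['a'] / (learned['a'] + learned['b'])

rng = random.Random(0)  # seeded so that repeated runs give the same draws
samples = [
    learned['loc'] + learned['scale'] * rng.betavariate(learned['a'], learned['b'])
    for _ in range(10_000)
]
sample_mean = sum(samples) / len(samples)

print(f'support: [{lower}, {upper:.2f}]')
print(f'theoretical mean: {mean:.2f}, sample mean: {sample_mean:.2f}')
```

Comparing the support and the mean against the real column is a quick sanity check that the learned shape is plausible.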

get_loss_values

After fitting, you can access the loss values computed during each epoch for both the generator and the discriminator.

Parameters

  • (required) table_name: A string with the name of the table

Output A pandas.DataFrame object containing epoch number, generator loss value and discriminator loss value.

synthesizer.get_loss_values(table_name='users')
Epoch  Generator Loss  Discriminator Loss
1      1.7863          -0.3639
2      1.5484          0.2260
3      1.3633          -0.0441
...

Loss values are only available for tables that use neural network-based models, such as CTGAN, TVAE or CopulaGAN.

Saving your synthesizer

Save your trained synthesizer for future use.

save

Use this function to save your trained synthesizer as a Python pickle file.

Parameters

  • (required) filepath: A string describing the filepath where you want to save your synthesizer. Make sure this ends in .pkl

Output (None) The file will be saved at the desired location

synthesizer.save(
    filepath='my_synthesizer.pkl'
)

IndependentSynthesizer.load

Use this function to load a trained synthesizer from a Python pickle file.

Parameters

  • (required) filepath: A string describing the filepath of your saved synthesizer

Output Your synthesizer, as an IndependentSynthesizer object

from sdv.multi_table import IndependentSynthesizer

synthesizer = IndependentSynthesizer.load(
    filepath='my_synthesizer.pkl'
)
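Because the synthesizer is stored as a Python pickle file, the usual pickle caveat applies: only load files from sources you trust, since unpickling can execute arbitrary code. The sketch below shows a save and load round trip with the standard library. The dictionary `toy_state` is a hypothetical stand-in for a trained synthesizer, and SDV's own save and load functions may do more than this.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state of a trained synthesizer.
toy_state = {'synthesizer_name': 'IndependentSynthesizer', 'locales': ['en_US', 'fr_CA']}

with tempfile.TemporaryDirectory() as folder:
    filepath = os.path.join(folder, 'my_synthesizer.pkl')

    # Save: serialize the object to a .pkl file.
    with open(filepath, 'wb') as f:
        pickle.dump(toy_state, f)

    # Load: only unpickle files that come from a trusted source.
    with open(filepath, 'rb') as f:
        restored = pickle.load(f)

print(restored == toy_state)
```

The restored object is equal to the original but is a separate object in memory.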

What's next?

After training your synthesizer, you can now sample synthetic data. See the Sampling section for more details.

Want to improve your synthesizer? Input logical rules in the form of constraints, and customize the transformations used for pre- and post-processing the data.

For more details, see Advanced Features.

FAQs

What happens if the columns don't contain numerical data?

This synthesizer models non-numerical columns, including columns with missing values.

Some of the underlying machine learning algorithms are designed for only numerical data. This synthesizer processes the data using Reversible Data Transforms (RDTs). To access and modify the transformations, see Advanced Features.

*SDV Enterprise Feature. This feature is only available for licensed, enterprise users. For more information, visit our page to Explore SDV.