Synthetic Data Vault
Copyright (c) 2023, DataCebo, Inc.

Synthesizers

Last updated 15 days ago

The SDV offers a variety of synthesizers, which use different algorithms to generate synthetic data.

Basic Single Table Synthesizers

These synthesizers are available in the SDV Community package. They build a generative AI model using your real data, and use it to create synthetic data.

We recommend starting with GaussianCopulaSynthesizer for fast performance, good quality, and customization.

For higher fidelity, try a neural network-based synthesizer such as CTGANSynthesizer or TVAESynthesizer. Modeling and sampling performance may be slower for these synthesizers, especially if you have categorical columns with many different values (high cardinality).

Experimental synthesizer: The CopulaGANSynthesizer combines classical statistics with GAN-based modeling.
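
As a rough illustration of the recommendation above, the sketch below fits one of the basic synthesizers on a pandas DataFrame and samples new rows. It assumes a recent SDV Community release in which `sdv.metadata.Metadata.detect_from_dataframe` is available (older releases used a different metadata class). The helper `make_synthetic_data`, the tuple `BASIC_SYNTHESIZERS`, and the table name `"table"` are hypothetical names introduced here for illustration; they are not part of the SDV.

```python
# Names of the basic synthesizers listed on this page.
BASIC_SYNTHESIZERS = (
    "GaussianCopulaSynthesizer",  # recommended starting point
    "CTGANSynthesizer",           # neural network-based (GAN)
    "TVAESynthesizer",            # neural network-based (variational autoencoder)
    "CopulaGANSynthesizer",       # experimental
)


def make_synthetic_data(real_data, synthesizer_name="GaussianCopulaSynthesizer", num_rows=100):
    """Fit the named basic synthesizer on a DataFrame and sample synthetic rows.

    Hypothetical helper for illustration only; it is not an SDV function.
    """
    if synthesizer_name not in BASIC_SYNTHESIZERS:
        raise ValueError(f"Unknown basic synthesizer: {synthesizer_name!r}")

    # Imported lazily so the name check above works even without sdv installed.
    import sdv.single_table
    from sdv.metadata import Metadata

    # Assumption: automatic metadata detection is good enough for a first try.
    # In practice, review the detected metadata before modeling.
    metadata = Metadata.detect_from_dataframe(data=real_data, table_name="table")

    # Look up the synthesizer class by name and build the model from real data.
    synthesizer_class = getattr(sdv.single_table, synthesizer_name)
    synthesizer = synthesizer_class(metadata)
    synthesizer.fit(real_data)

    # Create synthetic rows from the fitted model.
    return synthesizer.sample(num_rows=num_rows)
```

Because all of the basic synthesizers are built, fitted, and sampled the same way in this sketch, switching from the default to a neural network-based option is a one-argument change, for example `make_synthetic_data(real_data, "CTGANSynthesizer", num_rows=50)`.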

Specialty Synthesizers

Specialty synthesizers are available for special situations, such as improving speed, enhancing quality, or providing privacy guarantees.

Specialty synthesizers are available to licensed SDV Enterprise users (denoted by *) or through the purchase of additional bundles (denoted by ❖). For more information, see SDV Enterprise and SDV Bundles.

GaussianCopulaSynthesizer

Use a classical ML algorithm to learn from real data. This is fast, transparent, and customizable.

CTGANSynthesizer

Use a GAN-based ML algorithm to learn from real data. This may take longer to learn and be harder to debug.

TVAESynthesizer

Use a variational autoencoder ML model to learn from real data. This may take longer to learn and be harder to debug.

❖ XGCSynthesizer

Use extra features on top of Gaussian Copula for higher quality synthetic data and improved performance.

❖ BootstrapSynthesizer

A synthesizer that is optimized to learn from a smaller number of training rows.

❖ SegmentSynthesizer

Use this synthesizer when your real data is highly segmented, with different patterns in each segment.

* DayZSynthesizer

Generate synthetic data from scratch. Use this when you don't have a lot of real data.

❖ DPGCSynthesizer

Use Gaussian Copula while guaranteeing differential privacy.

❖ DPGCFlexSynthesizer

[Experimental] Use and customize Gaussian Copula while guaranteeing differential privacy.