Synthetic Data Vault
GitHubSlackDataCebo
  • Welcome to the SDV!
  • Tutorials
  • Explore SDV
    • SDV Community
    • SDV Enterprise
      • ⭐Compare Features
    • SDV Bundles
      • ❖ AI Connectors
      • ❖ CAG
      • ❖ Differential Privacy
      • ❖ XSynthesizers
  • Single Table Data
    • Data Preparation
      • Loading Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • GaussianCopulaSynthesizer
        • CTGANSynthesizer
        • TVAESynthesizer
        • ❖ XGCSynthesizer
        • ❖ BootstrapSynthesizer
        • ❖ SegmentSynthesizer
        • * DayZSynthesizer
        • ❖ DPGCSynthesizer
        • ❖ DPGCFlexSynthesizer
        • CopulaGANSynthesizer
      • Customizations
        • Constraints
        • Preprocessing
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
      • Privacy
        • Empirical Differential Privacy
        • SDMetrics: Privacy Metrics
  • Multi Table Data
    • Data Preparation
      • Loading Data
        • Demo Data
        • CSV
        • Excel
        • ❖ AlloyDB
        • ❖ BigQuery
        • ❖ MSSQL
        • ❖ Oracle
        • ❖ Spanner
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • * DayZSynthesizer
        • * IndependentSynthesizer
        • HMASynthesizer
        • * HSASynthesizer
      • Customizations
        • Constraints
        • Preprocessing
      • * Performance Estimates
    • Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
  • Sequential Data
    • Data Preparation
      • Loading Data
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • PARSynthesizer
      • Customizations
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
  • Concepts
    • Metadata
      • Sdtypes
      • Metadata API
      • Metadata JSON
    • Constraint-Augmented Generation (CAG)
      • Predefined Constraints
        • FixedCombinations
        • FixedIncrements
        • Inequality
        • OneHotEncoding
        • Range
        • ❖ CarryOverColumns
        • * ChainedInequality
        • ❖ CompositeKey
        • ❖ FixedNullCombinations
        • ❖ ForeignToForeignKey
        • ❖ ForeignToPrimaryKeySubset
        • ❖ MixedScales
        • ❖ PrimaryToPrimaryKey
        • ❖ PrimaryToPrimaryKeySubset
        • ❖ ReferenceTable
        • ❖ SelfReferentialHierarchy
        • ❖ UniqueBridgeTable
      • Program Your Own Constraint
      • Constraints API
  • Support
    • Troubleshooting
      • Help with Installation
      • Help with SDV
    • Versioning & Backwards Compatibility Policy
Powered by GitBook

Copyright (c) 2023, DataCebo, Inc.

On this page
  • Constraint API
  • Usage
  • FAQs
  1. Concepts
  2. Constraint-Augmented Generation (CAG)
  3. Predefined Constraints

❖ FixedNullCombinations

Previous❖ CompositeKeyNext❖ ForeignToForeignKey

Last updated 1 day ago

Do you want to apply this constraint to PII data? As part of its algorithm, this constraint will learn and remember some parameters in the synthesizer. While you can always share your synthetic data, we recommend caution when sharing out your synthesizer itself, as it will contain some of these parameters.

The FixedNullCombinations constraint enforces that the null combinations between a set of columns are fixed. That is, no other permutations or shuffling of null values is allowed other than what's already observed in the data.

Constraint API

Create a FixedNullCombinations constraint.

Parameters:

  • (required) column_names: A list of two or more columns whose combinations are fixed when it comes to null values. The SDV will not further shuffle the null values between these column names.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.

from sdv.cag import FixedNullCombinations

my_constraint = FixedNullCombinations(
    column_names=['city', 'country']
)

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()

FAQs

Why can't I apply this constraint to a single column?

This constraint ensures that the synthetic data only contains null combinations that exist in the real data. If there is only one column, there are no combinations.

The SDV already guarantees that the synthetic data contains a similar proportion of null values as the real data for a single column.

What happens to the values that are non-null?

This constraint will only fix the combinations of null values. For example, it will allow a synthesizer to learn cases where some columns must be null together, or not at all.

It will continue to create new permutations of non-null values. If you would like to fix all combinations (null and non-null values), please apply the constraint instead.

FixedCombinations

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the page.

CAG Bundle

For more information about using predefined constraints, please see the .

Constraint-Augmented Generation tutorial