Synthetic Data Vault
GitHubSlackDataCebo
  • Welcome to the SDV!
  • Tutorials
  • Explore SDV
    • SDV Community
    • SDV Enterprise
      • ⭐Compare Features
    • SDV Bundles
      • ❖ AI Connectors
      • ❖ CAG
      • ❖ Differential Privacy
      • ❖ XSynthesizers
  • Single Table Data
    • Data Preparation
      • Loading Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • GaussianCopulaSynthesizer
        • CTGANSynthesizer
        • TVAESynthesizer
        • ❖ XGCSynthesizer
        • ❖ BootstrapSynthesizer
        • ❖ SegmentSynthesizer
        • * DayZSynthesizer
        • ❖ DPGCSynthesizer
        • ❖ DPGCFlexSynthesizer
        • CopulaGANSynthesizer
      • Customizations
        • Constraints
        • Preprocessing
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
      • Privacy
        • Empirical Differential Privacy
        • SDMetrics: Privacy Metrics
  • Multi Table Data
    • Data Preparation
      • Loading Data
        • Demo Data
        • CSV
        • Excel
        • ❖ AlloyDB
        • ❖ BigQuery
        • ❖ MSSQL
        • ❖ Oracle
        • ❖ Spanner
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • * DayZSynthesizer
        • * IndependentSynthesizer
        • HMASynthesizer
        • * HSASynthesizer
      • Customizations
        • Constraints
        • Preprocessing
      • * Performance Estimates
    • Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
  • Sequential Data
    • Data Preparation
      • Loading Data
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • PARSynthesizer
      • Customizations
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
  • Concepts
    • Metadata
      • Sdtypes
      • Metadata API
      • Metadata JSON
    • Constraint-Augmented Generation (CAG)
      • Predefined Constraints
        • FixedCombinations
        • FixedIncrements
        • Inequality
        • OneHotEncoding
        • Range
        • ❖ CarryOverColumns
        • * ChainedInequality
        • ❖ CompositeKeys
        • ❖ FixedNullCombinations
        • ❖ ForeignToForeignKey
        • ❖ ForeignToPrimaryKeySubset
        • ❖ MixedScales
        • ❖ PrimaryToPrimaryKey
        • ❖ PrimaryToPrimaryKeySubset
        • ❖ ReferenceTable
        • ❖ SelfReferentialHierarchy
        • ❖ UniqueBridgeTable
      • Program Your Own Constraint
      • Constraints API
  • Support
    • Troubleshooting
      • Help with Installation
      • Help with SDV
    • Versioning & Backwards Compatibility Policy
Powered by GitBook

Copyright (c) 2023, DataCebo, Inc.

On this page
  • Constraint API
  • Usage
  1. Concepts
  2. Constraint-Augmented Generation (CAG)
  3. Predefined Constraints

❖ CarryOverColumns

PreviousRangeNext* ChainedInequality

Last updated 17 days ago

Use the CarryOverColumns constraint when you have the same columns in different tables, and the values of those columns have to match up according to a key (ID) column.

Constraint API

This functionality is in Beta. At this time, select SDV Enterprise users have been invited to use this feature.

Create a CarryOverColumns constraint

Parameters:

  • (required) common_column_info: A list of dictionaries that describe all the columns within your dataset that are carried over. Each dictionary should have the following keys:

    • table_name: The name of the table that contains the column

    • carryover_column_name: The name of the column that is repeated across your dataset

    • key_column_name: The name of the column to use as the key for the carried over column. Must be a PII or ID sdtype.

Please supply 2 or more columns for this constraint.

from sdv.cag import CarryOverColumns

my_constraint = CarryOverColumns(
    common_column_info=[{
        'table_name': 'Account',
        'carryover_column_name': 'Type',
        'key_column_name': 'ID'
    }, {
        'table_name': 'Transaction',
        'carryover_column_name': 'Type',
        'key_column_name': 'Account ID'
    }]
)

Make sure that all the tables and columns you provide are in your Metadata.

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

synthesizer = HSASynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()
The 'Type' column is carried over based on the ID/Account ID information.

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.