Synthetic Data Vault
Copyright (c) 2023, DataCebo, Inc.

❖ Constraint Augmented Generation (CAG)


Last updated 16 days ago

Constraint Augmented Generation (CAG) is a powerful system that allows you to input business logic into your complex schemas. Fast, easy, and flexible, CAG patterns allow you to create synthetic data that conforms to your logic, 100% of the time.

⭐️ Apply constraints across multiple tables. Add business logic that determines the connections between tables, or the type of data that linked tables are allowed to have.

⭐️ Access complex, powerful algorithms with simple APIs. CAG supports more complex constraints, often combining different algorithms to get you the data you need. The best part? There's one simple API for declaring your constraint.

Usage

This functionality is in Beta. At this time, select SDV Enterprise users have been invited to use this feature.

To use CAG, define your constraint and add it to any SDV single-table or multi-table synthesizer.

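from sdv.cag import PrimaryToPrimaryKey
from sdv.multi_table import HSASynthesizer

# define your CAG pattern
my_constraint = PrimaryToPrimaryKey(
    table_names=['Accounts', 'Supplemental_Account_Info']
)

# add it to any multi-table synthesizer
# (metadata and data are assumed to be loaded already)
synthesizer = HSASynthesizer(metadata)
synthesizer.add_cag([my_constraint])

# fit the synthesizer, then sample synthetic data that follows the constraint
synthesizer.fit(data)
synthetic_data = synthesizer.sample()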
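
One way to sanity-check the result is to confirm that the pattern holds in the sampled tables. The sketch below uses plain pandas rather than any SDV API. It assumes that a multi-table sample is a dictionary mapping table names to DataFrames, and that both tables use a primary key column named account_id (a hypothetical column name, chosen only for illustration). The helper primary_keys_match is also hypothetical, and the small DataFrames stand in for real sampled output.

```python
import pandas as pd


def primary_keys_match(tables, table_names, key_column):
    """Return True if every listed table holds the same set of unique key values."""
    key_sets = []
    for name in table_names:
        keys = tables[name][key_column]
        # A primary key must not repeat within its own table.
        if keys.duplicated().any():
            return False
        key_sets.append(set(keys))
    # A 1-to-1 connection means all tables share exactly the same keys.
    return all(key_set == key_sets[0] for key_set in key_sets)


# Stand-in for the dictionary returned by synthesizer.sample() (assumption).
synthetic_data = {
    'Accounts': pd.DataFrame(
        {'account_id': [1, 2, 3], 'balance': [10.0, 25.5, 0.0]}
    ),
    'Supplemental_Account_Info': pd.DataFrame(
        {'account_id': [3, 1, 2], 'tier': ['a', 'b', 'a']}
    ),
}

is_one_to_one = primary_keys_match(
    synthetic_data,
    table_names=['Accounts', 'Supplemental_Account_Info'],
    key_column='account_id',
)
print(is_one_to_one)
```

Row order does not matter here, because the check compares sets of key values rather than positions.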

<synthesizer>.add_cag

Use this function to add constraints to any single or multi-table synthesizer that uses AI-generation techniques.

Parameters:

  • (required) patterns: A list of constraints (explore the constraints below)

Output: None

synthesizer.add_cag(patterns=[my_constraint1, my_constraint2, ...])

<synthesizer>.get_cag

Get all the constraints that are attached to your synthesizer.

Parameters: None

Output: A list of the constraints that the synthesizer uses, in order of use

constraints = synthesizer.get_cag()
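
Since get_cag returns a list, it can be summarized with ordinary Python. The helper below is a minimal sketch: summarize_cag is a hypothetical name, not part of SDV, and it relies only on get_cag() returning a list of constraint objects as described above.

```python
def summarize_cag(synthesizer):
    """Return the class names of the attached constraints, in the order get_cag() reports them."""
    return [type(constraint).__name__ for constraint in synthesizer.get_cag()]
```

For a synthesizer with a single PrimaryToPrimaryKey constraint attached, calling summarize_cag(synthesizer) would be expected to give a one-item list containing that class name.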

<synthesizer>.get_metadata

Use this function to access the metadata object that the synthesizer uses.

Parameters:

  • version: The version of the metadata to return. The metadata may differ before and after the CAG algorithms are applied.

    • (default) 'original': Return the original metadata that you used to instantiate the synthesizer.

    • 'modified': Return the metadata as it is after the constraints are applied. If constraints are present, it will be different from the original metadata. Otherwise, it will be the same.

Output: A metadata object
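
To see whether the constraints altered the metadata, the two versions can be requested and compared. The sketch below is illustrative only: metadata_changed_by_cag is a hypothetical helper, and it assumes the returned metadata objects expose a to_dict() method for comparison.

```python
def metadata_changed_by_cag(synthesizer):
    """Return True if applying the constraints changes the synthesizer's metadata."""
    original = synthesizer.get_metadata(version='original')
    modified = synthesizer.get_metadata(version='modified')
    # Assumption: the metadata objects provide to_dict(), so they can be
    # compared as plain dictionaries.
    return original.to_dict() != modified.to_dict()
```

Based on the description of 'modified' above, this would be expected to return False for a synthesizer with no constraints attached.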

Explore Constraints

Pattern Name: Description

  • ❖ CarryOverColumns: The same columns are present in a parent table and a child table, and the values of those columns have to match up according to the connection.

  • ❖ CompositeKey: Multiple columns together form a primary key and foreign key connection.

  • ❖ ForeignToForeignKey: There are foreign keys in multiple tables but no primary key to attach them to.

  • ❖ ForeignToPrimaryKeySubset: There is a 1-to-many connection between tables, but only certain values are allowed to have connections.

  • ❖ PrimaryToPrimaryKey: There is an exact 1-to-1 connection between the primary keys of two or more tables.

  • ❖ PrimaryToPrimaryKeySubset: There is a 1-to-1 connection between the primary keys of two or more tables, but only certain values are allowed to have connections.

  • ❖ ReferenceTable: A table acts as an unchangeable reference. You do not want to synthesize any new information in it.

  • ❖ SelfReferentialHierarchy: A column in the table references the primary key column of the same table (aka a self-reference).

  • ❖ UniqueBridgeTable: A bridge table records a many-to-many relationship between two other tables, and the connections have to be unique.

FAQs

What is the difference between CAG and a single-table, predefined constraint?

CAG is an entirely new, more powerful system for handling constraints. The older constraint system can only handle simple logic within a single table. CAG can also handle business logic that spans multiple tables, and can determine better connections between the tables.

In terms of the product API, our team is working to align CAG and the predefined constraints in future iterations.

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.