Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.

Predefined Constraints

Predefined constraint classes are available for frequently occurring business rules. The business rules can be isolated to a single table or can be applied across multiple tables.

Single Table

These constraints can be applied within an individual table of your dataset.

| Constraint | Description | Example |
| --- | --- | --- |
| FixedCombinations | No shuffling is allowed other than what's already observed in the data. | The city and country values cannot be shuffled to create new permutations. |
| FixedIncrements | All the numerical values are increments of a whole number. | All values in salary must be divisible by 1000. |
| Inequality | The value in one column must always be greater than the value in another column. | The checkout_date must always be after the checkin_date. |
| OneHotEncoding | The original data columns represent a one hot encoding scheme. | Exactly 1 of the following columns has a 1 in each row: not_subscribed, basic_subscriber, premium. |
| Range | The value in one column is bounded by the values in other columns. | The parent_age must be in between child_age and grandparent_age. |
| * ChainedInequality | A chain of 2 or more columns in an inequality. | purchase_date < start_date < end_date < expiration_date < termination_date |
| ❖ CompositeKey | Multiple columns together form a primary key in a table. | The combination of Patient ID and Date uniquely identifies each record in a table. |
| ❖ FixedNullCombinations | No shuffling is allowed for the missing values, other than what's already observed in the data. | The city and country columns must either both be null or both be present. |
| ❖ MixedScales | The value of one categorical column determines the scale of another numerical column. | If the value of test_type is 'blood_pressure', then the value of test_result must be within a range that is reasonable for this test only. |
| ❖ SelfReferentialHierarchy | A column in the table refers to a different column in the same table. | The Manager ID column refers to the Employee ID column in the same table. |

Multi Table

These constraints can be applied between multiple different tables of your dataset.

| Constraint | Description | Example |
| --- | --- | --- |
| ❖ CarryOverColumns | The same columns are present in a parent table and a child table, and the values of those columns have to match up according to the connection. | The account Type in one table must match the corresponding account Type in another table. |
| ❖ CompositeKey | Multiple columns together form a primary key and foreign key connection. | The combination of Patient ID and Date uniquely identifies each record in a table. |
| ❖ ForeignToForeignKey | There are foreign keys in multiple tables but no primary key to attach them to. | The Warehouse ID column in multiple tables is referring to the same concept. |
| ❖ ForeignToPrimaryKeySubset | There is a 1-to-many connection between tables, but only certain values are allowed to have connections. | Only accounts with Type=Premium are allowed to have children in another table. |
| ❖ PrimaryToPrimaryKey | There is an exact 1-to-1 connection between the primary keys of two or more tables. | There is an exact 1-to-1 relationship between table Users and table Supplemental Info. |
| ❖ PrimaryToPrimaryKeySubset | There is a 1-to-1 connection between the primary keys of two or more tables, but only certain values are allowed to have connections. | Only users with Is Minor=True are allowed to have an entry in another table. |
| ❖ ReferenceTable | A table acts as an unchangeable reference. You do not want to synthesize any new information in it. | The City table should act as a reference; you do not want to synthesize new cities. |
| ❖ UniqueBridgeTable | A bridge table records a many-to-many relationship between two other tables, and the connections have to be unique. | The Author-Book table connects an author to a book, but each connection can only occur once. |
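
To make the single-table rules more concrete, the sketch below expresses five of them (FixedCombinations, FixedIncrements, Inequality, OneHotEncoding, Range) as plain row-level checks. It is only an illustration of what each rule means, not SDV's implementation or API. The `check_row` helper and the sample rows are hypothetical, the column names come from the examples on this page, and the strict comparisons used for Inequality and Range are an assumption.

```python
from datetime import date

# Real rows: the only (city, country) pairs that count as "already observed".
real_rows = [
    {"city": "Paris", "country": "France"},
    {"city": "Tokyo", "country": "Japan"},
]
observed_pairs = {(r["city"], r["country"]) for r in real_rows}


def check_row(row):
    """Hypothetical helper: map each rule name to True when the row satisfies it."""
    one_hot = [row["not_subscribed"], row["basic_subscriber"], row["premium"]]
    return {
        # Only combinations seen in the real data are allowed.
        "FixedCombinations": (row["city"], row["country"]) in observed_pairs,
        # The salary must be a multiple of 1000.
        "FixedIncrements": row["salary"] % 1000 == 0,
        # checkout_date must come after checkin_date (strictness is assumed).
        "Inequality": row["checkout_date"] > row["checkin_date"],
        # Exactly one of the three columns holds a 1, the others hold 0.
        "OneHotEncoding": all(v in (0, 1) for v in one_hot) and sum(one_hot) == 1,
        # parent_age lies between child_age and grandparent_age (strictness is assumed).
        "Range": row["child_age"] < row["parent_age"] < row["grandparent_age"],
    }


good_row = {
    "city": "Paris", "country": "France",
    "salary": 52000,
    "checkin_date": date(2024, 1, 3), "checkout_date": date(2024, 1, 5),
    "not_subscribed": 0, "basic_subscriber": 1, "premium": 0,
    "child_age": 8, "parent_age": 35, "grandparent_age": 63,
}
# Same row, but with a new city/country pairing, an off-increment salary,
# and a checkout date that precedes the checkin date.
bad_row = dict(good_row, country="Japan", salary=52500,
               checkout_date=date(2024, 1, 1))

print(check_row(good_row))
print(check_row(bad_row))
```

Note that the second row fails FixedCombinations even though both "Paris" and "Japan" appear in the real data: the rule is about the pairing, not the individual values.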

*SDV Enterprise Feature. This feature is only available for licensed, enterprise users. For more information, visit our page to Compare SDV Features.

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.
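
As a rough illustration of what some of the multi-table rules mean, the sketch below checks three of them (CarryOverColumns, ForeignToPrimaryKeySubset, UniqueBridgeTable) on small in-memory tables. It does not use the SDV API and is not how SDV enforces these rules. The table contents, the `rewards` child table, and all three function names are hypothetical.

```python
from collections import Counter

# Hypothetical parent and child tables, stored as lists of dicts.
accounts = [
    {"account_id": 1, "Type": "Premium"},
    {"account_id": 2, "Type": "Basic"},
]
rewards = [
    {"reward_id": 10, "account_id": 1, "Type": "Premium"},
    {"reward_id": 11, "account_id": 1, "Type": "Premium"},
]
# Hypothetical bridge table recording author-to-book connections.
author_book = [
    {"author_id": "A1", "book_id": "B1"},
    {"author_id": "A1", "book_id": "B2"},
]


def carry_over_violations(parent, child, key, column):
    """CarryOverColumns: a child's value must match its parent's value."""
    parent_values = {p[key]: p[column] for p in parent}
    return [c for c in child if parent_values.get(c[key]) != c[column]]


def subset_violations(parent, child, key, column, allowed_value):
    """ForeignToPrimaryKeySubset: only some parents may have children."""
    allowed = {p[key] for p in parent if p[column] == allowed_value}
    return [c for c in child if c[key] not in allowed]


def duplicate_links(bridge, left_key, right_key):
    """UniqueBridgeTable: each (left, right) connection may occur only once."""
    counts = Counter((b[left_key], b[right_key]) for b in bridge)
    return [pair for pair, n in counts.items() if n > 1]


# The tables above satisfy all three rules.
print(carry_over_violations(accounts, rewards, "account_id", "Type"))
print(subset_violations(accounts, rewards, "account_id", "Type", "Premium"))
print(duplicate_links(author_book, "author_id", "book_id"))

# A child row attached to a Basic account breaks the first two rules.
bad_reward = {"reward_id": 12, "account_id": 2, "Type": "Premium"}
print(subset_violations(accounts, rewards + [bad_reward],
                        "account_id", "Type", "Premium"))
```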

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.