Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


❖ CompositeKeys


Last updated 22 hours ago

Use the CompositeKeys constraint when multiple columns together form a primary key. Optionally, you may also have a multi-column foreign key that connects to the composite primary key.
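
To judge whether a table has a composite primary key, it can help to check which column combinations are unique. The following is a minimal sketch in plain Python, not part of the SDV API: the helper find_duplicate_keys and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the key tuples that appear more than once in rows."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical rows for a 'Patient Visits' table
patient_visits = [
    {'Patient ID': 'P-001', 'Date': '2024-01-05'},
    {'Patient ID': 'P-001', 'Date': '2024-02-11'},
    {'Patient ID': 'P-002', 'Date': '2024-01-05'},
]

# 'Patient ID' alone repeats, so it cannot be the primary key by itself
single_column_dupes = find_duplicate_keys(patient_visits, ['Patient ID'])

# The combination of 'Patient ID' and 'Date' has no repeats
composite_dupes = find_duplicate_keys(patient_visits, ['Patient ID', 'Date'])
```

In this illustrative data, neither column is unique on its own, but the pair is, which is the situation the CompositeKeys constraint is meant for.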

Constraint API

Create a CompositeKeys constraint that lists all the composite keys of your database (primary and foreign keys).

Parameters:

  • (required) primary_keys: A list of dictionaries that define the composite primary keys of all tables that have composite primary keys. Each dictionary should have the following keys:

    • 'table_name': A string with the name of the table

    • 'primary_key': A list of strings representing all the columns in the table that form the composite primary key. At least 1 of these columns should be an sdtype id or PII column.

  • relationships: A list of dictionaries that define the relationships from composite foreign keys to the composite primary keys defined above. Each dictionary should have the following keys:

    • 'parent_table_name': A string with the name of the parent table with the composite primary key (this should be listed in the primary_keys section above).

    • 'parent_primary_key': A list of columns that comprise the composite primary key of the parent table (this should match the key listed in the primary_keys section above).

    • 'child_table_name': A string with the name of the child table that refers to the parent.

    • 'child_foreign_key': A list of columns that comprise the composite foreign key that refers to the parent. The length of this list should be the same as the primary key that it refers to.

from sdv.cag import CompositeKeys

my_constraint = CompositeKeys(
    primary_keys=[{
        'table_name': 'Patient Visits',
        'primary_key': ['Patient ID', 'Date']
    }],
    relationships=[{
        'parent_table_name': 'Patient Visits',
        'parent_primary_key': ['Patient ID', 'Date'],
        'child_table_name': 'Test Results',
        'child_foreign_key': ['Patient ID', 'Visit Date']
    }]
)

Make sure that all the tables and columns you provide are in your Metadata. If you provide any foreign key connections, make sure they align with the columns in the primary key.
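
These checks can be sketched as a small pre-flight function. This is an illustrative helper, not an SDV function: check_composite_keys is a hypothetical name, and the metadata_dict layout (a 'tables' mapping whose entries hold a 'columns' mapping) is an assumption about the shape of your metadata dictionary. Adjust it if your metadata is structured differently.

```python
def check_composite_keys(metadata_dict, primary_keys, relationships=()):
    """Collect readable problems; an empty list means none were found."""
    problems = []
    tables = metadata_dict.get('tables', {})

    def check_columns(table_name, columns):
        if table_name not in tables:
            problems.append(f'Unknown table: {table_name!r}')
            return
        known = tables[table_name].get('columns', {})
        for column in columns:
            if column not in known:
                problems.append(f'Unknown column {column!r} in {table_name!r}')

    # Every composite primary key must use tables and columns from the metadata
    declared = {}
    for entry in primary_keys:
        declared[entry['table_name']] = list(entry['primary_key'])
        check_columns(entry['table_name'], entry['primary_key'])

    # Every foreign key must line up with the primary key it refers to
    for rel in relationships:
        parent = rel['parent_table_name']
        parent_key = list(rel['parent_primary_key'])
        child_key = list(rel['child_foreign_key'])
        check_columns(rel['child_table_name'], child_key)
        if declared.get(parent) != parent_key:
            problems.append(f'Parent key for {parent!r} does not match primary_keys')
        if len(child_key) != len(parent_key):
            problems.append(
                f'Foreign key in {rel["child_table_name"]!r} has '
                f'{len(child_key)} columns, expected {len(parent_key)}'
            )
    return problems


# Assumed metadata layout, reduced to the parts the checks need
metadata_dict = {
    'tables': {
        'Patient Visits': {'columns': {'Patient ID': {}, 'Date': {}}},
        'Test Results': {'columns': {'Patient ID': {}, 'Visit Date': {}, 'Result': {}}},
    }
}
primary_keys = [{'table_name': 'Patient Visits', 'primary_key': ['Patient ID', 'Date']}]
relationships = [{
    'parent_table_name': 'Patient Visits',
    'parent_primary_key': ['Patient ID', 'Date'],
    'child_table_name': 'Test Results',
    'child_foreign_key': ['Patient ID', 'Visit Date'],
}]

problems = check_composite_keys(metadata_dict, primary_keys, relationships)

# A foreign key with too few columns is reported
bad_relationships = [dict(relationships[0], child_foreign_key=['Patient ID'])]
bad_problems = check_composite_keys(metadata_dict, primary_keys, bad_relationships)
```

Running a check like this before creating the constraint surfaces naming and length mismatches early, independent of how the library itself validates its inputs.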

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

from sdv.multi_table import HSASynthesizer

# Add the constraint, then fit and sample as usual
synthesizer = HSASynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()

In this example, 'Patient ID' and 'Date' together form the composite primary key of the 'Patient Visits' table.
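
After sampling, you may want to confirm that every composite foreign key in a child table points at an existing composite primary key in the parent. The sketch below is a plain-Python illustration with hypothetical rows: orphaned_foreign_keys is not an SDV function. It assumes each table is available as a list of row dictionaries (for a pandas DataFrame, df.to_dict('records') produces that shape).

```python
def orphaned_foreign_keys(parent_rows, parent_key, child_rows, child_key):
    """Return child key tuples that have no matching parent key tuple."""
    parent_values = {tuple(row[col] for col in parent_key) for row in parent_rows}
    orphans = []
    for row in child_rows:
        value = tuple(row[col] for col in child_key)
        if value not in parent_values:
            orphans.append(value)
    return orphans


# Hypothetical rows standing in for sampled tables
patient_visits = [
    {'Patient ID': 'P-001', 'Date': '2024-01-05'},
    {'Patient ID': 'P-002', 'Date': '2024-01-05'},
]
test_results = [
    {'Patient ID': 'P-001', 'Visit Date': '2024-01-05', 'Result': 'negative'},
    {'Patient ID': 'P-002', 'Visit Date': '2024-03-01', 'Result': 'positive'},
]

orphans = orphaned_foreign_keys(
    patient_visits, ['Patient ID', 'Date'],
    test_results, ['Patient ID', 'Visit Date'],
)
```

Here the second test result refers to a visit date that is absent from the parent rows, so it is reported as an orphan. Columns are matched by position, which is why the foreign key list must have the same length and order as the primary key it refers to.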

❖ SDV Enterprise bundle. This feature is available for purchase as an SDV Enterprise bundle. For more information, visit our page to Explore SDV.

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.

This functionality is in Beta. At this time, select SDV Enterprise users are able to use this feature and provide feedback.