Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.

❖ ForeignToForeignKey

Last updated 16 days ago

Use the ForeignToForeignKey constraint when you have foreign keys in multiple tables but no primary key to attach them to. This may happen if your tables come from different domains and are linked together by the same concept.

Constraint API

Create a ForeignToForeignKey constraint.

Parameters:

  • (required) columns: A list of dictionaries representing all the foreign key columns that encode the same concept. Each dictionary should have:

    • A 'table_name' key mapping to the string name of the table, and

    • A 'foreign_key' key mapping to the string name of the column. (If you have a composite key, provide a tuple of multiple strings.)

  • foreign_key_generation: A string that describes whether the synthetic data for the foreign keys should contain brand-new values or reuse the ones that exist in your database.

    • (default) 'new': Create new values in the synthetic data. These new values will be consistent everywhere in the database. In the warehouse example, the synthetic data would have brand-new Warehouse IDs representing new, synthetic warehouses. These warehouses would be consistent between the Products and Shipments tables.

    • 'reuse': Reuse the values from the real data. In the warehouse example, the synthetic data would have the same set of Warehouse IDs as the real data, within both the Products and Shipments tables. These would represent the same warehouses.

from sdv.cag import ForeignToForeignKey

my_constraint = ForeignToForeignKey(
    columns=[{
        'table_name': 'Products',
        'foreign_key': 'Warehouse ID'
    },{
        'table_name': 'Shipments',
        'foreign_key': 'Warehouse ID'
    }],
    foreign_key_generation='new'
)
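
The difference between 'new' and 'reuse' can be checked on the data itself. The sketch below is a minimal illustration in plain Python, not part of the SDV API: the helper name linked_key_values and the tiny hand-written tables are hypothetical, and it only covers single-column foreign keys. It collects the distinct values across all linked columns, then compares the synthetic values against the real ones.

```python
def linked_key_values(tables, columns):
    """Collect the distinct values found across all linked foreign key columns.

    `tables` maps a table name to its column data. A pandas DataFrame or a
    plain dict of lists both work, since only `tables[name][column]` is iterated.
    """
    values = set()
    for spec in columns:
        values.update(tables[spec['table_name']][spec['foreign_key']])
    return values


columns = [
    {'table_name': 'Products', 'foreign_key': 'Warehouse ID'},
    {'table_name': 'Shipments', 'foreign_key': 'Warehouse ID'},
]

# Hand-written stand-ins for real and synthetic data (illustrative values only).
real = {
    'Products': {'Warehouse ID': ['W1', 'W2', 'W2']},
    'Shipments': {'Warehouse ID': ['W2', 'W3']},
}
synthetic_reuse = {
    'Products': {'Warehouse ID': ['W3', 'W1']},
    'Shipments': {'Warehouse ID': ['W2', 'W1']},
}
synthetic_new = {
    'Products': {'Warehouse ID': ['X7', 'X8']},
    'Shipments': {'Warehouse ID': ['X8', 'X9']},
}

real_ids = linked_key_values(real, columns)
reuse_ids = linked_key_values(synthetic_reuse, columns)
new_ids = linked_key_values(synthetic_new, columns)

# With 'reuse', every synthetic ID should already exist in the real data.
reuses_real_ids = reuse_ids <= real_ids
# With 'new', the synthetic IDs are expected not to overlap with the real ones.
has_only_new_ids = new_ids.isdisjoint(real_ids)
```

In both modes the same synthetic ID can appear in Products and in Shipments, which is what keeps the warehouses consistent across the two tables.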

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

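from sdv.multi_table import HSASynthesizer

# `metadata` and `data` are your multi-table metadata and data, loaded beforehand
synthesizer = HSASynthesizer(metadata)
synthesizer.add_cag([my_constraint])

# Fit on the real data, then sample synthetic data as usual
synthesizer.fit(data)
synthetic_data = synthesizer.sample()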

Make sure that all the tables and columns you provide are listed in your Metadata.
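
One way to run this metadata check before adding the constraint is sketched below. It is a hedged example, not an SDV function: find_missing_columns is a hypothetical helper, and it assumes the metadata is available as a dictionary with a top-level 'tables' key whose entries each hold a 'columns' dictionary. The sample metadata and the 'Region' column are made up for illustration. Composite keys given as tuples are checked column by column.

```python
def find_missing_columns(columns, metadata_dict):
    """Return (table_name, column_name) pairs that are not listed in the metadata.

    Assumes `metadata_dict` looks like:
    {'tables': {table_name: {'columns': {column_name: {...}}}}}
    """
    tables = metadata_dict.get('tables', {})
    missing = []
    for spec in columns:
        table_name = spec['table_name']
        key = spec['foreign_key']
        # A composite key is given as a tuple of column names
        key_columns = key if isinstance(key, tuple) else (key,)
        # An unknown table yields no known columns, so all of its keys are reported
        known = tables.get(table_name, {}).get('columns', {})
        for column_name in key_columns:
            if column_name not in known:
                missing.append((table_name, column_name))
    return missing


# Illustrative metadata: 'Warehouse ID' is deliberately left out of 'Shipments'.
metadata_dict = {
    'tables': {
        'Products': {
            'columns': {
                'Product ID': {'sdtype': 'id'},
                'Warehouse ID': {'sdtype': 'id'},
            }
        },
        'Shipments': {
            'columns': {
                'Shipment ID': {'sdtype': 'id'},
            }
        },
    }
}

columns = [
    {'table_name': 'Products', 'foreign_key': 'Warehouse ID'},
    {'table_name': 'Shipments', 'foreign_key': 'Warehouse ID'},
]
missing = find_missing_columns(columns, metadata_dict)

# Composite key variant, using a hypothetical 'Region' column
composite_columns = [
    {'table_name': 'Products', 'foreign_key': ('Region', 'Warehouse ID')},
]
missing_composite = find_missing_columns(composite_columns, metadata_dict)
```

An empty result means every table and column named in the constraint appears in the metadata dictionary.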
In this example, the 'Products' table and the 'Shipments' table both have a column called 'Warehouse ID' that denotes the same concept (storage warehouse). Both these columns are foreign keys, and there is no primary key for Warehouse ID that is available to link them.

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.