Synthetic Data Vault
GitHubSlackDataCebo
  • Welcome to the SDV!
  • Tutorials
  • Explore SDV
    • SDV Community
    • SDV Enterprise
      • ⭐Compare Features
    • SDV Bundles
      • ❖ AI Connectors
      • ❖ CAG
      • ❖ Differential Privacy
      • ❖ XSynthesizers
  • Single Table Data
    • Data Preparation
      • Loading Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • GaussianCopulaSynthesizer
        • CTGANSynthesizer
        • TVAESynthesizer
        • ❖ XGCSynthesizer
        • ❖ SegmentSynthesizer
        • * DayZSynthesizer
        • ❖ DPGCSynthesizer
        • ❖ DPGCFlexSynthesizer
        • CopulaGANSynthesizer
      • Customizations
        • Constraints
        • Preprocessing
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
  • Multi Table Data
    • Data Preparation
      • Loading Data
        • Demo Data
        • CSV
        • Excel
        • ❖ AlloyDB
        • ❖ BigQuery
        • ❖ MSSQL
        • ❖ Oracle
        • ❖ Spanner
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • * DayZSynthesizer
        • * IndependentSynthesizer
        • HMASynthesizer
        • * HSASynthesizer
      • Customizations
        • Constraints
        • Preprocessing
      • * Performance Estimates
    • Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
  • Sequential Data
    • Data Preparation
      • Loading Data
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • PARSynthesizer
      • Customizations
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
  • Concepts
    • Metadata
      • Sdtypes
      • Metadata API
      • Metadata JSON
    • Constraints
      • Predefined Constraints
        • Positive
        • Negative
        • ScalarInequality
        • ScalarRange
        • FixedIncrements
        • FixedCombinations
        • ❖ FixedNullCombinations
        • ❖ MixedScales
        • OneHotEncoding
        • Inequality
        • Range
        • * ChainedInequality
      • Custom Logic
        • Example: IfTrueThenZero
      • ❖ Constraint Augmented Generation (CAG)
        • ❖ CarryOverColumns
        • ❖ CompositeKey
        • ❖ ForeignToForeignKey
        • ❖ ForeignToPrimaryKeySubset
        • ❖ PrimaryToPrimaryKey
        • ❖ PrimaryToPrimaryKeySubset
        • ❖ SelfReferentialHierarchy
        • ❖ ReferenceTable
        • ❖ UniqueBridgeTable
  • Support
    • Troubleshooting
      • Help with Installation
      • Help with SDV
    • Versioning & Backwards Compatibility Policy
Powered by GitBook

Copyright (c) 2023, DataCebo, Inc.

On this page
  • Constraint API
  • Usage
  1. Concepts
  2. Constraints
  3. ❖ Constraint Augmented Generation (CAG)

❖ SelfReferentialHierarchy

Previous❖ PrimaryToPrimaryKeySubsetNext❖ ReferenceTable

Last updated 15 days ago

Use the SelfReferentialHierarchy constraint when you have a column in the table that references the primary key column of the same table (aka a self-reference); and the self-references are not allowed to have any cycles.

Constraint API

This functionality is in Beta. At this time, select SDV Enterprise users have been invited to use this feature.

Create a SelfReferentialHierarchy constraint.

Parameters:

  • (required) table_name: A string with the name of the table that contains the self-reference

  • (required) primary_key: A string with the name of the primary key column in the table

  • (required) foreign_key: A string with the name of the foreign key column in the same table that references the primary key

  • scaling_method: A string with the name of the method, used when scaling up the synthetic data

    • (default) 'branch': Keep the original depth of the hierarchy but add more branches to it. In our example above, this would add more reports for a given manager.

    • 'depth': Add to the depth of the self-references. In our example above, this would create new levels of managers, increasing the length of the reporting chain to the CEO.

from sdv.cag import SelfReferentialHierarchy

my_constraint = SelfReferentialHierarchy(
    table_name='Employees',
    primary_key='Employee ID',
    foreign_key='Manager ID')

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

synthesizer = HSASynthesizer(metadata)
synthesizer.add_cag([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()

Make sure that all the table and columns in you provide are in your , and have a primary key associated with them. Note that you cannot supply a self-reference relationship in the metadata right now, so the relationships section of your Metadata can be blank.

Metadata

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the page.

CAG Bundle
In this example, the "Manger ID" column references the primary key, "Employee ID" that is in the same table. This self-reference is not allowed to have cycles, meaning that the manager/employee relationships forms a strict reporting chain all the way up to the CEO. (If person A manages person B, then B cannot manage person A.)