DenormalizedTable

SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.

Use the DenormalizedTable constraint when you have a table that was created by denormalizing a parent and child table. In such a table, the parent's values have been copied onto each child row, so they repeat across all child rows that belong to the same parent.

In this example, the Transactions-Users table is a denormalized table that contains information about transactions and users. The user information is repeated for each transaction that the user has made.
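To make the repetition concrete, here is a minimal sketch of how such a table could come about. It uses plain Python dictionaries rather than any SDV API. The `User ID` and `User Birthdate` columns follow the example above; the `Transaction ID` and `Amount` columns and all values are assumptions made up for illustration.

```python
# Parent table: one row per user (illustrative values only).
users = [
    {'User ID': 'U1', 'User Birthdate': '1990-04-12'},
    {'User ID': 'U2', 'User Birthdate': '1985-11-03'},
]

# Child table: one row per transaction, each referencing a user.
transactions = [
    {'Transaction ID': 'T1', 'User ID': 'U1', 'Amount': 25.00},
    {'Transaction ID': 'T2', 'User ID': 'U1', 'Amount': 13.50},
    {'Transaction ID': 'T3', 'User ID': 'U2', 'Amount': 99.99},
]

# Index the parent rows by their primary key for quick lookup.
users_by_id = {user['User ID']: user for user in users}

# Denormalize: copy the parent's columns onto every child row.
denormalized = [
    {**transaction, **users_by_id[transaction['User ID']]}
    for transaction in transactions
]

for row in denormalized:
    print(row)
```

Both transactions for user `U1` now carry the same `User Birthdate`. This is the pattern the constraint is meant to describe.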

Constraint API

Create a DenormalizedTable constraint

Parameters:

  • (required) table_name: A string containing the name of the denormalized table

  • (required) denormalized_primary_key: A string containing the name of the column that represents the primary key of the denormalized parent table. The values in this column are likely repeated as a result of the denormalization process.

  • (required) denormalized_column_names: A list of strings containing the names of the other columns that came from the denormalized parent table. The values in these columns are also likely repeated. Every row with the same denormalized primary key value should have the same values in these columns.
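Before choosing values for these parameters, it can help to confirm that the candidate columns really are consistent for each primary key value. The sketch below is one way to do that in plain Python. The helper `find_inconsistent_keys` is hypothetical and not part of SDV, and the rows are made-up data.

```python
from collections import defaultdict


def find_inconsistent_keys(rows, denormalized_primary_key, denormalized_column_names):
    """Return the primary key values whose denormalized columns disagree across rows."""
    seen = defaultdict(set)
    for row in rows:
        key = row[denormalized_primary_key]
        # Collect each distinct combination of denormalized values per key.
        values = tuple(row[name] for name in denormalized_column_names)
        seen[key].add(values)
    # A key is inconsistent if it maps to more than one combination.
    return sorted(key for key, combos in seen.items() if len(combos) > 1)


rows = [
    {'User ID': 'U1', 'User Birthdate': '1990-04-12'},
    {'User ID': 'U1', 'User Birthdate': '1990-04-12'},
    {'User ID': 'U2', 'User Birthdate': '1985-11-03'},
    {'User ID': 'U2', 'User Birthdate': '1985-03-11'},  # conflicting value
]

inconsistent = find_inconsistent_keys(rows, 'User ID', ['User Birthdate'])
print(inconsistent)  # ['U2']
```

An empty result suggests that the chosen columns behave like denormalized parent columns. A non-empty result points to keys worth inspecting before you create the constraint.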

from sdv.cag import DenormalizedTable

my_constraint = DenormalizedTable(
    table_name='Transactions-Users',
    denormalized_primary_key='User ID',
    denormalized_column_names=['User Birthdate']
)

Make sure that the table and column names you provide are in your Metadata. The denormalized table may be part of a broader, multi-table dataset.
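A quick way to catch a misspelled name is to compare the constraint's arguments against the names in your metadata. The sketch below assumes the metadata is available as a plain dictionary with a `tables` entry that maps each table name to its `columns`. That layout, the helper `find_missing_names`, and the sample dictionary are all assumptions for illustration, so adapt them to however your metadata is actually stored.

```python
def find_missing_names(metadata_dict, table_name, column_names):
    """List the table or column names that the metadata dictionary does not contain."""
    tables = metadata_dict.get('tables', {})
    if table_name not in tables:
        return [f'table: {table_name}']
    known_columns = tables[table_name].get('columns', {})
    return [f'column: {name}' for name in column_names if name not in known_columns]


# Assumed dictionary layout; the sdtype entries are illustrative.
metadata_dict = {
    'tables': {
        'Transactions-Users': {
            'columns': {
                'Transaction ID': {'sdtype': 'id'},
                'User ID': {'sdtype': 'id'},
                'User Birthdate': {'sdtype': 'datetime'},
            }
        }
    }
}

missing = find_missing_names(
    metadata_dict,
    table_name='Transactions-Users',
    column_names=['User ID', 'User Birthdate'],
)
print(missing)  # [] means every name was found
```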

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

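from sdv.multi_table import HSASynthesizer

# metadata and data describe the dataset that contains the denormalized table
synthesizer = HSASynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()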

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.
