❖ FixedNullCombinations

SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.

The FixedNullCombinations constraint enforces that the combinations between categorical columns are fixed. For other types of columns, it ensures that the combinations of null values are fixed. That is, the synthetic data contains only the combinations already observed in the real data; no other permutations or shuffling are allowed.

The Support Cases table contains an entry for each support ticket filed by customers. Only resolved cases have a resolution date. In other words, the Resolution Status column determines whether the Resolution Date can be null.
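The rule described above can be sketched in plain Python. This is only an illustration of the idea, not the SDV implementation: the toy rows, the `status_date_combination` helper, and the `is_allowed` check are all hypothetical names, and `None` stands in for a null value.

```python
# Toy stand-in for the Support Cases table; None marks a null value.
real_rows = [
    {"Resolution Status": "Resolved", "Resolution Date": "2024-01-15"},
    {"Resolution Status": "Open", "Resolution Date": None},
    {"Resolution Status": "Resolved", "Resolution Date": "2024-02-03"},
    {"Resolution Status": "In Progress", "Resolution Date": None},
]


def status_date_combination(row):
    """Pair the categorical status with whether the date is null."""
    return (row["Resolution Status"], row["Resolution Date"] is None)


# The combinations observed in the real data are the only ones allowed.
observed = {status_date_combination(row) for row in real_rows}


def is_allowed(row):
    """Check whether a candidate row uses an already-observed combination."""
    return status_date_combination(row) in observed


# A new, previously unseen date is fine for a resolved case...
new_date_ok = is_allowed(
    {"Resolution Status": "Resolved", "Resolution Date": "2025-06-30"}
)
# ...but an open case with a resolution date was never observed.
open_with_date_ok = is_allowed(
    {"Resolution Status": "Open", "Resolution Date": "2025-06-30"}
)
```

Note that the date itself is never compared, only whether it is null. This is why the synthetic data can still contain dates that do not appear in the real data.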

Constraint API

Create a FixedNullCombinations constraint.

Parameters:

  • (required) column_names: A list of two or more columns whose combinations are fixed. These columns can be any sdtype. However, they cannot be listed as primary or foreign keys.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.

  • fix_category_values: Whether to fix the values for categorical columns

    • (default) True: If there are categorical columns, fix the combinations that appear with their actual value. (For other types of columns, the combinations of null vs. non-null values are fixed instead.)

    • False: For all columns, only fix the combinations of null vs. non-null values. Additional permutations of categorical values are allowed.
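To make the effect of fix_category_values concrete, here is a minimal plain-Python sketch of how the two settings could differ. It does not use or reproduce the SDV code: `combination_key` and `observed_keys` are hypothetical helpers, the set of categorical columns is passed in by hand, and `None` stands in for null.

```python
def combination_key(row, column_names, categorical_columns, fix_category_values=True):
    """Reduce a row to the part of it that the constraint would hold fixed."""
    key = []
    for name in column_names:
        value = row[name]
        keep_value = (
            fix_category_values
            and name in categorical_columns
            and value is not None
        )
        if keep_value:
            # Categorical columns contribute their actual value.
            key.append(("value", value))
        else:
            # Everything else contributes only null vs. non-null.
            key.append(("is_null", value is None))
    return tuple(key)


def observed_keys(rows, column_names, categorical_columns, fix_category_values):
    """Collect every combination that appears in the given rows."""
    return {
        combination_key(row, column_names, categorical_columns, fix_category_values)
        for row in rows
    }


rows = [
    {"Resolution Status": "Resolved", "Resolution Date": "2024-01-15"},
    {"Resolution Status": "Open", "Resolution Date": None},
]
columns = ["Resolution Status", "Resolution Date"]
categorical = {"Resolution Status"}

# A candidate row that pairs an open case with a resolution date.
candidate = {"Resolution Status": "Open", "Resolution Date": "2025-06-30"}

strict_allowed = combination_key(candidate, columns, categorical, True) in observed_keys(
    rows, columns, categorical, True
)
loose_allowed = combination_key(candidate, columns, categorical, False) in observed_keys(
    rows, columns, categorical, False
)
```

In this sketch the candidate is rejected under the default setting, because "Open" was only ever observed with a null date. It is accepted when fix_category_values is False, because a non-null status alongside a non-null date does occur in the real rows.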

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.

FAQs

What is the difference between FixedCombinations and FixedNullCombinations?

The FixedCombinations constraint (available in SDV Community) fixes the combinations allowed within categorical columns. In contrast, the FixedNullCombinations constraint can also be applied to other types of data, such as numerical, datetime, or PII columns. For these types of columns, the constraint cannot fix the actual values, as it is possible to synthesize new, previously unseen values. However, it can fix whether a column is null or non-null.

The FixedNullCombinations constraint therefore covers a wide variety of scenarios. For example:

  • If several columns need to be null all together, or not at all

  • If there should only be one non-null value within a group of columns

  • If the categorical value of one column influences whether another column is null (the example shown throughout this page)
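The first two scenarios in this list come down to which null patterns appear in the real data. The sketch below uses made-up columns (Latitude, Longitude, Email, Phone) and a hypothetical `null_patterns` helper to show the patterns that would be held fixed in each case. It illustrates the concept only and makes no claim about how SDV computes this internally.

```python
def null_patterns(rows, column_names):
    """Return the set of observed null/non-null patterns (True means null)."""
    return {tuple(row[name] is None for name in column_names) for row in rows}


# Toy data; None marks a null value.
rows = [
    {"Latitude": 42.36, "Longitude": -71.06, "Email": "a@example.com", "Phone": None},
    {"Latitude": None, "Longitude": None, "Email": None, "Phone": "555-0100"},
    {"Latitude": 40.71, "Longitude": -74.01, "Email": None, "Phone": "555-0101"},
]

# Latitude and Longitude are either both null or both present.
location_patterns = null_patterns(rows, ["Latitude", "Longitude"])

# Exactly one of Email and Phone is present in every row.
contact_patterns = null_patterns(rows, ["Email", "Phone"])
```

Because a pattern such as a present Latitude with a null Longitude never occurs in these rows, a constraint over those two columns would not allow it in the synthetic data. The same reasoning rules out rows where Email and Phone are both present or both null.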

Why can't I apply this constraint to a single column?

This constraint ensures that the synthetic data only contains null combinations that exist in the real data. If there is only one column, there are no combinations.

The SDV already guarantees that the synthetic data contains a similar proportion of null values as the real data for a single column.
