Range

The Range constraint enforces that for all rows, the value of one of the columns is bounded by the values in the other two columns.

Constraint API

Create a Range constraint.

  • (required) low_column_name: The name of the column that contains the lowest value. This must be a numerical or datetime column.

  • (required) middle_column_name: The name of the column that must be between the low and the high columns. This must be a numerical or datetime column.

  • (required) high_column_name: The name of the column that contains the highest value. This must be a numerical or datetime column.

  • strict_boundaries: Whether the comparisons against the low and high columns are strict.

    • (default) True: The middle column must be strictly greater than the low column and strictly less than the high column.

    • False: The middle column must be greater than or equal to the low column and less than or equal to the high column.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.

from sdv.cag import Range

my_constraint = Range(
    low_column_name='child_age',
    middle_column_name='parent_age',
    high_column_name='grandparent_age',
    strict_boundaries=True
)
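
To make the effect of strict_boundaries concrete, here is a minimal plain-Python sketch of the row-level rule described above. The helper row_in_range is hypothetical and exists only for illustration. It is not part of the SDV API, and it does not claim to mirror how SDV evaluates the constraint internally.

```python
import operator


def row_in_range(low, middle, high, strict_boundaries=True):
    """Check one row: low < middle < high, or <= when boundaries are not strict.

    Hypothetical helper for illustration only; not an SDV function.
    """
    compare = operator.lt if strict_boundaries else operator.le
    return compare(low, middle) and compare(middle, high)


# Arguments are child_age, parent_age, grandparent_age.
print(row_in_range(10, 40, 70))                           # True
print(row_in_range(40, 40, 70))                           # False: equal values fail a strict check
print(row_in_range(40, 40, 70, strict_boundaries=False))  # True: equality is allowed
```

With the default strict_boundaries=True, a row where two of the columns hold equal values does not satisfy the rule. Setting it to False accepts such rows.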

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

from sdv.single_table import GaussianCopulaSynthesizer

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.

FAQs

What happens to missing values?

This constraint ignores missing values. The constraint is considered valid as long as the non-missing values follow the logic.
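
One way to read this rule is sketched below in plain Python. The helper names is_missing and row_in_range_ignoring_missing are hypothetical, and the sketch rests on an assumption: a comparison is skipped whenever either of its two operands is missing, and the low and high columns are never compared directly. SDV's actual handling may differ in such edge cases.

```python
import math
import operator


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_in_range_ignoring_missing(low, middle, high, strict_boundaries=True):
    """Check one row, skipping any comparison that involves a missing value.

    Hypothetical helper for illustration only; not an SDV function.
    """
    compare = operator.lt if strict_boundaries else operator.le

    def pair_ok(a, b):
        # Assumption: a pair with a missing operand counts as valid.
        if is_missing(a) or is_missing(b):
            return True
        return compare(a, b)

    return pair_ok(low, middle) and pair_ok(middle, high)


print(row_in_range_ignoring_missing(10, None, 70))          # True: nothing left to compare
print(row_in_range_ignoring_missing(10, 40, float('nan')))  # True: 10 < 40, high is missing
print(row_in_range_ignoring_missing(50, 40, None))          # False: 50 < 40 does not hold
```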

What if I want to compare a column to fixed values?

Many of our SDV synthesizers are already designed to learn the min/max values in every column and replicate the ranges in the synthetic data. This parameter is often called enforce_min_max_values and it applies to all numerical/datetime columns. For more information, check your synthesizer's API guide.

You can also control the enforcement on a per-column basis. Turn on/off the enforcement on individual columns by accessing and updating the transformers. For more information, see the Preprocessing guide.

Both of these options will allow you to fix the range (as observed in the real data) or expand it (by not enforcing it). If you'd like to further restrict the range, we encourage you to model the data as-is and use conditional sampling to get the range you need.
