Inequality

The Inequality constraint enforces an inequality relationship between a pair of columns. For every row, the value in one column must be greater than the value in the other.

Constraint API

Parameters

  • (required) low_column_name: The name of the column whose values must be lower. Only numerical and datetime columns are allowed.

  • (required) high_column_name: The name of the column whose values must be greater. Only numerical and datetime columns are allowed.

  • strict_boundaries: Whether the high column must be strictly greater than the low column.

    • (default) True: The value in the high column must be strictly greater than the value in the low column.

    • False: The value in the high column must be greater than or equal to the value in the low column.

  • table_name: A string with the name of the table to apply this constraint to. Required if you have a multi-table dataset.

from sdv.cag import Inequality

my_constraint = Inequality(
    low_column_name='checkin_date',
    high_column_name='checkout_date',
    strict_boundaries=True
)
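To make the effect of strict_boundaries concrete, here is a minimal sketch in plain Python of the rule described above. It is not SDV's implementation: violating_rows is a hypothetical helper and the booking rows are made-up illustration data.

```python
from datetime import date


def violating_rows(rows, low_column_name, high_column_name, strict_boundaries=True):
    """Return the indices of rows that break the inequality (hypothetical helper)."""
    bad = []
    for index, row in enumerate(rows):
        low, high = row[low_column_name], row[high_column_name]
        # Strict: high must be greater. Non-strict: high may also equal low.
        ok = low < high if strict_boundaries else low <= high
        if not ok:
            bad.append(index)
    return bad


# Made-up rows for illustration only.
bookings = [
    {'checkin_date': date(2024, 3, 1), 'checkout_date': date(2024, 3, 4)},  # valid either way
    {'checkin_date': date(2024, 3, 5), 'checkout_date': date(2024, 3, 5)},  # same day
    {'checkin_date': date(2024, 3, 9), 'checkout_date': date(2024, 3, 7)},  # reversed
]

strict_violations = violating_rows(bookings, 'checkin_date', 'checkout_date')
relaxed_violations = violating_rows(
    bookings, 'checkin_date', 'checkout_date', strict_boundaries=False
)
```

In this sketch the same-day booking is only flagged when strict_boundaries is True. The reversed booking is flagged in both cases.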

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

from sdv.single_table import GaussianCopulaSynthesizer

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.

FAQs

What happens to missing values?

This constraint ignores missing values. The constraint is considered valid as long as the non-missing values follow the inequality.
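One way to read "ignores missing values" is sketched below in plain Python. This is an assumption about how the rule plays out for a single row, not SDV's internal code. The helpers is_missing and row_is_valid are hypothetical names, and the sketch assumes that a row with a missing value on either side counts as valid.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_is_valid(low, high, strict_boundaries=True):
    """Check one row, skipping the comparison when either value is missing."""
    if is_missing(low) or is_missing(high):
        return True  # nothing to compare, so the row is not a violation
    return low < high if strict_boundaries else low <= high


# Made-up (low, high) pairs for illustration only.
pairs = [(1.0, 2.5), (None, 2.5), (3.0, math.nan), (4.0, 1.0)]
results = [row_is_valid(low, high) for low, high in pairs]
```

Only the last pair fails here, because it is the only one where both values are present and the low value exceeds the high value.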

What if I want to compare a column to a single, fixed value?

Many of our SDV synthesizers are already designed to learn the min/max values in every column and replicate the ranges in the synthetic data. This behavior is usually controlled by a parameter called enforce_min_max_values, which applies to all numerical and datetime columns. For more information, check your synthesizer's API guide.

You can also control the enforcement on a per-column basis. Turn on/off the enforcement on individual columns by accessing and updating the transformers. For more information, see the Preprocessing guide.

Both of these options will allow you to fix the range (as observed in the real data) or expand it (by not enforcing it). If you'd like to further restrict the range, we encourage you to model the data as-is and use conditional sampling to get the range you need.
