ScalarRange

Compatibility: A single numerical or datetime column

The ScalarRange constraint enforces that all values in a column fall between two known, fixed values. That is, it enforces lower and upper bounds on the data.

Some models already learn the min and max values of every column in the real dataset and enforce the bounds in the synthetic dataset. For such models, you do not need to add this constraint.

Parameters

(required) column_name: The name of the column that must follow the constraint

(required) low_value: The lower bound of the range

(required) high_value: The upper bound of the range

strict_boundaries: Whether the column must be strictly in between the low and high values

(default) True

The column must be strictly greater than the low value, and strictly less than the high value.

False

The column must be greater than or equal to the low value, and less than or equal to the high value.
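The difference between the two strict_boundaries settings only shows up for values that sit exactly on a bound. The following is a minimal sketch of that comparison in plain Python. The helper name in_scalar_range is hypothetical and is not part of the library; it only mirrors the rule described above.

```python
def in_scalar_range(value, low_value, high_value, strict_boundaries=True):
    """Return True if value satisfies the range rule described above.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if strict_boundaries:
        # Strict: the bounds themselves are not allowed.
        return low_value < value < high_value
    # Not strict: the bounds themselves are allowed.
    return low_value <= value <= high_value


# A value exactly on the lower bound passes only the non-strict check.
print(in_scalar_range(0.0, 0.0, 500.0, strict_boundaries=True))   # False
print(in_scalar_range(0.0, 0.0, 500.0, strict_boundaries=False))  # True
print(in_scalar_range(250.0, 0.0, 500.0))                         # True
```

If your real data legitimately contains the boundary values (for example, a fee of exactly 0.0), setting strict_boundaries to False is likely the better fit.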

Example

Define your constraint using the parameters and then add it to a synthesizer.

my_constraint = {
    'constraint_class': 'ScalarRange',
    'table_name': 'guests', # for multi table synthesizers
    'constraint_parameters': {
        'column_name': 'amenities_fee',
        'low_value': 0.0,
        'high_value': 500.0,
        'strict_boundaries': False
    }
}

my_synthesizer.add_constraints(constraints=[
    my_constraint
])

FAQs

Why am I getting a privacy warning?

Adding this constraint exposes the min and max values of the data. This may be private information. Here are a few questions to ask yourself:

  • Are the min and max values of the real data well-known, or did I discover them by looking at the real data?

  • Can someone use the min and max values to uncover other sensitive attributes?

  • Who do I plan to share the synthetic data with?

Always evaluate the privacy risk before sharing your synthetic data broadly.

What happens to missing values?

This constraint ignores missing values. The constraint is considered valid as long as the non-missing values are within the range.
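As a rough illustration of this rule, the sketch below checks a list of values while skipping missing entries. The function names are hypothetical, and treating None and NaN as "missing" is an assumption made for this example, not a statement about how the library represents missing data internally.

```python
import math


def is_missing(value):
    # Assumption for this sketch: None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_is_valid(values, low_value, high_value, strict_boundaries=True):
    """Check only the non-missing values against the range."""
    for value in values:
        if is_missing(value):
            continue  # missing values are ignored
        if strict_boundaries:
            ok = low_value < value < high_value
        else:
            ok = low_value <= value <= high_value
        if not ok:
            return False
    return True


fees = [12.5, None, float('nan'), 499.0]
print(column_is_valid(fees, 0.0, 500.0, strict_boundaries=False))  # True
```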

What if I want one bound to be strict but not the other?

This constraint can only be used when both bounds are strict or both are not strict. If they differ, use the ScalarInequality constraint twice: once for the lower bound and once for the upper bound.
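For example, a column that must be greater than or equal to 0.0 but strictly less than 500.0 might be expressed as sketched below. The parameter names 'relation' and 'value' are assumptions about the ScalarInequality constraint and should be checked against that constraint's own documentation; the column name is reused from the Example section.

```python
# Lower bound is not strict: amenities_fee >= 0.0
# NOTE: 'relation' and 'value' are assumed parameter names for ScalarInequality.
lower_bound = {
    'constraint_class': 'ScalarInequality',
    'constraint_parameters': {
        'column_name': 'amenities_fee',
        'relation': '>=',
        'value': 0.0
    }
}

# Upper bound is strict: amenities_fee < 500.0
upper_bound = {
    'constraint_class': 'ScalarInequality',
    'constraint_parameters': {
        'column_name': 'amenities_fee',
        'relation': '<',
        'value': 500.0
    }
}

mixed_bound_constraints = [lower_bound, upper_bound]

# Then add both to a synthesizer, as in the Example section:
# my_synthesizer.add_constraints(constraints=mixed_bound_constraints)
```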

Copyright (c) 2023, DataCebo, Inc.