ScalarInequality

Compatibility: A single numerical or datetime column

The ScalarInequality constraint enforces that all values in a column are greater than or less than a fixed (scalar) value. That is, it enforces a lower or upper bound on the synthetic data.

Some models already learn the min and max values of every column in the real dataset and enforce the bounds in the synthetic dataset. For such models, you do not need to add this constraint.

Parameters

(required) column_name: The name of the column that must follow the constraint

(required) relation: The inequality relation between the column and the value

  • '>': The column is greater than the value

  • '>=': The column is greater than or equal to the value

  • '<': The column is less than the value

  • '<=': The column is less than or equal to the value
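The four relations can be read as ordinary comparisons applied to every entry of the column. The following is a minimal sketch in plain Python, not the library's implementation: the `RELATIONS` mapping, the `follows_relation` helper, and the `room_rates` data are all hypothetical names introduced for illustration, and a plain list stands in for the column.

```python
import operator

# Map each relation string to the comparison it describes.
RELATIONS = {
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
}


def follows_relation(values, relation, value):
    """Return one boolean per entry: True where `entry <relation> value` holds."""
    compare = RELATIONS[relation]
    return [compare(entry, value) for entry in values]


# Hypothetical column values, for illustration only.
room_rates = [80.0, 120.0, 150.0]

strict = follows_relation(room_rates, '>', 80.0)      # 80.0 itself fails
inclusive = follows_relation(room_rates, '>=', 80.0)  # 80.0 itself passes
```

The only difference between the strict and inclusive relations is whether an entry exactly equal to the value counts as following the constraint.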

(required) value: The value that the column should be compared against

  • <int or float>: A numerical value

  • <string>: A string representing a datetime value

Example

Define your constraint using the parameters and then add it to a synthesizer.

my_constraint = {
    'constraint_class': 'ScalarInequality',
    'table_name': 'guests',  # for multi table synthesizers
    'constraint_parameters': {
        'column_name': 'checkin_date',
        'relation': '>=',
        'value': '01 Jan 2020'
    }
}

my_synthesizer.add_constraints(constraints=[
    my_constraint
])
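To make the bound in this example concrete, the sketch below checks a handful of check-in dates against '01 Jan 2020' using only the standard library. It is an illustration under assumptions: the `'%d %b %Y'` format is assumed to match how the dates are written, the `checkin_dates` list is made-up data standing in for a column, and nothing here reflects how the synthesizer parses or enforces the value internally.

```python
from datetime import datetime

# Assumed datetime format; it matches strings such as '01 Jan 2020'.
DATE_FORMAT = '%d %b %Y'

lower_bound = datetime.strptime('01 Jan 2020', DATE_FORMAT)

# Hypothetical check-in dates standing in for a column of data.
checkin_dates = ['15 Feb 2020', '01 Jan 2020', '30 Dec 2019']

# Collect every date that does NOT satisfy `checkin_date >= lower_bound`.
violations = [
    date for date in checkin_dates
    if not datetime.strptime(date, DATE_FORMAT) >= lower_bound
]
```

Because the relation is '>=', a check-in on 01 Jan 2020 itself is allowed; only dates before it would count as violations.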

FAQs

Shortcuts available. If you want to enforce a lower bound of 0, use the Positive constraint. For an upper bound of 0, use the Negative constraint. If you want to enforce both upper and lower bounds, use the ScalarRange constraint.

Why am I getting a privacy warning?

Adding this constraint exposes the min or max value of the data. This may be private information. Here are a few questions to ask yourself:

  • Are the min and max values of the real data well-known, or did I discover them by looking at the real data?

  • Can someone use the min and max values to uncover other sensitive attributes?

  • Who do I plan to share the synthetic data with?

Always evaluate the privacy risk before sharing your synthetic data broadly.

What happens to missing values?

This constraint ignores missing values. The constraint is considered valid as long as the non-missing values follow the scalar inequality.
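The rule can be sketched as a validity check that skips missing entries before comparing. This is a plain-Python illustration, not the library's code: the helpers `is_missing` and `column_is_valid` are hypothetical, and missing values are represented here by `None` or a float NaN.

```python
import math


def is_missing(entry):
    """Treat None and float NaN as missing values."""
    return entry is None or (isinstance(entry, float) and math.isnan(entry))


def column_is_valid(values, lower_bound):
    """True if every non-missing entry is >= lower_bound."""
    return all(
        entry >= lower_bound
        for entry in values
        if not is_missing(entry)
    )


# Hypothetical columns, for illustration only.
with_gaps = [3.5, None, float('nan'), 10.0]
with_violation = [3.5, None, -1.0]

gaps_ok = column_is_valid(with_gaps, 0)            # missing entries are skipped
violation_ok = column_is_valid(with_violation, 0)  # -1.0 breaks the bound
```

In this sketch a column made up entirely of missing values is also treated as valid, since there is nothing left to compare.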

What if I don't have a single, fixed value to compare to?

Use the Inequality constraint to compare the values of two different columns.

Copyright (c) 2023, DataCebo, Inc.