2.2 Business Rules (Constraints)

Does your data include business rules (constraints)?

In SDV, constraints refer to indirect relationships or business rules. Such rules are not present in your data schema, but they are critical for data validity. You may also refer to these as "derived rules" or "dependencies".

SDV Enterprise supports constraints via a low-code API in the CAG bundle. These constraints guide the model to generate 100% valid synthetic records.

Reviewing the CAG bundle

Review the full list of constraints (for both single-table and multi-table data) and identify which constraints are relevant to your dataset.

Tip! Fake data can help you identify constraints. Because fake data contains random combinations of values, it will violate constraints. These violations cause issues in testing, which helps you identify where constraints need to be enforced in the data for the final POC.
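One way to act on this tip is to count how often fake rows break a rule you suspect. The following is a minimal sketch in plain Python and is not part of SDV: the helper names make_fake_bookings and find_violations are hypothetical, and the checkin_date and checkout_date columns are borrowed from the inequality example in the checklist. It assumes each fake value is drawn independently at random.

```python
import random
from datetime import date, timedelta


def make_fake_bookings(n, seed=0):
    """Hypothetical helper: build rows whose dates are drawn independently."""
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    return [
        {
            "checkin_date": start + timedelta(days=rng.randrange(365)),
            "checkout_date": start + timedelta(days=rng.randrange(365)),
        }
        for _ in range(n)
    ]


def find_violations(rows, low, high):
    """Return the rows where the `high` value is not strictly after `low`."""
    return [row for row in rows if not row[low] < row[high]]


fake_rows = make_fake_bookings(1000)
violations = find_violations(fake_rows, "checkin_date", "checkout_date")
print(f"{len(violations)} of {len(fake_rows)} fake rows break the rule")
```

If a large share of fake rows break a rule that your real data always obeys, that rule is a good candidate for the checklist.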

For each constraint, mark down whether it's relevant to your dataset.

Constraint
Description
Is this relevant?

No shuffling is allowed other than what's already observed in the data. Eg. The city and country values cannot be shuffled to create new permutations.

<yes or no>

All the numerical values are increments of a whole number. Eg. All values in salary must be divisible by 1000.

<yes or no>

The value in one column must always be greater than the other. Eg. The checkout_date must always be after the checkin_date

<yes or no>

The original data columns represent a one hot encoding scheme. Eg. Exactly 1 of the following columns has a 1 in each row: not_subscribed, basic_subscriber, premium

<yes or no>

The value in one column is bounded by the values in other columns. Eg. The parent_age must be between child_age and grandparent_age.

<yes or no>

A chain of 2 or more columns in an inequality. Eg. purchase_date < start_date < end_date < expiration_date < termination_date

<yes or no>

Multiple columns together form a primary key in a table. Eg. A combination of Patient ID and Date uniquely identifies each record in a table.

<yes or no>

No shuffling is allowed for the missing values, other than what's already observed in the data. Eg. The city and country columns must either both be null or both be present.

<yes or no>

The value of one categorical column determines the scale of another numerical column. Eg. If the value of test_type is 'blood_pressure' then the value of test_result must be within a reasonable range for this test only.

<yes or no>

A column in the table refers to a different column in the same table. Eg. The Manager ID column refers to the Employee ID column in the same table.

<yes or no>

The same columns are present in a parent table and a child table, and the values of those columns have to match up according to the connection. Eg. The account Type in one table must match the corresponding account Type in another table.

<yes or no>

Multiple columns together form a primary key and foreign key connection. Eg. A combination of Patient ID and Date uniquely identifies each record in a table.

<yes or no>

There are foreign keys in multiple tables but no primary key to attach them to. Eg. The Warehouse ID column in multiple tables is referring to the same concept.

<yes or no>

There is a 1-to-many connection between tables, but only certain values are allowed to have connections. Eg. Only accounts with Type=Premium are allowed to have children in another table.

<yes or no>

There is an exact 1-to-1 connection between the primary keys of two or more tables. Eg. There is an exact 1-to-1 relationship between table Users and table Supplemental Info.

<yes or no>

There is a 1-to-1 connection between the primary keys of two or more tables but only certain values are allowed to have connections. Eg. Only users with Is Minor=True are allowed to have an entry in another table.

<yes or no>

A table acts as an unchangeable reference. You do not want to synthesize any new information in it. Eg. The City table should act as a reference; you do not want to synthesize new cities.

<yes or no>

A bridge table records a many-to-many relationship between two other tables, and the connections have to be unique. Eg. The Author-Book table connects an author to a book, but the connection can only occur once.

<yes or no>
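To help fill in the "Is this relevant?" column, you can test whether your real data already obeys a rule in every row. The following is a minimal sketch in plain Python and does not use the SDV API: the function names and the sample rows are hypothetical, and the column names are taken from the examples in the checklist. It covers four of the single-table rules (fixed increments, one hot encoding, values that are null together, and a multi-column primary key).

```python
from collections import Counter


def is_fixed_increment(rows, column, increment):
    """True if every non-missing value is divisible by `increment`."""
    return all(
        row[column] % increment == 0
        for row in rows
        if row[column] is not None
    )


def is_one_hot(rows, columns):
    """True if exactly one of `columns` is 1 and the rest are 0 in each row."""
    return all(
        all(row[c] in (0, 1) for c in columns)
        and sum(row[c] for c in columns) == 1
        for row in rows
    )


def is_null_together(rows, columns):
    """True if `columns` are either all missing or all present in each row."""
    return all(len({row[c] is None for c in columns}) == 1 for row in rows)


def is_composite_key(rows, columns):
    """True if each combination of `columns` appears at most once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(n == 1 for n in counts.values())


# Hypothetical sample rows, for illustration only.
rows = [
    {"patient_id": 1, "date": "2024-01-01", "salary": 52000,
     "city": "Paris", "country": "France",
     "not_subscribed": 0, "basic_subscriber": 1, "premium": 0},
    {"patient_id": 1, "date": "2024-01-02", "salary": 61000,
     "city": None, "country": None,
     "not_subscribed": 1, "basic_subscriber": 0, "premium": 0},
    {"patient_id": 2, "date": "2024-01-01", "salary": 48000,
     "city": "Lyon", "country": "France",
     "not_subscribed": 0, "basic_subscriber": 0, "premium": 1},
]

subscription_columns = ["not_subscribed", "basic_subscriber", "premium"]
checklist = {
    "fixed increments (salary, 1000)": is_fixed_increment(rows, "salary", 1000),
    "one hot encoding": is_one_hot(rows, subscription_columns),
    "null together (city, country)": is_null_together(rows, ["city", "country"]),
    "composite key (patient_id, date)": is_composite_key(rows, ["patient_id", "date"]),
}
for rule, holds in checklist.items():
    print(f"{rule}: {'yes' if holds else 'no'}")
```

A rule that holds in every real row is only a candidate. Confirm with someone who knows the data that it is a genuine business rule and not a coincidence of the sample.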
