2.2 Business Rules (Constraints)
Does your data include business rules (constraints)?
In SDV, constraints refer to indirect relationships or business rules. Such rules are not present in your data schema, but they are critical for data validity. You may also refer to these as "derived rules" or "dependencies".
SDV Enterprise supports constraints via a low-code API in the CAG bundle. These constraints guide the model to generate 100% valid synthetic records.
Reviewing the CAG bundle
Review the full list of constraints (for both single and multi-table) and identify which constraints are relevant to your dataset.
For each constraint, mark down whether it's relevant to your dataset.
No shuffling is allowed other than what's already observed in the data.
Eg. The city and country values cannot be shuffled to create new permutations.
<yes or no>
All the numerical values are increments of a whole number.
Eg. All values in salary must be divisible by 1000
<yes or no>
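To help answer this question, you can check the column directly. The following is a minimal sketch in plain Python, assuming the rows are loaded as a list of dictionaries. The helper name is_fixed_increment and the sample rows are hypothetical and are not part of SDV.

```python
def is_fixed_increment(rows, column, increment):
    """Return True if every non-missing value in `column` is a multiple of `increment`."""
    values = [row[column] for row in rows if row[column] is not None]
    return all(value % increment == 0 for value in values)


# Hypothetical sample rows, for illustration only.
rows = [
    {"salary": 52000},
    {"salary": 87000},
    {"salary": None},  # missing values are skipped by the check
]
print(is_fixed_increment(rows, "salary", 1000))
```

If the check passes for your real data, the answer to this question is likely yes.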
The value in one column must always be greater than the other.
Eg. The checkout_date must always be after the checkin_date
<yes or no>
The original data columns represent a one hot encoding scheme.
Eg. Exactly 1 of the following columns has a 1 in each row: not_subscribed, basic_subscriber, premium
<yes or no>
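A quick way to confirm a one hot encoding scheme is to count the 1s and 0s in each row. This is a minimal sketch in plain Python, assuming the rows are a list of dictionaries with values 0 or 1. The helper name is_one_hot is hypothetical and is not part of SDV.

```python
def is_one_hot(rows, columns):
    """Return True if, in every row, exactly one of `columns` is 1 and the rest are 0."""
    for row in rows:
        values = [row[name] for name in columns]
        if values.count(1) != 1 or values.count(0) != len(columns) - 1:
            return False
    return True


# Hypothetical sample rows, for illustration only.
columns = ["not_subscribed", "basic_subscriber", "premium"]
rows = [
    {"not_subscribed": 1, "basic_subscriber": 0, "premium": 0},
    {"not_subscribed": 0, "basic_subscriber": 0, "premium": 1},
]
print(is_one_hot(rows, columns))
```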
The value in one column is bounded by the values in other columns.
Eg. The parent_age must be in between child_age and grandparent_age
<yes or no>
A chain of 2 or more columns in an inequality.
Eg. purchase_date < start_date < end_date < expiration_date < termination_date
<yes or no>
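An ordering rule like this can be checked by comparing each column with the next one in the chain. The sketch below is plain Python and assumes the rows are a list of dictionaries with no missing values in the chained columns. The helper name follows_chain is hypothetical, and only three of the example columns are used to keep the sample short.

```python
from datetime import date


def follows_chain(rows, columns):
    """Return True if, in every row, the values in `columns` are strictly increasing."""
    for row in rows:
        values = [row[name] for name in columns]
        # Compare each value with its right-hand neighbor in the chain.
        if any(left >= right for left, right in zip(values, values[1:])):
            return False
    return True


# Hypothetical sample rows, for illustration only.
chain = ["purchase_date", "start_date", "end_date"]
rows = [
    {"purchase_date": date(2024, 1, 2), "start_date": date(2024, 1, 10), "end_date": date(2024, 6, 30)},
    {"purchase_date": date(2024, 3, 5), "start_date": date(2024, 3, 6), "end_date": date(2024, 9, 1)},
]
print(follows_chain(rows, chain))
```

With a chain of exactly two columns, the same check covers the simpler case where one column must always be greater than another.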
Multiple columns together form a primary key in a table.
Eg. A combination of Patient ID and Date uniquely identifies each record in a table.
<yes or no>
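To see whether a set of columns behaves like a composite primary key, you can look for repeated combinations. This is a minimal sketch in plain Python, assuming the rows are a list of dictionaries. The helper name duplicate_keys and the sample values are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key combinations that appear in more than one row."""
    counts = Counter(tuple(row[name] for name in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# Hypothetical sample rows, for illustration only.
rows = [
    {"Patient ID": "P1", "Date": "2024-01-05"},
    {"Patient ID": "P1", "Date": "2024-02-11"},
    {"Patient ID": "P2", "Date": "2024-01-05"},
]
# An empty result means the combination is unique in this sample.
print(duplicate_keys(rows, ["Patient ID", "Date"]))
```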
No shuffling is allowed for the missing values, other than what's already observed in the data.
Eg. The city and country columns must both either be null together or not at all.
<yes or no>
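A missing-value pattern like this one can be verified row by row. The sketch below is plain Python and assumes missing values are represented as None in a list of dictionaries. The helper name null_together is hypothetical and is not part of SDV.

```python
def null_together(rows, columns):
    """Return True if, in every row, `columns` are either all missing or all present."""
    for row in rows:
        missing = [row[name] is None for name in columns]
        if any(missing) and not all(missing):
            return False
    return True


# Hypothetical sample rows, for illustration only.
rows = [
    {"city": "Lyon", "country": "France"},
    {"city": None, "country": None},
]
print(null_together(rows, ["city", "country"]))
```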
The value of one categorical column determines the scale of another numerical column.
Eg. If the value of test_type is 'blood_pressure' then the value of test_result must be within a reasonable range for this test only.
<yes or no>
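One way to judge whether this applies is to compare the observed range of the numerical column for each category. The following is a minimal sketch in plain Python, assuming the rows are a list of dictionaries with no missing values. The helper name observed_ranges, the 'heart_rate' category, and all sample numbers are hypothetical.

```python
from collections import defaultdict


def observed_ranges(rows, category_column, value_column):
    """Return {category: (min, max)} for the values seen in `value_column`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[category_column]].append(row[value_column])
    return {category: (min(values), max(values)) for category, values in grouped.items()}


# Hypothetical sample rows, for illustration only.
rows = [
    {"test_type": "blood_pressure", "test_result": 118},
    {"test_type": "blood_pressure", "test_result": 135},
    {"test_type": "heart_rate", "test_result": 62},
    {"test_type": "heart_rate", "test_result": 80},
]
for category, (low, high) in observed_ranges(rows, "test_type", "test_result").items():
    print(category, low, high)
```

If the ranges differ widely between categories, the answer to this question is probably yes.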
A column in the table refers to a different column in the same table.
Eg. The Manager ID column refers to the Employee ID column in the same table.
<yes or no>
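If you suspect a self-referencing column, you can check whether every value it contains also appears in the column it points to. This is a minimal sketch in plain Python, assuming the rows are a list of dictionaries and that None marks a row with no reference. The helper name dangling_references and the sample IDs are hypothetical.

```python
def dangling_references(rows, id_column, ref_column):
    """Return the values in `ref_column` that never appear in `id_column`."""
    known_ids = {row[id_column] for row in rows}
    dangling = {
        row[ref_column]
        for row in rows
        if row[ref_column] is not None and row[ref_column] not in known_ids
    }
    return sorted(dangling)


# Hypothetical sample rows, for illustration only.
rows = [
    {"Employee ID": 1, "Manager ID": None},  # top of the hierarchy
    {"Employee ID": 2, "Manager ID": 1},
    {"Employee ID": 3, "Manager ID": 1},
]
# An empty result means every Manager ID matches an Employee ID in this sample.
print(dangling_references(rows, "Employee ID", "Manager ID"))
```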
The same columns are present in a parent table and a child table, and the values of those columns have to match up according to the connection.
Eg. The account Type in one table must match the corresponding account Type in another table.
<yes or no>
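To check whether a shared column stays consistent across a connection, you can look up each child row's parent and compare the values. The sketch below is plain Python and assumes both tables are lists of dictionaries. The helper name mismatched_children and the "Account ID" key column are hypothetical, as are the sample rows.

```python
def mismatched_children(parents, children, parent_key, foreign_key, column):
    """Return the child rows whose `column` value differs from the parent's value."""
    parent_values = {row[parent_key]: row[column] for row in parents}
    return [
        child
        for child in children
        if parent_values.get(child[foreign_key]) != child[column]
    ]


# Hypothetical sample tables, for illustration only.
parents = [
    {"Account ID": 1, "account Type": "Premium"},
    {"Account ID": 2, "account Type": "Basic"},
]
children = [
    {"Account ID": 1, "account Type": "Premium"},
    {"Account ID": 2, "account Type": "Premium"},  # does not match its parent
]
mismatched = mismatched_children(parents, children, "Account ID", "Account ID", "account Type")
print(len(mismatched))
```

Note that a child row whose key has no parent at all is also reported as a mismatch in this sketch.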
Multiple columns together form a primary key and foreign key connection.
Eg. A combination of Patient ID and Date uniquely identifies each record in a table.
<yes or no>
There are foreign keys in multiple tables but no primary key to attach them to.
Eg. The Warehouse ID column in multiple tables is referring to the same concept.
<yes or no>
There is a 1-to-many connection between tables, but only certain values are allowed to have connections.
Eg. Only accounts with Type=Premium are allowed to have children in another table.
<yes or no>
There is an exact 1-to-1 connection between the primary keys of two or more tables.
Eg. There is an exact 1-to-1 relationship between table Users and table Supplemental Info
<yes or no>
There is a 1-to-1 connection between the primary keys of two or more tables but only certain values are allowed to have connections.
Eg. Only users with Is Minor=True are allowed to have an entry in another table.
<yes or no>
A table acts as an unchangeable reference. You do not want to synthesize any new information in it.
Eg. The City table should act as a reference; you do not want to synthesize new cities.
<yes or no>
A bridge table records a many-to-many relationship between two other tables, and the connections have to be unique.
Eg. The Author-Book table connects an author to a book, but the connection can only occur once.
<yes or no>