Deprecated Constraints
If you were using an older version of SDV, you may have noticed that we used to support a few additional constraints. These include:
Constraints that enforced min/max boundaries on your data: Positive, Negative, ScalarInequality, and ScalarRange.
Constraints that denoted relationships between your data tables: CompositeKeys, PrimaryToPrimaryKey, and UniqueBridgeTable.
These constraints are no longer supported in the CAG framework because the same functionality can be achieved using other features within the SDV.
Recommended Usage
Browse below for the equivalent functionality that we recommend using instead of the deprecated constraints.
Enforcing min/max boundaries:
I would like my synthetic data to remain in the same min/max range as my original data
By default, SDV synthesizers learn the min/max values of your data, and ensure the synthetic data adheres to the same ranges. So there is nothing more you need to do to achieve this.
I would like to perform statistical preprocessing on my data (such as the log or logit function)
The log or logit functions mathematically impose min or max boundaries on your data. To use these functions, we recommend disabling the synthesizer's min/max enforcement. Then, you can add statistical preprocessing for individual columns.
from rdt.transformers.numerical import LogScaler, LogitScaler
from sdv.single_table import GaussianCopulaSynthesizer
# disable the synthesizer's min/max enforcement
my_synthesizer = GaussianCopulaSynthesizer(
    metadata,
    enforce_min_max_values=False)
# add preprocessing for individual columns such as 'age' and 'salary'
my_synthesizer.auto_assign_transformers(my_data)
my_synthesizer.update_transformers({
    'age': LogScaler(),
    'salary': LogitScaler(
        min_value=0 - 1e-10,
        max_value=1000000
    )
})
For more information about custom preprocessing, please see our tutorial and API docs. Also see the docs for the LogScaler and LogitScaler.
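To illustrate why these functions impose boundaries, here is a minimal sketch of the underlying math in plain Python. It is not RDT's implementation: the helper names (`log_forward`, `log_inverse`, `logit_forward`, `logit_inverse`) are hypothetical, and the exact formulas the transformers use may differ.

```python
import math

def log_forward(value, lower_bound=0.0):
    """Log transform: only defined for values strictly above lower_bound."""
    return math.log(value - lower_bound)

def log_inverse(z, lower_bound=0.0):
    """Any real z maps back to a value strictly above lower_bound."""
    return lower_bound + math.exp(z)

def logit_forward(value, min_value, max_value):
    """Logit transform: only defined strictly between min_value and max_value."""
    scaled = (value - min_value) / (max_value - min_value)
    return math.log(scaled / (1.0 - scaled))

def logit_inverse(z, min_value, max_value):
    """Any real z maps back to a value between min_value and max_value."""
    scaled = 1.0 / (1.0 + math.exp(-z))
    return min_value + scaled * (max_value - min_value)

# A value sitting exactly on the boundary cannot be transformed...
try:
    logit_forward(0.0, 0.0, 1000000)
except ValueError:
    boundary_failed = True

# ...but widening the lower bound by a tiny amount makes it valid.
widened = logit_forward(0.0, 0 - 1e-10, 1000000)
```

Because the inverse functions can only produce values inside the bounds, the transform itself keeps the data in range. This is presumably also why the `salary` example above sets `min_value` slightly below 0: a salary of exactly 0 would otherwise sit on the boundary, where the logit is undefined.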
I would like to control the range of my synthetic data independently of what's in my original data
The SDV software is primarily meant to learn ranges from the original data and replicate those same patterns as-is in the synthetic data. However, we offer a few features that can help you control the ranges:
Further restricting the range of the synthetic data: We recommend letting your synthesizer learn the original ranges. Then, use conditional sampling to get the exact values of synthetic data that you need.
from sdv.sampling import Condition
from sdv.single_table import GaussianCopulaSynthesizer
# create and fit an SDV synthesizer with the default settings
my_synthesizer = GaussianCopulaSynthesizer(metadata)
my_synthesizer.fit(my_data)
# use conditional sampling to limit the synthetic data
# in this case, we'll only create ages that are 23-25
age_23 = Condition(num_rows=4, column_values={'age': 23})
age_24 = Condition(num_rows=4, column_values={'age': 24})
age_25 = Condition(num_rows=4, column_values={'age': 25})
synthetic_data = my_synthesizer.sample_from_conditions([age_23, age_24, age_25])
Expanding the range of the synthetic data: Expanding the synthetic data range (beyond the original data) is not yet fully supported within the SDV. The best approach is to include additional rows in the original data that cover the full range, so that SDV can learn it. If that is not possible, you can try disabling the synthesizer's min/max range enforcement. Depending on the synthesizer's algorithm, you may be able to achieve a slightly broader range this way, though this is not guaranteed.
from sdv.single_table import CTGANSynthesizer
# disable the synthesizer's min/max enforcement
my_synthesizer = CTGANSynthesizer(
    metadata,
    enforce_min_max_values=False)
my_synthesizer.fit(my_data)
# now the synthetic data may go outside the ranges
# but this is not yet fully supported
synthetic_data = my_synthesizer.sample(num_rows=500)
Denoting relationships in your data:
My schema contains composite keys for the primary/foreign key connections.
You can now supply composite primary and foreign keys directly in your metadata by providing a list instead of an individual column name. For example, a table may have a composite primary key that consists of two columns, Patient ID and Date; both columns together act as the primary key.

You can set both columns as the primary key in the metadata.
You can also supply a composite foreign key that points to a composite primary key. Make sure that the foreign key has the same number of columns as the primary key, and that the order of the columns you provide lines up with the primary key columns they reference.
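As a rough sketch, a composite key setup could be written as a metadata dictionary like the one below. The tables (`Visits`, `Lab Results`) and their columns are hypothetical, and the field names follow the general shape of SDV's metadata JSON; treat them as assumptions and confirm the exact format in the Metadata Guide. The `check_key_alignment` helper is not part of SDV. It only illustrates the rule that both keys need the same number of columns in matching order.

```python
# Hypothetical schema: 'Visits' is the parent, 'Lab Results' is the child.
metadata_dict = {
    'tables': {
        'Visits': {
            # composite primary key: a list instead of a single column name
            'primary_key': ['Patient ID', 'Date'],
            'columns': {
                'Patient ID': {'sdtype': 'id'},
                'Date': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'},
                'Reason': {'sdtype': 'categorical'},
            },
        },
        'Lab Results': {
            'columns': {
                'Visit Patient ID': {'sdtype': 'id'},
                'Visit Date': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'},
                'Result': {'sdtype': 'numerical'},
            },
        },
    },
    'relationships': [{
        'parent_table_name': 'Visits',
        'parent_primary_key': ['Patient ID', 'Date'],
        'child_table_name': 'Lab Results',
        # same length and same order as the parent primary key
        'child_foreign_key': ['Visit Patient ID', 'Visit Date'],
    }],
}

def as_list(key):
    """Treat a single column name and a list of names uniformly."""
    return [key] if isinstance(key, str) else list(key)

def check_key_alignment(relationship):
    """Pair each foreign key column with the primary key column it references."""
    parent = as_list(relationship['parent_primary_key'])
    child = as_list(relationship['child_foreign_key'])
    if len(parent) != len(child):
        raise ValueError('Foreign key and primary key need the same number of columns.')
    return list(zip(child, parent))

pairs = check_key_alignment(metadata_dict['relationships'][0])
```

Here `pairs` matches `Visit Patient ID` to `Patient ID` and `Visit Date` to `Date`, which is the positional pairing described above.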
For more information, see the Metadata Guide.
I want to enforce uniqueness in a bridge table
A bridge table records a many-to-many relationship between multiple tables. To enforce that the connections are unique, you can denote a composite key for the bridge table in your metadata.
For example, let's say you have a bridge table Author-Book that connects the Authors and Books tables.

To enforce that the connections are unique, set a composite key for the Author-Book table.
The individual columns in the bridge table can still be foreign keys into other tables.
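A sketch of what this could look like for the Author-Book example is shown below. The column names (`Author ID`, `Book ID`) are hypothetical and the dictionary layout is an assumption based on the general shape of SDV's metadata JSON, so check the Metadata Guide for the exact format. The `find_duplicate_links` helper is not an SDV function; it is a plain-Python check you could run on your real bridge table to confirm that its links are already unique before declaring the composite key.

```python
from collections import Counter

# Hypothetical bridge table metadata: the composite primary key makes each
# (author, book) pair unique, while each column is still a foreign key.
bridge_metadata = {
    'tables': {
        'Authors': {'primary_key': 'Author ID',
                    'columns': {'Author ID': {'sdtype': 'id'}}},
        'Books': {'primary_key': 'Book ID',
                  'columns': {'Book ID': {'sdtype': 'id'}}},
        'Author-Book': {'primary_key': ['Author ID', 'Book ID'],
                        'columns': {'Author ID': {'sdtype': 'id'},
                                    'Book ID': {'sdtype': 'id'}}},
    },
    'relationships': [
        {'parent_table_name': 'Authors', 'parent_primary_key': 'Author ID',
         'child_table_name': 'Author-Book', 'child_foreign_key': 'Author ID'},
        {'parent_table_name': 'Books', 'parent_primary_key': 'Book ID',
         'child_table_name': 'Author-Book', 'child_foreign_key': 'Book ID'},
    ],
}

def find_duplicate_links(rows, key_columns):
    """Return every key combination that appears more than once."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]

# Made-up rows for illustration only.
rows = [
    {'Author ID': 'A1', 'Book ID': 'B1'},
    {'Author ID': 'A1', 'Book ID': 'B2'},
    {'Author ID': 'A2', 'Book ID': 'B1'},
    {'Author ID': 'A1', 'Book ID': 'B1'},  # repeats the first link
]
key_columns = bridge_metadata['tables']['Author-Book']['primary_key']
duplicates = find_duplicate_links(rows, key_columns)
```

An empty `duplicates` list would mean the real data already satisfies the uniqueness you are asking the synthetic data to follow.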
For more information, see the Metadata Guide.
My schema contains two connected primary keys
If your schema contains a relationship between two primary keys, that means you have a 1-to-1 connection between two tables. You can denote this in your metadata by adding a relationship. In the relationship, one of the primary key columns acts as a foreign key to the other table. Typically, the foreign key belongs to the table that contains extra or auxiliary information.
For example, you may have a table for Users and another table for their Supplemental Info that is connected in a 1-to-1 fashion.

In this case, we add a foreign key reference from the Supplemental Info table (auxiliary table) to the main Users table.
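The relationship entry for this example might look like the sketch below. The column name `User ID` is hypothetical, and the field names are an assumption based on the general shape of SDV's metadata JSON. The `is_one_to_one` helper is not part of SDV; it simply checks that your real data fits a 1-to-1 pattern, meaning each supplemental row points to a known user and no user is referenced twice.

```python
# Hypothetical relationship: the primary key of 'Supplemental Info'
# doubles as a foreign key into the main 'Users' table.
relationship = {
    'parent_table_name': 'Users',
    'parent_primary_key': 'User ID',
    'child_table_name': 'Supplemental Info',
    'child_foreign_key': 'User ID',
}

def is_one_to_one(parent_ids, child_ids):
    """True if every child id is a known parent id and none is repeated."""
    no_repeats = len(child_ids) == len(set(child_ids))
    all_known = set(child_ids) <= set(parent_ids)
    return no_repeats and all_known

# Made-up ids for illustration only.
user_ids = ['U1', 'U2', 'U3']
supplemental_ids = ['U1', 'U3']
fits_pattern = is_one_to_one(user_ids, supplemental_ids)
```

Note that not every user needs a supplemental row in this sketch; the check only rejects repeated or unknown ids.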
For more information, see the Metadata Guide.
Older Functionality
If you are using an older version of the SDV (version 1.21 or earlier), we've provided a PDF below containing the deprecated constraints API, as well as the API for custom constraints. However, please note that we recommend upgrading to the latest version of SDV as soon as you can. The latest SDV version contains bug fixes and additional features that you may need. Our team is only able to provide questions/debugging support for the latest SDV version.
We're here to help you upgrade. If you need any help, please contact us via our forum.