Deprecated Constraints

If you have used an older version of SDV, you may have noticed that we used to support a few additional constraints. These include:

  • Constraints that enforced min/max boundaries on your data: Positive, Negative, ScalarInequality, and ScalarRange.

  • Constraints that denoted relationships between your data tables: CompositeKeys, PrimaryToPrimaryKey, and UniqueBridgeTable.

These constraints are no longer supported in the CAG framework because the same functionality can be achieved using other features within the SDV.

Browse below for the equivalent functionality that we recommend using instead of the deprecated constraints.

Enforcing min/max boundaries:

I would like my synthetic data to remain in the same min/max range as my original data

By default, SDV synthesizers learn the min/max values of your data, and ensure the synthetic data adheres to the same ranges. So there is nothing more you need to do to achieve this.
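As a rough mental model, min/max enforcement can be pictured as learning the observed range of each numerical column and clipping generated values back into it. The sketch below assumes clipping as the mechanism; the helpers learn_range and enforce_range are hypothetical and are not part of SDV.

```python
# Hypothetical sketch of min/max enforcement (not SDV's actual code):
# learn the observed range from the real column, then clip any
# generated value that falls outside it.


def learn_range(values):
    """Return the (min, max) observed in the real data."""
    return min(values), max(values)


def enforce_range(samples, low, high):
    """Clip each generated value into the learned [low, high] range."""
    return [min(max(x, low), high) for x in samples]


real_ages = [23, 31, 45, 52, 67]
low, high = learn_range(real_ages)  # (23, 67)

# Illustrative raw model output that overshoots the learned range
raw_samples = [19.4, 30.2, 58.9, 71.3]
print(enforce_range(raw_samples, low, high))  # [23, 30.2, 58.9, 67]
```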

I would like to perform statistical preprocessing on my data (such as the log or logit function)

The log or logit functions mathematically impose min or max boundaries on your data. To use these functions, we recommend disabling the synthesizer's min/max enforcement. Then, you can add statistical preprocessing for individual columns.

from sdv.single_table import GaussianCopulaSynthesizer
from rdt.transformers.numerical import LogScaler, LogitScaler

# disable the synthesizer's min/max enforcement
my_synthesizer = GaussianCopulaSynthesizer(
    metadata,
    enforce_min_max_values=False)

# add preprocessing for individual columns such as 'age' and 'salary'
my_synthesizer.auto_assign_transformers(my_data)
my_synthesizer.update_transformers({
    'age': LogScaler(),
    'salary': LogitScaler(
        min_value=0 - 1e-10,
        max_value=1000000
    )
})

# fit the synthesizer after updating the transformers
my_synthesizer.fit(my_data)

For more information about custom preprocessing, please see our tutorial and API docs. Also see the docs for the LogScaler and LogitScaler.
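To see why the log and logit functions impose boundaries, the sketch below implements simplified forward and inverse versions in plain Python. The helpers (log_forward, log_inverse, logit_forward, logit_inverse) are hypothetical illustrations, not RDT's LogScaler or LogitScaler code. Whatever value a model generates on the transformed scale, the inverse maps it back above the log constant, or inside the logit's (min, max) interval, up to floating-point rounding.

```python
import math

# Simplified log and logit scaling (hypothetical helpers, not RDT's code).
# Each forward function maps a bounded column onto an unbounded scale;
# each inverse maps any real number back inside the original bounds.


def log_forward(x, constant=0.0):
    """Map x > constant onto the whole real line."""
    return math.log(x - constant)


def log_inverse(y, constant=0.0):
    """Map any real y back to a value above `constant`."""
    return math.exp(y) + constant


def logit_forward(x, min_value, max_value):
    """Map x in (min_value, max_value) onto the whole real line."""
    p = (x - min_value) / (max_value - min_value)
    return math.log(p / (1 - p))


def logit_inverse(y, min_value, max_value):
    """Map any real y back into (min_value, max_value)."""
    p = 1 / (1 + math.exp(-y))
    return min_value + p * (max_value - min_value)


# Values a model might generate on the transformed scale all land
# back inside the bounds after the inverse transform.
for y in [-5.0, 0.0, 5.0]:
    print(log_inverse(y), logit_inverse(y, 0 - 1e-10, 1_000_000))

# With the lower bound just below 0, a salary of exactly 0 stays finite.
print(logit_forward(0, 0 - 1e-10, 1_000_000))
```

A value exactly at the lower bound would make the logit's ratio zero, and math.log would raise a ValueError. That may be one reason the example above places min_value just below 0, though this is our reading rather than a documented requirement.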

I would like to control the range of my synthetic data independently of what's in my original data

The SDV software is primarily meant to learn ranges from the original data and replicate those same patterns as-is in the synthetic data. However, we offer a few features that can help you control the ranges:

  • Further restricting the range of the synthetic data: We recommend letting your synthesizer learn the original ranges. Then, use conditional sampling to get the exact values of synthetic data that you need.

from sdv.single_table import GaussianCopulaSynthesizer
from sdv.sampling import Condition

# create and fit an SDV synthesizer with the default settings
my_synthesizer = GaussianCopulaSynthesizer(metadata)
my_synthesizer.fit(my_data)

# use conditional sampling to limit the synthetic data
# in this case, we'll only create ages that are 23-25 (4 rows per age)
age_23 = Condition(num_rows=4, column_values={'age': 23})
age_24 = Condition(num_rows=4, column_values={'age': 24})
age_25 = Condition(num_rows=4, column_values={'age': 25})

synthetic_data = my_synthesizer.sample_from_conditions([age_23, age_24, age_25])

  • Expanding the range of the synthetic data: Expanding the synthetic data range beyond the original data is not yet fully supported within the SDV. The best approach is to include additional rows in the original data that encompass the full range, so that SDV can learn it. If that is not possible, you can try disabling the synthesizer's min/max enforcement. Depending on the synthesizer's algorithm, this may produce a slightly broader range, but there are no guarantees.

from sdv.single_table import CTGANSynthesizer

# disable the synthesizer's min/max enforcement
my_synthesizer = CTGANSynthesizer(
    metadata,
    enforce_min_max_values=False)
my_synthesizer.fit(my_data)

# now the synthetic data may go outside the ranges
# but this is not yet fully supported
synthetic_data = my_synthesizer.sample(num_rows=500)
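Conditional sampling, the first option in the list above, returns only rows that match your conditions. One simple way to picture this is rejection sampling: draw rows, keep the matches, and retry until enough are found. The sketch below uses a hypothetical sample_age function in place of a fitted synthesizer; it is a conceptual illustration and makes no claim about how SDV implements sample_from_conditions.

```python
import random


def sample_age():
    """Hypothetical stand-in for one row generated by a fitted synthesizer."""
    return random.randint(18, 80)


def sample_with_condition(num_rows, condition, max_tries=10_000):
    """Rejection sampling: keep drawing until num_rows samples satisfy `condition`."""
    accepted = []
    for _ in range(max_tries):
        age = sample_age()
        if condition(age):
            accepted.append(age)
            if len(accepted) == num_rows:
                return accepted
    raise RuntimeError("Could not find enough matching rows within max_tries")


random.seed(0)
# Restrict the output to ages 23-25, 12 rows in total
ages = sample_with_condition(12, lambda age: 23 <= age <= 25)
print(ages)
```

Rejection sampling slows down as the condition becomes rarer, which is why the sketch caps the number of attempts with max_tries.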

Denoting relationships in your data:

My schema contains composite keys for the primary/foreign key connections.

You can now supply composite primary and foreign keys directly in your metadata by providing a list of column names instead of a single column name. For example, a table may have a composite primary key made up of two columns, Patient ID and Date; together, the two columns act as the primary key.

You can set both columns as the primary key in the metadata.

You can also supply a composite foreign key that points to a composite primary key. Make sure that the foreign key has the same number of columns as the primary key, and that its columns are listed in the same order as the primary key columns they reference.

For more information, see the Metadata Guide.
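The column-order requirement can be checked on your own data before you build the metadata. The sketch below is a hypothetical helper, check_composite_fk, with illustrative table and column names; the key columns are passed as lists, mirroring how composite keys are supplied. It matches foreign key columns to primary key columns by position, so swapping the order of the foreign key columns breaks the match.

```python
def check_composite_fk(parent_rows, child_rows, pk_cols, fk_cols):
    """Return True if every composite foreign key value exists in the parent table.

    fk_cols[i] is matched to pk_cols[i], so the column order matters.
    """
    if len(pk_cols) != len(fk_cols):
        raise ValueError("The foreign key needs as many columns as the primary key")

    parent_keys = [tuple(row[c] for c in pk_cols) for row in parent_rows]
    if len(parent_keys) != len(set(parent_keys)):
        raise ValueError("Composite primary key values must be unique")

    known = set(parent_keys)
    return all(tuple(row[c] for c in fk_cols) in known for row in child_rows)


# Illustrative data: neither column is unique alone, but the pair is.
visits = [
    {"Patient ID": "P1", "Date": "2024-01-05"},
    {"Patient ID": "P1", "Date": "2024-02-10"},
    {"Patient ID": "P2", "Date": "2024-01-05"},
]
prescriptions = [
    {"Visit Patient": "P1", "Visit Date": "2024-02-10"},
    {"Visit Patient": "P2", "Visit Date": "2024-01-05"},
]

pk = ["Patient ID", "Date"]
print(check_composite_fk(visits, prescriptions, pk, ["Visit Patient", "Visit Date"]))  # True
print(check_composite_fk(visits, prescriptions, pk, ["Visit Date", "Visit Patient"]))  # False
```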

I want to enforce uniqueness in a bridge table

A bridge table records a many-to-many relationship between tables. To enforce that the connections are unique, you can denote a composite key for the bridge table in your metadata.

For example, let's say you have a bridge table, Author-Book, that connects the Authors and Books tables.

To enforce that the connections are unique, set a composite key for the Author-Book table.

The individual columns in the bridge table can still be foreign keys into other tables.

For more information, see the Metadata Guide.
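The property the composite key protects is that no (author, book) pair appears twice, even though each column on its own may repeat. The sketch below uses a hypothetical duplicate_connections helper and illustrative column names, author_id and book_id, to find pairs that would violate such a key.

```python
from collections import Counter

# Illustrative Author-Book bridge rows; each row links one author to one book.
author_book = [
    {"author_id": 1, "book_id": 10},
    {"author_id": 1, "book_id": 11},
    {"author_id": 2, "book_id": 10},
    {"author_id": 2, "book_id": 10},  # duplicate connection
]


def duplicate_connections(rows, key_cols):
    """Return composite key values that appear more than once."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]


print(duplicate_connections(author_book, ["author_id", "book_id"]))  # [(2, 10)]
```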

My schema contains two connected primary keys

If your schema contains a relationship between two primary keys, you have a 1-to-1 connection between two tables. You can denote this in your metadata by adding a relationship in which one of the primary key columns also acts as a foreign key to the other table. Typically, the foreign key belongs to the table that contains extra or auxiliary information.

For example, you may have a Users table and a Supplemental Info table that are connected 1-to-1.

In this case, we add a foreign key reference from the Supplemental Info table (auxiliary table) to the main Users table.

For more information, see the Metadata Guide.
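Because the auxiliary table's primary key also serves as its foreign key, each user can have at most one Supplemental Info row. The sketch below checks that property on plain Python data. The relationship dictionary is modeled on the relationship entries in SDV's metadata dictionary, while the table names, column names, and the is_one_to_one helper are hypothetical.

```python
# Relationship modeled on SDV metadata relationship entries;
# the table and column names are illustrative.
relationship = {
    "parent_table_name": "users",
    "parent_primary_key": "user_id",
    "child_table_name": "supplemental_info",
    "child_foreign_key": "user_id",
}

tables = {
    "users": [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}],
    "supplemental_info": [
        {"user_id": 1, "favorite_color": "blue"},
        {"user_id": 2, "favorite_color": "green"},
        {"user_id": 3, "favorite_color": "red"},
    ],
}


def is_one_to_one(tables, rel):
    """Check that each child row references a distinct, existing parent row."""
    parent_rows = tables[rel["parent_table_name"]]
    child_rows = tables[rel["child_table_name"]]
    parent_keys = {row[rel["parent_primary_key"]] for row in parent_rows}
    child_keys = [row[rel["child_foreign_key"]] for row in child_rows]
    # The foreign key is also the child's primary key, so it must be unique,
    unique = len(child_keys) == len(set(child_keys))
    # and every value must point at an existing parent row.
    valid = all(key in parent_keys for key in child_keys)
    return unique and valid


print(is_one_to_one(tables, relationship))  # True
```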

Older Functionality

If you are using an older version of the SDV (version 1.21 or earlier), we've provided a PDF below containing the deprecated constraints API, as well as the API for custom constraints. However, we recommend upgrading to the latest version of the SDV as soon as you can: it contains bug fixes and additional features that you may need. Our team can only answer questions and provide debugging support for the latest SDV version.
