Constraints

Do you have rules that every row in the data must follow? Are these the same regardless of how much data there is? You can use constraints to describe this business logic in your data.

Predefined Constraint Classes

Use predefined constraints to apply simple logic within a single table. For example, the value in one column (checkout_date) must always be greater than the value in another (checkin_date).
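To make the checkout_date/checkin_date rule concrete, here is a minimal plain-Python sketch that checks it row by row. It does not use the SDV API; the function name find_violations and the sample rows are hypothetical and exist only for illustration.

```python
from datetime import date


def find_violations(rows, low="checkin_date", high="checkout_date"):
    """Return the indices of rows where `high` is not strictly greater than `low`.

    Hypothetical helper for illustration only; not part of the SDV.
    """
    return [i for i, row in enumerate(rows) if not row[high] > row[low]]


# Made-up example rows
rows = [
    {"checkin_date": date(2024, 3, 1), "checkout_date": date(2024, 3, 4)},
    # Violation: checkout is before checkin
    {"checkin_date": date(2024, 3, 10), "checkout_date": date(2024, 3, 9)},
    # Violation: checkout equals checkin, so it is not strictly greater
    {"checkin_date": date(2024, 3, 15), "checkout_date": date(2024, 3, 15)},
]

bad_rows = find_violations(rows)
```

A rule like this applies to each row on its own, so it holds no matter how many rows the table contains. That is the kind of logic a constraint is meant to capture.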

Browse the predefined constraints to learn more.

Custom Business Logic

If your logic cannot be described by predefined constraints, create your own custom constraint. The logic must be defined in a separate Python file that you can load and add to any synthesizer.

See the Custom Business Logic guide for more details.

❖ Constraint Augmented Generation (CAG)

Ready to take constraints to the next level? Add complex business logic that reaches across multiple tables, and access advanced algorithms with a simple API.

See our guide for Constraint Augmented Generation.

FAQs

How is modeling & sampling performance impacted by constraints?

In most cases, the time it takes to fit the model and sample synthetic data should not be significantly affected. However, there are certain scenarios where you may notice a slow-down:

  • You have a large number of constraints that overlap. That is, multiple constraints are referencing the same columns of the data.

  • You are using conditional sampling on the constrained columns. This requires some special processing and it may not always be possible to efficiently create conditional synthetic data.

For any questions or feature requests related to performance, please create an issue describing your data, constraints and sampling needs.

How does the SDV handle the constraints?

Under the hood, the SDV uses a combination of strategies to ensure that the synthetic data always follows the constraints. These strategies are:

  • Transformation: Most of the time, it's possible to transform the data in a way that guarantees the models will be able to learn the constraint. This is paired with a reverse transformation to ensure the synthetic data looks like the original.

  • Reject Sampling: Another strategy is to model and sample synthetic data as usual, and then throw away any rows in the synthetic data that violate the constraints.

  • Algorithmic Injection: Complex CAG patterns sometimes come with their own algorithms for ensuring robust and accurate modeling. These algorithms are compatible with any SDV synthesizer.
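The first two strategies can be illustrated with a small, self-contained sketch. This is not the SDV's internal implementation. It assumes a single rule (checkout must be greater than checkin), represents dates as plain numbers, and uses hypothetical helper names (transform, reverse_transform, reject_sample) chosen for this example.

```python
import math
import random


def is_valid(row):
    """The rule being enforced: checkout must be strictly greater than checkin."""
    return row["checkout"] > row["checkin"]


# Transformation (illustrative): replace checkout with the log of the gap.
# Any real number the model produces for log_gap maps back to a positive gap.
def transform(row):
    return {"checkin": row["checkin"],
            "log_gap": math.log(row["checkout"] - row["checkin"])}


def reverse_transform(row):
    return {"checkin": row["checkin"],
            "checkout": row["checkin"] + math.exp(row["log_gap"])}


# Reject sampling (illustrative): sample as usual and discard invalid rows.
def reject_sample(sample_row, num_rows, max_tries=10_000):
    kept = []
    for _ in range(max_tries):
        row = sample_row()
        if is_valid(row):
            kept.append(row)
            if len(kept) == num_rows:
                return kept
    raise RuntimeError("Could not sample enough valid rows")


# Stand-in for an unconstrained model: two independent random day numbers.
rng = random.Random(0)


def naive_sampler():
    return {"checkin": rng.randint(1, 30), "checkout": rng.randint(1, 30)}


round_trip = reverse_transform(transform({"checkin": 3, "checkout": 7}))
from_model = reverse_transform({"checkin": 3, "log_gap": -2.0})
sampled = reject_sample(naive_sampler, num_rows=5)
```

In this sketch the transformation never produces an invalid row, so no sampled data is wasted. Reject sampling works with any rule that can be checked, but it may discard many rows when violations are common, which is one reason sampling can slow down.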
