Constraint Logic
Last updated
Do you have rules that every row in the data must follow? Are these the same regardless of how much data there is? You can use constraints to describe this business logic in your metadata.
The SDV has 9 predefined constraint classes that are commonly used in enterprise settings. For example, when the value in one column must always be greater than the value in another, use the Inequality constraint. See the predefined constraints documentation to learn more.
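To make the idea of a row-level rule concrete, here is a minimal sketch in plain Python that checks an inequality rule against a few rows. It does not use the SDV API; the function name, the `strict` flag, and the column names `checkin_date` and `checkout_date` are hypothetical and chosen only for illustration.

```python
from datetime import date


def violates_inequality(row, low_column, high_column, strict=False):
    """Return True when the row breaks the rule low_column <= high_column.

    With strict=True, equal values also count as a violation.
    """
    if strict:
        return row[low_column] >= row[high_column]
    return row[low_column] > row[high_column]


# Hypothetical hotel-stay rows: checkout should never come before checkin.
rows = [
    {"checkin_date": date(2024, 3, 1), "checkout_date": date(2024, 3, 4)},
    {"checkin_date": date(2024, 3, 9), "checkout_date": date(2024, 3, 7)},
    {"checkin_date": date(2024, 4, 2), "checkout_date": date(2024, 4, 2)},
]

# Indices of rows that break the rule when equal values are allowed.
invalid_rows = [
    index
    for index, row in enumerate(rows)
    if violates_inequality(row, "checkin_date", "checkout_date")
]

# Indices of rows that break the rule when equal values are not allowed.
invalid_rows_strict = [
    index
    for index, row in enumerate(rows)
    if violates_inequality(row, "checkin_date", "checkout_date", strict=True)
]
```

A rule like this applies to each row independently, which is why it holds regardless of how much data there is.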
If your dataset includes business logic that cannot be covered by the predefined constraints, then you can create your own custom constraint. The logic must be defined in a separate Python file that you can load.
Then, you can create a custom constraint just like a predefined constraint.
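The general pattern of keeping rule logic in its own Python file and loading it by path can be sketched with the standard library alone. This is not the SDV loading API, which has its own interface; the file name `custom_constraint.py`, the `is_valid` function, and the `bonus` and `has_bonus` columns are all assumptions made for this example.

```python
import importlib.util
import tempfile
from pathlib import Path

# Contents of the hypothetical separate file that holds the custom rule.
CONSTRAINT_SOURCE = '''
def is_valid(row, column_names):
    """Rule: the amount column must be 0 whenever the flag column is False."""
    amount_column, flag_column = column_names
    return bool(row[flag_column]) or row[amount_column] == 0
'''


def load_constraint_module(filepath, module_name="my_custom_constraint"):
    """Load a Python file from disk and return it as a module object."""
    spec = importlib.util.spec_from_file_location(module_name, filepath)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the rule to its own file, then load it back by path.
    constraint_path = Path(tmp_dir) / "custom_constraint.py"
    constraint_path.write_text(CONSTRAINT_SOURCE)
    custom = load_constraint_module(constraint_path)

    columns = ["bonus", "has_bonus"]
    ok = custom.is_valid({"bonus": 0, "has_bonus": False}, columns)
    bad = custom.is_valid({"bonus": 500, "has_bonus": False}, columns)
```

Keeping the rule in a standalone file means the same logic can be reloaded later without copying it into every script that needs it.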
Do you need constraints?
Before adding a constraint to your model, carefully consider whether it is necessary. Here are a few questions to ask:
How do I plan to use the synthetic data? Without the constraint, the rule may still be valid a majority of the time. Only add the constraint if you require 100% adherence.
Who do I plan to share the synthetic data with? Consider whether they will be able to use the business rule to uncover sensitive information about the real data.
How did the rule come to be? In some cases, the same information may be available from another data source that does not include the extra columns and rules.
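One way to approach the first question is to measure how often synthetic data produced without the constraint already follows the rule. The sketch below is plain Python, not part of the SDV; the `adherence_rate` helper, the sample rows, and the `min_price` and `max_price` columns are hypothetical.

```python
def adherence_rate(rows, rule):
    """Return the fraction of rows for which rule(row) is True."""
    if not rows:
        return 1.0  # An empty dataset has no violating rows.
    passing = sum(1 for row in rows if rule(row))
    return passing / len(rows)


# Hypothetical synthetic rows sampled from a model with no constraint applied.
synthetic_rows = [
    {"min_price": 10, "max_price": 25},
    {"min_price": 5, "max_price": 5},
    {"min_price": 30, "max_price": 12},  # breaks the rule
    {"min_price": 8, "max_price": 40},
]

rate = adherence_rate(
    synthetic_rows,
    lambda row: row["min_price"] <= row["max_price"],
)
```

If the measured rate is already acceptable for the intended use, the constraint may not be needed; if every row must comply, it is.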
In the ideal case, you are applying only a handful of constraints to your model.
See the guide for more details.
Your categorical data has a high cardinality. For example, you have a categorical column with hundreds of possible categories that you are using in a constraint.
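A quick way to spot this situation is to count the distinct values in each categorical column before using it in a constraint. The following is a rough sketch in plain Python; the threshold of 100 is an arbitrary assumption standing in for "hundreds of possible categories", and the `product_code` and `region` columns are made up.

```python
# Hypothetical cutoff; tune it to your own data and performance needs.
HIGH_CARDINALITY_THRESHOLD = 100


def column_cardinality(rows, column):
    """Return the number of distinct values that appear in a column."""
    return len({row[column] for row in rows})


# Made-up data: 250 distinct product codes spread across 3 regions.
regions = ["north", "south", "west"]
rows = [
    {"product_code": f"P{index:04d}", "region": regions[index % 3]}
    for index in range(250)
]

cardinalities = {
    column: column_cardinality(rows, column)
    for column in ("product_code", "region")
}

# Columns whose distinct-value count meets or exceeds the cutoff.
flagged = [
    column
    for column, count in cardinalities.items()
    if count >= HIGH_CARDINALITY_THRESHOLD
]
```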
For any questions or feature requests related to performance, please include a description of your data, constraints and sampling needs.