Custom Logic
Copyright (c) 2023, DataCebo, Inc.
If the predefined constraint classes don't meet your needs, you can write your own custom business logic.
Compatibility: Any type of column except for primary and foreign keys
Custom constraints are a last resort. Adding a custom constraint requires you to specify and maintain your own logic. The SDV team does not offer debugging support to public users for their custom logic.
In many cases, it's possible to achieve your result more easily with existing SDV features:
Metadata. The SDV metadata supports many semantic data types such as emails, phone numbers and credit card numbers. When you specify these sdtypes, the SDV automatically creates valid data for them. For more info, see the Metadata Spec and sdtype definition.
Preprocessing. Tuning the pre- and post-processing leads to higher quality data. For more information, read about transformation for single and multi-table usages.
Predefined Constraints. The SDV team has created and tested predefined constraints so we recommend using them when possible. For more information, see predefined constraints.
If you have any questions, please reach out to us on Slack or GitHub and we'll be happy to point you in the right direction.
To write your custom logic, you'll need to include:
Validity Check: A test that determines whether the logic is valid for all rows of the data, and
(optional) Transformation Functions: Functions to modify the data before & after modeling
The SDV uses the functionality you provide to meet the constraint, as shown in the diagram below.
Should I provide transformation functions? What happens if I don't? Providing transformation functions is highly encouraged.
The SDV always attempts to transform and reverse transform your data. This is the most efficient way to ensure that your constraint is met. If you do not provide these functions (or if they crash), the SDV falls back to using only the validity check.
The validity check and transformations must be implemented in a separate Python file, for example example_custom_constraint.py. Make sure you always provide this file as an attachment.
Inside the file, define the validity check and the transformations, then create the custom constraint class.
To check for validity, write a function with the following signature.
Parameters
(required) column_names: A list of column names to check the validity for. If your logic is defined only for a single column, you can use only the first element of the list.
(required) data: A table of data, represented as a pandas DataFrame object
**kwargs: Any other parameters that you need.
Output: A pandas Series object of True/False values that specify whether each row is valid. There should be exactly 1 True/False value for every row in the data.
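As a minimal sketch of a validity check, assume a hypothetical rule that every value in one column must be a multiple of some base. The rule, the base keyword argument and the amount column are illustrative assumptions and are not part of the SDV.

```python
import pandas as pd


def is_valid(column_names, data, base=5):
    """Return one True/False value per row: True where the value is a multiple of base."""
    # The rule covers a single column, so only the first name in the list is used.
    column = column_names[0]
    return data[column] % base == 0


# Hypothetical data for illustration.
example = pd.DataFrame({'amount': [10, 12, 15]})
validity = is_valid(['amount'], example, base=5)
```

The result is a pandas Series with the same number of rows as the input, which is the shape the validity check is expected to return.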
Optionally, you can provide a transformation function that modifies the data. The modification should allow the model to learn the rule. It should be paired with a complementary function that reverses the transformation.
Parameters
(required) column_names: A list of column names to transform. Note that the column names must be present in the original data, but you can modify or delete them in the function.
(required) data or transformed_data: A table of data, represented as a pandas DataFrame object
**kwargs: Any other parameters that you need. These should be the same as the validity function.
Output: A pandas DataFrame that represents the transformed version of the data.
Note that all functions accept the same signature. The **kwargs can be defined however you want, but all functions should use the same ones.
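A paired transformation and reverse transformation could look like the sketch below. It assumes a hypothetical rule that values in one column must be multiples of base: the transform divides by base so the model sees the multiplier, and the reverse transform rounds the modeled multiplier and scales it back up. The rule, the base argument and the amount column are assumptions made for illustration.

```python
import pandas as pd


def transform(column_names, data, base=5):
    """Divide by base so the model learns the multiplier instead of the raw value."""
    column = column_names[0]
    transformed = data.copy()  # leave the caller's DataFrame untouched
    transformed[column] = transformed[column] / base
    return transformed


def reverse_transform(column_names, transformed_data, base=5):
    """Round the modeled multiplier to a whole number and scale it back up."""
    column = column_names[0]
    reversed_data = transformed_data.copy()
    reversed_data[column] = reversed_data[column].round() * base
    return reversed_data


original = pd.DataFrame({'amount': [10, 15]})
round_trip = reverse_transform(['amount'], transform(['amount'], original))

# Values a model might produce in the transformed space (hypothetical).
modeled = pd.DataFrame({'amount': [2.2, 2.9]})
restored = reverse_transform(['amount'], modeled)
```

Because the reverse transform always multiplies a whole number by base, every restored value satisfies the rule by construction, which is why transformations can be more efficient than relying on the validity check alone.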
When you have your functions defined, use the create_custom_constraint_class factory method to create your constraint class.
Parameters
(required) is_valid_fn: The validity check
transform_fn: The transformation function. If this is not provided, no transformation is applied.
reverse_transform_fn: The reverse transformation function. This is only required if you provided the transform function.
Output: A Python class for your constraint
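The sketch below shows how the three functions might be passed to the factory. The import path sdv.constraints is an assumption about the installed SDV version, and the import is guarded only so that the sketch stays importable where the SDV is missing. The multiple-of-base rule and the name MultipleOfBase are hypothetical.

```python
try:
    # Assumed import path; adjust it to your installed SDV version.
    from sdv.constraints import create_custom_constraint_class
except ImportError:
    create_custom_constraint_class = None


def is_valid(column_names, data, base=5):
    """True for each row whose value is a multiple of base (hypothetical rule)."""
    return data[column_names[0]] % base == 0


def transform(column_names, data, base=5):
    """Divide by base so the model learns the multiplier."""
    transformed = data.copy()
    transformed[column_names[0]] = transformed[column_names[0]] / base
    return transformed


def reverse_transform(column_names, transformed_data, base=5):
    """Round the multiplier and scale it back to a multiple of base."""
    reversed_data = transformed_data.copy()
    reversed_data[column_names[0]] = reversed_data[column_names[0]].round() * base
    return reversed_data


if create_custom_constraint_class is not None:
    # The keyword names match the parameters listed above.
    MultipleOfBase = create_custom_constraint_class(
        is_valid_fn=is_valid,
        transform_fn=transform,
        reverse_transform_fn=reverse_transform,
    )
```

All three functions take the same base keyword argument, in line with the requirement that they share one signature.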
Download the template file below to get started with creating your custom constraint class.
In a separate Python file, you'll create a synthesizer. There, you can load and apply your custom logic. The synthesizer you use will have more information about how to use your custom constraint. A general example for a single table synthesizer is shown below.
Once you've loaded the file, you can create your custom constraint using the logic.
Parameters
When creating the constraints, you'll create a dictionary object with the constraint class and constraint parameters. The parameters are the column names and any extra parameters you've added.
(required) column_names: A list of one or more column names involved in the constraint
<other parameters>: Any other parameters you defined in your functions
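The loading and constraint-creation steps can be sketched as follows. The dictionary keys constraint_class and constraint_parameters, and the synthesizer methods load_custom_constraint_classes and add_constraints, are assumptions about the SDV single table synthesizer API, so check them against the documentation for your version. The class name MultipleOfBase, the amount column and the base parameter are hypothetical.

```python
# Hypothetical file and class names for illustration.
CONSTRAINT_FILE = 'example_custom_constraint.py'
CONSTRAINT_CLASS = 'MultipleOfBase'

# The constraint is described by its class name plus its parameters:
# the column names and any extra keyword arguments your functions accept.
my_constraint = {
    'constraint_class': CONSTRAINT_CLASS,
    'constraint_parameters': {
        'column_names': ['amount'],
        'base': 5,
    },
}


def apply_custom_constraint(synthesizer):
    """Load the constraint file into a synthesizer and register the constraint.

    Assumes the synthesizer exposes the two methods called below.
    """
    synthesizer.load_custom_constraint_classes(
        filepath=CONSTRAINT_FILE,
        class_names=[CONSTRAINT_CLASS],
    )
    synthesizer.add_constraints(constraints=[my_constraint])
```

Keeping the extra parameters in the dictionary means the same values reach the validity check and both transformation functions.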
Finally, you'll need to (re)train your synthesizer. All synthetic data it produces will be valid for the constraint.