Custom Logic

If the predefined constraint classes don't meet your needs, you can write your own custom business logic.

Compatibility: Any type of column except for primary and foreign keys

Custom constraints are a last resort. Adding a custom constraint requires you to specify and maintain your own logic. The SDV team does not offer debugging support to public users for their custom logic.

In many cases, it's possible to achieve your result more easily with existing SDV features:

  • Metadata. The SDV metadata supports many semantic data types such as emails, phone numbers and credit card numbers. When you specify these sdtypes, the SDV automatically creates valid data for them. For more info, see the Metadata Spec and sdtype definition.

  • Preprocessing. Tuning the pre- and post-processing leads to higher quality data. For more information, read about transformation for single and multi-table usages.

  • Predefined Constraints. The SDV team has created and tested predefined constraints so we recommend using them when possible. For more information, see predefined constraints.

If you have any questions, please reach out to us on Slack or GitHub and we'll be happy to point you in the right direction.

To write your custom logic, you'll need to include:

  • Validity Check: A test that determines, for every row of the data, whether the row satisfies your logic, and

  • (optional) Transformation Functions: Functions to modify the data before & after modeling

The SDV uses the functionality you provide to meet the constraint, as shown in the diagram below.

Should I provide transformation functions? What happens if I don't?

Providing transformation functions is highly encouraged. The SDV always attempts to transform and reverse transform your data. This is the most efficient way to ensure that your constraint is met. If you do not provide these functions (or if they crash), the SDV will fall back to using only the validity check to filter out invalid rows.
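To see why relying on the validity check alone tends to be slower, here is a minimal conceptual sketch of sampling by rejection. It is not the SDV's actual implementation: `sample_with_rejection`, `unconstrained_model` and the `low`/`high` rule are hypothetical names made up for illustration.

```python
import numpy as np
import pandas as pd


def is_valid(column_names, data):
    # Hypothetical rule: the first listed column must not exceed the second.
    low, high = column_names
    return data[low] <= data[high]


def sample_with_rejection(sample_fn, column_names, num_rows, max_tries=100):
    """Keep drawing batches and discard invalid rows until num_rows are kept."""
    kept = []
    total_kept = 0
    for _ in range(max_tries):
        batch = sample_fn(num_rows)
        valid_rows = batch[is_valid(column_names, batch)]
        kept.append(valid_rows)
        total_kept += len(valid_rows)
        if total_kept >= num_rows:
            break
    return pd.concat(kept, ignore_index=True).head(num_rows)


rng = np.random.default_rng(0)


def unconstrained_model(n):
    # Stand-in for a model that never learned the rule: roughly half its rows are invalid.
    return pd.DataFrame({
        'low': rng.integers(0, 100, n),
        'high': rng.integers(0, 100, n),
    })


synthetic = sample_with_rejection(unconstrained_model, ['low', 'high'], num_rows=50)
```

Every discarded row is wasted work, and the stricter the rule, the more batches are needed. A transformation that lets the model learn the rule avoids most of that waste.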

Defining your custom logic

The validity check and transformations must be implemented in a separate Python file, for example example_custom_constraint.py. Make sure you always provide this file as an attachment.

Inside the file, define the validity check and the transformations, then create the custom constraint class.

Validity Check

To check for validity, write a function with the following signature.

Parameters

  • (required) column_names: A list of column names to check the validity for. If your logic is defined only for a single column, you can use only the first element of the list.

  • (required) data: A table of data, represented as a pandas DataFrame object

  • **kwargs: Any other parameters that you need.

Output: A pandas Series object of True/False values that specify whether each row is valid. There should be exactly 1 True/False value for every row in the data.

import pandas as pd

def is_valid(column_names, data, extra_parameter):
    # replace with your custom logic
    validity = [True] * data.shape[0]
    # reuse the data's index so each True/False value lines up with its row
    return pd.Series(validity, index=data.index)
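As a more concrete illustration, here is a minimal sketch of a validity check for a made-up rule: the first listed column must be a whole multiple of some step size. The `increment` parameter and the `salary` column are hypothetical and only serve the example.

```python
import pandas as pd


def is_valid(column_names, data, increment):
    # Hypothetical rule: the first listed column must be a whole multiple of `increment`.
    column = column_names[0]
    # A comparison on a DataFrame column returns one boolean per row,
    # already aligned with the data's index.
    return data[column] % increment == 0


data = pd.DataFrame({'salary': [50000, 62500, 71250]}, index=[10, 11, 12])
validity = is_valid(['salary'], data, increment=500)
# validity.tolist() -> [True, True, False]
```

Building the result from a column comparison, rather than from a plain Python list, gives you exactly one value per row without any extra bookkeeping.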

Transformation Functions

Optionally, you can provide a transformation function that modifies the data. The modification should allow the model to learn the rule. It should be paired with a complementary function that reverses the transformation.

Parameters

  • (required) column_names: A list of column names to transform. Note that the column names must be present in the original data, but you can modify or delete them in the function.

  • (required) data or transformed_data: A table of data, represented as a pandas DataFrame object

  • **kwargs: Any other parameters that you need. These should be the same as the ones in the validity function.

Output: A pandas DataFrame that represents the transformed version of the data.

def transform(column_names, data, extra_parameter):
    # replace with your custom logic
    transformed_data = data.copy()
    return transformed_data
    
def reverse_transform(column_names, transformed_data, extra_parameter):
    # replace with your custom logic
    reversed_data = transformed_data.copy()
    return reversed_data
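For illustration, here is a minimal sketch of a transformation pair for a made-up rule: a column must be a whole multiple of a step size. The `increment` parameter and the `salary` column are hypothetical, and the sketch assumes the column has no missing values.

```python
import pandas as pd


def transform(column_names, data, increment):
    # Express the column as a count of increments, so the model works on a simpler scale.
    column = column_names[0]
    transformed_data = data.copy()
    transformed_data[column] = transformed_data[column] / increment
    return transformed_data


def reverse_transform(column_names, transformed_data, increment):
    # Round to the nearest whole count, then scale back up.
    # Every value produced this way is a multiple of `increment`.
    # Assumption: no missing values, since NaN cannot be cast to int.
    column = column_names[0]
    reversed_data = transformed_data.copy()
    reversed_data[column] = reversed_data[column].round().astype(int) * increment
    return reversed_data


# Round trip on real data: the original values come back unchanged.
real = pd.DataFrame({'salary': [50000, 62500, 71000]})
roundtrip = reverse_transform(['salary'], transform(['salary'], real, 500), 500)

# A model may emit fractional counts; rounding pulls them back onto the grid.
modeled = pd.DataFrame({'salary': [100.4, 124.6]})
sampled = reverse_transform(['salary'], modeled, 500)
# sampled['salary'].tolist() -> [50000, 62500]
```

The design idea is that the reverse transformation, not the model, guarantees the rule: whatever the model produces in the transformed space maps back to a valid value.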

Note that all functions share the same signature. You can define the **kwargs however you want, but all functions must use the same ones.

Putting it together

When you have your functions defined, use the create_custom_constraint_class factory method to create your constraint class.

Parameters

  • (required) is_valid_fn: The validity check

  • transform_fn: The transformation function. If this is not provided, no transformation is applied.

  • reverse_transform_fn: The reverse transformation function. This is only required if you provided the transform function.

Output: A Python class for your constraint

from sdv.constraints import create_custom_constraint_class

MyCustomConstraintClass = create_custom_constraint_class(
    is_valid_fn=is_valid,
    transform_fn=transform,
    reverse_transform_fn=reverse_transform
)

Get Started Now

Download the template file below to get started with creating your custom constraint class.

Using your custom constraint

In a separate Python file, you'll create a synthesizer. There, you can load and apply your custom logic. The synthesizer you use will have more information about how to use your custom constraint. A general example for a single table synthesizer is shown below.

Loading the custom logic file

# load the constraint from the file
synthesizer.load_custom_constraint_classes(
    filepath='custom_constraint_template.py',
    class_names=['MyCustomConstraintClass']
)

Creating your custom constraint

Once you've loaded the file, you can create your custom constraint using the logic.

Parameters

When creating the constraints, you'll create a dictionary object with the constraint class and constraint parameters. The parameters are the column names and any extra parameters you've added.

{
    'constraint_class': 'MyCustomConstraintClass',
    'constraint_parameters': {
        'column_names': ['column_A', 'column_B'],
        'extra_parameter': 10.00
    }
}

  • (required) column_names: A list of one or more column names involved in the constraint

  • <other parameters>: Any other parameters you defined in your functions
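The loading step and the dictionary above can be bundled into one small helper. This is only a sketch: `apply_custom_constraint` is a hypothetical name, and the code assumes your synthesizer exposes an `add_constraints(constraints=[...])` method for registering the dictionary. Check the documentation of the synthesizer and SDV version you use to confirm.

```python
def apply_custom_constraint(synthesizer, filepath, class_name, column_names, **extra_parameters):
    """Load a custom constraint class from a file and register one constraint that uses it."""
    synthesizer.load_custom_constraint_classes(
        filepath=filepath,
        class_names=[class_name]
    )

    # The dictionary names the class and carries the column names plus any extra parameters.
    my_constraint = {
        'constraint_class': class_name,
        'constraint_parameters': {
            'column_names': column_names,
            **extra_parameters
        }
    }

    # Assumption: the synthesizer provides add_constraints(constraints=[...]).
    synthesizer.add_constraints(constraints=[my_constraint])
    return my_constraint
```

With a synthesizer already created, a call might look like `apply_custom_constraint(synthesizer, 'custom_constraint_template.py', 'MyCustomConstraintClass', ['column_A', 'column_B'], extra_parameter=10.00)`.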

Training your synthesizer

Finally, you'll need to (re)train your synthesizer. All synthetic data it produces will be valid for the constraint.

synthesizer.fit(data)
synthetic_data = synthesizer.sample(10)

FAQs

Which types of columns can I use within a custom constraint?

You are free to use any type of column for a custom constraint except for primary and foreign keys.

Primary and foreign keys play a special role in identifying rows and relationships. Their original values are needed for all our synthesizers to work, especially multi-table synthesizers that learn patterns between the relationships.

The constraint crashes. How should I fix the logic?

You can debug your custom constraint by running the validity and transformation functions yourself on the real data. Assume that is_valid, transform and reverse_transform are the three functions you wrote. Test them on your real data using the code snippet below.

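# use the same column names and extra parameters that you pass to the constraint
column_names = ['column_A', 'column_B']
extra_parameter = 10.00

validity_results = is_valid(column_names, real_data, extra_parameter)
if not validity_results.all():
    print('Some rows are not valid in the real data')

transformed_data = transform(column_names, real_data, extra_parameter)
print(transformed_data.head())

reversed_data = reverse_transform(column_names, transformed_data, extra_parameter)
print(reversed_data.head())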

Ensuring that there are no crashes and printing out these results may help you understand where the logic is failing.
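If you run these checks often, you could wrap them in a small helper that reports problems instead of crashing. The sketch below is only one way to do it: `check_constraint_functions`, the `minimum` parameter and the `age` column are hypothetical names used for illustration.

```python
import pandas as pd


def check_constraint_functions(is_valid, transform, reverse_transform,
                               column_names, data, **kwargs):
    """Run each function on the data and report what happened, without raising."""
    report = {}
    try:
        validity = is_valid(column_names, data, **kwargs)
        report['one_result_per_row'] = len(validity) == len(data)
        report['all_real_rows_valid'] = bool(validity.all())

        transformed = transform(column_names, data, **kwargs)
        reversed_data = reverse_transform(column_names, transformed, **kwargs)
        reversed_validity = is_valid(column_names, reversed_data, **kwargs)
        report['reversed_rows_valid'] = bool(reversed_validity.all())
        report['error'] = None
    except Exception as error:
        # Record the failure so you can see which kind of error occurred.
        report['error'] = f'{type(error).__name__}: {error}'
    return report


# Hypothetical rule for the demo: the column must be at least `minimum`.
def is_valid(column_names, data, minimum):
    return data[column_names[0]] >= minimum


def transform(column_names, data, minimum):
    transformed_data = data.copy()
    transformed_data[column_names[0]] = transformed_data[column_names[0]] - minimum
    return transformed_data


def reverse_transform(column_names, transformed_data, minimum):
    reversed_data = transformed_data.copy()
    # Clip negative offsets to zero so the reversed value never drops below the minimum.
    reversed_data[column_names[0]] = reversed_data[column_names[0]].clip(lower=0) + minimum
    return reversed_data


real_data = pd.DataFrame({'age': [18, 35, 16]})
report = check_constraint_functions(
    is_valid, transform, reverse_transform, ['age'], real_data, minimum=18
)
```

Here the report flags that one real row (age 16) already breaks the rule, while the reversed data is fully valid. Invalid rows in the real data are worth investigating before you train the synthesizer.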

Unfortunately, the SDV team is unable to offer individualized custom constraint support to public SDV users. For SDV Enterprise users, we offer help with debugging and may prioritize creating a new, predefined constraint related to your logic. To learn more about the SDV Enterprise features and purchasing a license, get in touch with us.

How can I improve the data quality or performance of the constraint?

Both low quality data and slow performance usually indicate that the constraint is using only the is_valid function to filter out invalid rows. We strongly recommend adding transform and reverse_transform functions to produce the highest quality data with fast performance.

Copyright (c) 2023, DataCebo, Inc.