Example: IfTrueThenZero

Let's demonstrate adding custom business logic using a demo dataset.

from sdv.datasets.demo import download_demo

real_data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)

This dataset contains information about various guests staying at a hotel. There is one complex rule: Rewards members don't pay an amenities fee. That is, if has_rewards=True, then amenities_fee=0.

We will write a general constraint called IfTrueThenZero, which ensures that whenever the value of a boolean column is True, the value in another numerical column is 0.

Creating the Constraint

Validity Check

The validity check should return a Series of True/False values that indicate whether each row is valid.

Let's code the logic up using parameters:

  • column_names is the list of columns. We'll assume the first column is the boolean column (True/False) while the second column is the numerical column (that must be 0 if the boolean is True)

  • data is the full dataset

  • Custom parameters: We won't add any custom parameters for this constraint

def is_valid(column_names, data):
  # let's assume the first column name is the boolean column (has_rewards)
  # and the second column is the numerical column (amenities_fee)
  boolean_column = column_names[0]
  numerical_column = column_names[1]

  # if the first column is True, the second must be 0
  true_values = (data[boolean_column] == True) & (data[numerical_column] == 0.0)
  
  # if the first is False, then the second can be anything
  false_values = (data[boolean_column] == False)

  return (true_values) | (false_values)

Does my constraint have to be valid? The SDV expects that all rows of your real data are valid. That is, calling is_valid on your real data should return a Series of only True values.
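To make the validity rule concrete, here is a minimal sketch that runs the same logic on a small, made-up table. The toy_data DataFrame and its values are hypothetical (they are not part of the demo dataset), and is_valid is repeated in compact form only so that the snippet runs on its own.

```python
import pandas as pd

def is_valid(column_names, data):
    # same logic as above: first column is boolean, second is numerical
    boolean_column, numerical_column = column_names
    true_values = (data[boolean_column] == True) & (data[numerical_column] == 0.0)
    false_values = (data[boolean_column] == False)
    return true_values | false_values

# hypothetical toy data: the third row breaks the rule (rewards member with a fee)
toy_data = pd.DataFrame({
    'has_rewards': [True, False, True, False],
    'amenities_fee': [0.0, 12.5, 8.0, 0.0],
})

validity = is_valid(['has_rewards', 'amenities_fee'], toy_data)

# rows that would need to be fixed before fitting
invalid_rows = toy_data[~validity]

# True only if every row satisfies the constraint
all_valid = bool(validity.all())
```

Running the same check on your real data before fitting is one way to confirm that every row satisfies the rule. Here, all_valid is False because of the third row.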

Transformations

The transformations must return the full dataset with specific columns transformed. We can modify, delete or add columns as long as we can reverse the transformation later.

For our function, we'll replace the 0 value with a typical value (the median) whenever the boolean is True. This will allow the machine learning model to learn the numerical distribution without these extra 0s.

def transform(column_names, data):
  # let's assume the first column name is the boolean column (has_rewards)
  # and the second column is the numerical column (amenities_fee)
  boolean_column = column_names[0]
  numerical_column = column_names[1]

  # let's replace the 0 values with a typical value (median)
  typical_value = data[numerical_column].median()
  data[numerical_column] = data[numerical_column].mask(data[boolean_column] == True, typical_value)
  
  return data

Reversing the transformation is easier. If the boolean column is True, we'll simply set the numerical column to 0.

def reverse_transform(column_names, data):
  # let's assume the first column name is the boolean column (has_rewards)
  # and the second column is the numerical column (amenities_fee)
  boolean_column = column_names[0]
  numerical_column = column_names[1]

  # set the numerical column to 0 if the boolean is True
  data[numerical_column] = data[numerical_column].mask(data[boolean_column] == True, 0.0)
  
  return data
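As a quick sanity check, the sketch below runs the two functions back to back on a small, made-up table and confirms that the original values come back. The toy_data DataFrame is hypothetical, and the functions are repeated in compact form only so that the snippet runs on its own.

```python
import pandas as pd

def transform(column_names, data):
    boolean_column, numerical_column = column_names
    # replace the values in True rows with a typical value (median)
    typical_value = data[numerical_column].median()
    data[numerical_column] = data[numerical_column].mask(
        data[boolean_column] == True, typical_value)
    return data

def reverse_transform(column_names, data):
    boolean_column, numerical_column = column_names
    # set the numerical column back to 0 wherever the boolean is True
    data[numerical_column] = data[numerical_column].mask(
        data[boolean_column] == True, 0.0)
    return data

# hypothetical toy data in which every row already satisfies the rule
toy_data = pd.DataFrame({
    'has_rewards': [True, False, True, False],
    'amenities_fee': [0.0, 10.0, 0.0, 20.0],
})
columns = ['has_rewards', 'amenities_fee']

# pass copies so the original toy_data is left untouched
transformed = transform(columns, toy_data.copy())
restored = reverse_transform(columns, transformed.copy())

round_trip_ok = restored.equals(toy_data)
```

Note that the median in this sketch is computed over all rows, including the 0 values that belong to rewards members, exactly as in the transform function above.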

Putting it together

Finally, we can create our custom class by supplying these functions to the create_custom_constraint_class factory method. Let's call the resulting class IfTrueThenZero.

from sdv.constraints import create_custom_constraint_class

IfTrueThenZero = create_custom_constraint_class(
    is_valid_fn=is_valid,
    transform_fn=transform,
    reverse_transform_fn=reverse_transform
)

Download the Example

The Python file below includes the code above. You can download the file to inspect and test your custom constraint class.

Using the Constraint

In a separate Python file, you'll create a synthesizer and add the constraint to it. The documentation for the synthesizer you use will have more information about how to use your custom constraint.

A general example for a single table synthesizer is shown below.

from sdv.single_table import GaussianCopulaSynthesizer

synthesizer = GaussianCopulaSynthesizer(metadata)

# load the constraint from the file
synthesizer.load_custom_constraint_classes(
    filepath='custom_constraint_example.py',
    class_names=['IfTrueThenZero']
)

# create constraints using the class

# if has_rewards=True, the amenities_fee=0
rewards_member_no_fee = {
    'constraint_class': 'IfTrueThenZero',
    'constraint_parameters': {
        'column_names': ['has_rewards', 'amenities_fee'],
    }
}

# apply the constraints to the synthesizer
synthesizer.add_constraints([
    rewards_member_no_fee
])

# now we can fit the model and create synthetic data
synthesizer.fit(real_data)
synthesizer.sample(num_rows=10)

Copyright (c) 2023, DataCebo, Inc.