Let's demonstrate adding custom business logic using a demo dataset.
from sdv.datasets.demo import download_demo

real_data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)

This dataset contains information about various guests staying at a hotel. There is one complex rule: Rewards members don't pay an amenities fee. That is, if has_rewards=True, then amenities_fee=0.
We will write general constraint logic called IfTrueThenZero, which ensures that if the value of a boolean column is True then the value in another numerical column must be set to 0.
Creating the Constraint
Validity Check
The validity check should return a series of True/False values that determine whether each row is valid.
Let's code the logic up using parameters:
column_names is the list of columns. We'll assume the first column is the boolean column (True/False) while the second column is the numerical column (that must be 0 if the boolean is True)
data is the full dataset
Custom parameters: We won't add any custom parameters for this constraint
def is_valid(column_names, data):
    # let's assume the first column name is the boolean column (has_rewards)
    # and the second column is the numerical column (amenities_fee)
    boolean_column = column_names[0]
    numerical_column = column_names[1]

    # if the first column is True, the second must be 0
    true_values = (data[boolean_column] == True) & (data[numerical_column] == 0.0)

    # if the first is False, then the second can be anything
    false_values = (data[boolean_column] == False)

    return (true_values) | (false_values)
Does my constraint have to be valid? The SDV expects that all rows of your real data are valid. That is, calling is_valid on your real data should return a Series of only True values.
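As a quick sanity check, the validity function can be run on a small hand-made table before trying it on real data. The sketch below repeats the is_valid logic so that it runs on its own; the toy rows are hypothetical and are not taken from the demo dataset.

```python
import pandas as pd


# Same logic as is_valid above, repeated so this snippet runs on its own.
def is_valid(column_names, data):
    boolean_column, numerical_column = column_names[0], column_names[1]
    true_values = (data[boolean_column] == True) & (data[numerical_column] == 0.0)
    false_values = (data[boolean_column] == False)
    return true_values | false_values


# Hypothetical toy rows; the last one breaks the rule on purpose
# (a rewards member who is charged a fee).
toy_data = pd.DataFrame({
    'has_rewards': [True, False, False, True],
    'amenities_fee': [0.0, 12.5, 0.0, 8.0],
})

validity = is_valid(['has_rewards', 'amenities_fee'], toy_data)

# Rows that would need to be fixed before fitting a synthesizer
invalid_rows = toy_data[~validity]
print(validity.tolist())
print(invalid_rows)
```

Running the same check on your real data and confirming that invalid_rows is empty is one way to verify the expectation described above.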
Transformations
The transformations must return the full dataset with specific columns transformed. We can modify, delete or add columns as long as we can reverse the transformation later.
For our function, we'll replace the 0 value with a typical value (the median) whenever the boolean is True. This allows the machine learning model to learn the numerical distribution without these extra 0s.
def transform(column_names, data):
    # let's assume the first column name is the boolean column (has_rewards)
    # and the second column is the numerical column (amenities_fee)
    boolean_column = column_names[0]
    numerical_column = column_names[1]

    # let's replace the 0 values with a typical value (median)
    typical_value = data[numerical_column].median()
    data[numerical_column] = data[numerical_column].mask(
        data[boolean_column] == True, typical_value)

    return data
Reversing the transformation is easier. If the boolean column is True, we'll simply set the numerical column to 0.
def reverse_transform(column_names, data):
    # let's assume the first column name is the boolean column (has_rewards)
    # and the second column is the numerical column (amenities_fee)
    boolean_column = column_names[0]
    numerical_column = column_names[1]

    # set the numerical column to 0 if the boolean is True
    data[numerical_column] = data[numerical_column].mask(
        data[boolean_column] == True, 0.0)

    return data
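
A round trip on a toy table is a simple way to confirm that the reverse transformation undoes the transformation. This is a minimal sketch: the two functions are repeated so the snippet is self-contained, the toy values are hypothetical, and copies are passed in because both functions modify the DataFrame they receive.

```python
import pandas as pd


def transform(column_names, data):
    boolean_column, numerical_column = column_names[0], column_names[1]
    # The median is computed over the whole column, including the 0s
    typical_value = data[numerical_column].median()
    data[numerical_column] = data[numerical_column].mask(
        data[boolean_column] == True, typical_value)
    return data


def reverse_transform(column_names, data):
    boolean_column, numerical_column = column_names[0], column_names[1]
    data[numerical_column] = data[numerical_column].mask(
        data[boolean_column] == True, 0.0)
    return data


columns = ['has_rewards', 'amenities_fee']

# Hypothetical toy rows that already satisfy the rule
toy_data = pd.DataFrame({
    'has_rewards': [True, False, False, True, False],
    'amenities_fee': [0.0, 10.0, 20.0, 0.0, 30.0],
})

# Pass copies so the original toy_data stays untouched
transformed = transform(columns, toy_data.copy())
restored = reverse_transform(columns, transformed.copy())

print(transformed)
print(restored.equals(toy_data))
```

After the transformation, the rewards members carry the median fee instead of 0; after the reverse transformation, their fee is 0 again and the table matches the original.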
Putting it together
Finally, we can create our custom class by supplying these functions to the create_custom_constraint_class factory method. We'll name the resulting class IfTrueThenZero.

from sdv.constraints import create_custom_constraint_class

IfTrueThenZero = create_custom_constraint_class(
    is_valid_fn=is_valid,
    transform_fn=transform,
    reverse_transform_fn=reverse_transform
)
Download the Example
The Python file below includes the code above. You can download the file to inspect and test your custom constraint class.
Using the Constraint
In a separate Python file, you'll create a synthesizer and add the constraints to it. The documentation for the synthesizer you use has more information about how to use your custom constraint.
A general example for a single table synthesizer is shown below.
from sdv.single_table import GaussianCopulaSynthesizer

synthesizer = GaussianCopulaSynthesizer(metadata)

# load the constraint from the file
synthesizer.load_custom_constraint_classes(
    filepath='custom_constraint_example.py',
    class_names=['IfTrueThenZero']
)

# create constraints using the class
# if has_rewards=True, the amenities_fee=0
rewards_member_no_fee = {
    'constraint_class': 'IfTrueThenZero',
    'constraint_parameters': {
        'column_names': ['has_rewards', 'amenities_fee'],
    }
}

# apply the constraints to the synthesizer
synthesizer.add_constraints([
    rewards_member_no_fee
])

# now we can fit the model and create synthetic data
synthesizer.fit(real_data)
synthesizer.sample(num_rows=10)
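
Once you have sampled data, it can be worth confirming that the business rule holds in the output. The sketch below uses only pandas; count_violations is a hypothetical helper (not part of the SDV), and synthetic_data is a hand-made stand-in for the DataFrame you would get by storing the result of synthesizer.sample.

```python
import pandas as pd


def count_violations(data, boolean_column, numerical_column):
    """Count rows where the boolean is True but the number is not 0."""
    violations = (data[boolean_column] == True) & (data[numerical_column] != 0.0)
    return int(violations.sum())


# Hypothetical stand-in for the output of synthesizer.sample(...)
synthetic_data = pd.DataFrame({
    'has_rewards': [True, False, True, False],
    'amenities_fee': [0.0, 22.4, 0.0, 5.1],
})

num_violations = count_violations(synthetic_data, 'has_rewards', 'amenities_fee')
print(num_violations)
```

A count of 0 means every rewards member in the table has an amenities fee of 0.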