OneHotEncoding
The OneHotEncoding constraint enforces that a set of columns follows a one hot encoding scheme. That is, in every row, exactly one of the columns must contain a value of 1 while all the others must be 0.
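To make the rule concrete, here is a minimal plain-Python sketch of the check it describes. The helper name `is_one_hot` and the sample rows are hypothetical and not part of SDV. The sketch only illustrates the condition and does not show how the library enforces it.

```python
def is_one_hot(row, column_names):
    """Return True if exactly one named column is 1 and all the others are 0."""
    values = [row[name] for name in column_names]
    # Every value must be 0 or 1, and the values must sum to exactly 1.
    return all(value in (0, 1) for value in values) and sum(values) == 1


columns = ['status_active', 'status_inactive', 'status_on_hold']

# Hypothetical rows: one valid, one with two flags set, one with none set.
valid_row = {'status_active': 0, 'status_inactive': 1, 'status_on_hold': 0}
two_flags = {'status_active': 1, 'status_inactive': 1, 'status_on_hold': 0}
no_flags = {'status_active': 0, 'status_inactive': 0, 'status_on_hold': 0}

print(is_one_hot(valid_row, columns))
print(is_one_hot(two_flags, columns))
print(is_one_hot(no_flags, columns))
```

Only the first row satisfies the scheme. The other two break it by setting two flags and no flags, respectively.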
Constraint API
Create a OneHotEncoding constraint.
Parameters:
(required) column_names: A list of column names that, together, form the one hot encoding scheme. The columns must be listed as numerical in your metadata.
table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.
from sdv.cag import OneHotEncoding
my_constraint = OneHotEncoding(
column_names=['status_active', 'status_inactive', 'status_on_hold']
)
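The constraint expects the 0/1 columns to already exist in your data. If the category is instead stored as a single value per row, a sketch like the following shows one way such columns could be derived. It assumes a hypothetical list of status values. The helper `one_hot_rows` is illustrative and not an SDV function, and in practice a DataFrame library would typically do this step.

```python
def one_hot_rows(values, categories, prefix):
    """Turn each categorical value into a dict of numerical 0/1 columns."""
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f'Unknown category: {value!r}')
        # Exactly one column per row gets 1; all the others get 0.
        rows.append({
            f'{prefix}_{category}': int(value == category)
            for category in categories
        })
    return rows


# Hypothetical input: one status value per row.
statuses = ['active', 'on_hold', 'inactive']
categories = ['active', 'inactive', 'on_hold']

rows = one_hot_rows(statuses, categories, prefix='status')
for row in rows:
    print(row)
```

The resulting column names match the ones passed to `column_names` in the example above, and the values are integers, consistent with the requirement that the columns be numerical.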
Usage
Apply the constraint to any SDV synthesizer. Then fit and sample as usual.
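from sdv.single_table import GaussianCopulaSynthesizer

# metadata and data describe your real dataset
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])
synthesizer.fit(data)
# single-table synthesizers need a row count when sampling
synthetic_data = synthesizer.sample(num_rows=len(data))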
For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.