OneHotEncoding
The OneHotEncoding constraint enforces that a set of columns follow a one hot encoding scheme. That is, exactly one of the columns must contain a value of 1 while all the others must be 0.
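To make the rule concrete, here is a minimal sketch in plain pandas that checks which rows of a table follow a one hot encoding scheme. The helper name is_one_hot and the sample data are hypothetical and exist only for illustration; this is not part of the SDV API and does not show how SDV validates data internally.

```python
import pandas as pd


def is_one_hot(data, column_names):
    """Return a boolean Series that is True where a row is validly one hot encoded.

    Hypothetical helper for illustration only; not an SDV function.
    """
    subset = data[column_names]
    # Every value in the set of columns must be either 0 or 1
    only_zeros_and_ones = subset.isin([0, 1]).all(axis=1)
    # Exactly one column may hold the 1, so the row sum must equal 1
    exactly_one_hot = subset.sum(axis=1) == 1
    return only_zeros_and_ones & exactly_one_hot


# Made-up example data: the last two rows break the scheme
data = pd.DataFrame({
    'status_active':   [1, 0, 0, 1],
    'status_inactive': [0, 1, 0, 1],
    'status_on_hold':  [0, 0, 0, 0],
})
columns = ['status_active', 'status_inactive', 'status_on_hold']

valid_rows = is_one_hot(data, columns)
print(valid_rows.tolist())
```

In this example the third row has no column set to 1 and the fourth row has two, so only the first two rows satisfy the scheme.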
Constraint API
Create a OneHotEncoding constraint.
Parameters:
(required) column_names: A list of column names that, together, form the one hot encoding scheme. The columns must be listed as numerical in your metadata.
table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.
learning_strategy: A string that controls how SDV should ultimately enforce the constraint internally. No matter which strategy you choose, you are guaranteed to have synthetic, one-hot encoded columns. Options:
    (default) 'one_hot': SDV will keep the one hot encoded columns for the underlying ML model to learn. When creating synthetic data, SDV will enforce that only 1 of the columns will be picked.
    'categorical': SDV will collapse the one hot encoded columns into a single, categorical column for the underlying ML model to learn. When creating synthetic data, SDV will expand the category value back into the original set of one hot encoded columns.
from sdv.cag import OneHotEncoding

my_constraint = OneHotEncoding(
    column_names=['status_active', 'status_inactive', 'status_on_hold'],
    learning_strategy='one_hot'
)
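The 'categorical' learning strategy described above collapses the one hot encoded columns into a single categorical column and later expands it back. The following is a rough sketch of that round trip in plain pandas, assuming the category value is simply the name of the column that holds the 1. The variable names and sample data are made up for illustration, and SDV's internal implementation may differ.

```python
import pandas as pd

columns = ['status_active', 'status_inactive', 'status_on_hold']

# Made-up, validly one hot encoded data
data = pd.DataFrame({
    'status_active':   [1, 0, 0],
    'status_inactive': [0, 1, 0],
    'status_on_hold':  [0, 0, 1],
})

# Collapse: for each row, take the name of the column that holds the 1
collapsed = data[columns].idxmax(axis=1)

# Expand: rebuild one 0/1 column per category, in the original column order
expanded = pd.DataFrame(
    {name: (collapsed == name).astype(int) for name in columns}
)

print(collapsed.tolist())
print(expanded)
```

Because the collapsed column can only hold one category per row, expanding it always yields exactly one 1 per row, which is why this strategy satisfies the constraint by construction.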
Usage
Apply the constraint to any SDV synthesizer. Then fit and sample as usual.
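from sdv.single_table import GaussianCopulaSynthesizer

# metadata and data are assumed to be already loaded for your dataset
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])
synthesizer.fit(data)
synthetic_data = synthesizer.sample()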
For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.