OneHotEncoding

The OneHotEncoding constraint enforces that a set of columns follow a one hot encoding scheme↗. That is, exactly one of the columns must contain a value of 1 while all the others must be 0.

Constraint API

Create a OneHotEncoding constraint.

Parameters:

  • (required) column_names: A list of column names that, together, form the one hot encoding scheme. The columns must be listed as numerical in your metadata.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.

from sdv.cag import OneHotEncoding

my_constraint = OneHotEncoding(
    column_names=['status_active', 'status_inactive', 'status_on_hold']
)

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.

Last updated