❖ MixedScales

SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.

The MixedScales constraint enforces that the value of a categorical column (or a combination of categorical columns) determines the scale of a numerical column.

For example, the combined value of the categorical columns test_type and units determines the scale of the numerical column test_results. If the test type is 'height' and the units are 'inches', the constraint forces the synthesizer to learn the scale specifically for that segment.
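To make the idea of a segment concrete, here is a small plain-Python sketch that does not use SDV. The column names follow the example above, but the rows and the helper `segment_ranges` are invented for illustration only. Grouping by the segment columns shows that each segment occupies its own numeric range.

```python
from collections import defaultdict

# Invented rows for illustration; only the column names come from the example above.
rows = [
    {'test_type': 'height', 'units': 'inches', 'test_results': 64.0},
    {'test_type': 'height', 'units': 'inches', 'test_results': 70.5},
    {'test_type': 'height', 'units': 'cm', 'test_results': 162.6},
    {'test_type': 'height', 'units': 'cm', 'test_results': 179.1},
    {'test_type': 'weight', 'units': 'kg', 'test_results': 58.0},
    {'test_type': 'weight', 'units': 'kg', 'test_results': 91.3},
]


def segment_ranges(rows, segment_column_names, mixed_scale_column_name):
    """Hypothetical helper: return {segment: (min, max)} for the numerical column."""
    values = defaultdict(list)
    for row in rows:
        # A segment is the combination of values across all segment columns.
        segment = tuple(row[name] for name in segment_column_names)
        values[segment].append(row[mixed_scale_column_name])
    return {segment: (min(v), max(v)) for segment, v in values.items()}


ranges = segment_ranges(rows, ['test_type', 'units'], 'test_results')
for segment, (low, high) in ranges.items():
    print(segment, low, high)
```

In this toy data, a single test_results column mixes values in inches, centimeters, and kilograms, which is the situation the constraint is meant for.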

Constraint API

Create a MixedScales constraint.

Parameters

  • (required) segment_column_names: A list of one or more categorical columns that ultimately segment the data into different groups of rows. Each group will have a different scale.

  • (required) mixed_scale_column_name: A numerical column whose scale depends on the segment.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.

from sdv.cag import MixedScales

my_constraint = MixedScales(
    segment_column_names=['test_type', 'units'],
    mixed_scale_column_name='test_results'
)

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

from sdv.single_table import GaussianCopulaSynthesizer

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_rows=len(data))  # single-table synthesizers need a row count

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.

FAQs

What if I only have one categorical column that determines the scale?

In this case, please provide a list of just one column name as your segment_column_names:

my_constraint = MixedScales(
    segment_column_names=['test_type'],
    mixed_scale_column_name='test_results',
    table_name='patient_test_results'  # only needed for multi-table synthesizers
)

Why can't I input the scale min and max for each segment?

In this constraint, you will only define the column names. SDV synthesizers will use this information to automatically learn the scales (min/max values and distribution) for each different segment of the real data. There is no need to manually input this data.
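If you want to sanity-check the result, one option is to compare the synthetic values against the per-segment ranges observed in the real data. The sketch below is plain Python and is not SDV's internal algorithm. The helper names `learn_ranges` and `find_out_of_range` and the sample rows are hypothetical, and it assumes the data is available as a list of dictionaries.

```python
def learn_ranges(rows, segment_cols, value_col):
    """Hypothetical helper: observed (min, max) of value_col for each segment."""
    ranges = {}
    for row in rows:
        key = tuple(row[col] for col in segment_cols)
        value = row[value_col]
        low, high = ranges.get(key, (value, value))
        ranges[key] = (min(low, value), max(high, value))
    return ranges


def find_out_of_range(real_rows, synthetic_rows, segment_cols, value_col):
    """Return synthetic rows whose value lies outside the real range of their segment."""
    ranges = learn_ranges(real_rows, segment_cols, value_col)
    flagged = []
    for row in synthetic_rows:
        key = tuple(row[col] for col in segment_cols)
        if key not in ranges:
            # Segment never appears in the real data.
            flagged.append(row)
            continue
        low, high = ranges[key]
        if not low <= row[value_col] <= high:
            flagged.append(row)
    return flagged


# Invented rows for illustration only.
real_rows = [
    {'test_type': 'height', 'units': 'inches', 'test_results': 60.0},
    {'test_type': 'height', 'units': 'inches', 'test_results': 75.0},
    {'test_type': 'height', 'units': 'cm', 'test_results': 152.0},
    {'test_type': 'height', 'units': 'cm', 'test_results': 190.0},
]
synthetic_rows = [
    {'test_type': 'height', 'units': 'inches', 'test_results': 68.2},
    {'test_type': 'height', 'units': 'inches', 'test_results': 171.0},  # looks like cm
]

flagged = find_out_of_range(real_rows, synthetic_rows, ['test_type', 'units'], 'test_results')
print(flagged)
```

With pandas DataFrames, `data.to_dict('records')` produces rows in this list-of-dictionaries form. Whether synthetic values must stay strictly inside the observed min/max is an assumption of this check, so treat flagged rows as prompts for inspection rather than definite errors.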
