Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


❖ MixedScales


Last updated 27 days ago

Compatibility: 1 or more columns that are categorical and 1 column that is numerical

The MixedScales constraint enforces that the value of a categorical column (or a combination of categorical columns) determines the scale of a numerical column. For example, the combined value of the categorical columns test_type and units determines the scale of the numerical column test_results. So if the test type is 'height' and the units are 'inches', the synthesizer learns the scale specifically for that segment.
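To make the idea of a per-segment scale concrete, the sketch below groups a few rows by (test_type, units) and reports the observed range of test_results in each group. The rows are made-up illustration data and segment_ranges is a hypothetical helper, not part of SDV; it only shows why a single global scale would be a poor fit for such a column.

```python
from collections import defaultdict

# Hypothetical rows: (test_type, units, test_results). Not real data.
rows = [
    ("height", "inches", 64.0),
    ("height", "inches", 70.5),
    ("height", "cm", 162.0),
    ("height", "cm", 180.0),
    ("weight", "lbs", 150.0),
    ("weight", "lbs", 185.0),
]


def segment_ranges(rows):
    """Return {(test_type, units): (min, max)} for the numerical column."""
    values_by_segment = defaultdict(list)
    for test_type, units, result in rows:
        values_by_segment[(test_type, units)].append(result)
    return {
        segment: (min(values), max(values))
        for segment, values in values_by_segment.items()
    }


ranges = segment_ranges(rows)
for segment, (low, high) in sorted(ranges.items()):
    print(segment, low, high)
```

Each segment occupies its own range, which is the situation this constraint is meant to describe.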

Parameters

(required) segment_column_names: A list of one or more categorical columns that ultimately segment the data into different groups of rows. Each group will have a different scale.

(required) mixed_scale_column_name: A numerical column whose scale depends on the segment.

Example

Define your constraint using the parameters and then add it to a synthesizer.

my_constraint = {
    'constraint_class': 'MixedScales',
    'table_name': 'patient_test_results', # for multi table synthesizers
    'constraint_parameters': {
        'segment_column_names': ['test_type', 'units'],
        'mixed_scale_column_name': 'test_results'
    }
}

my_synthesizer.add_constraints(constraints=[
    my_constraint
])
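If you define several of these constraints, a small builder function can reduce copy-and-paste mistakes in the dictionary. The sketch below is one possible approach: build_mixed_scales_constraint is a hypothetical helper (not an SDV function) that produces the same dictionary layout shown above and rejects a bare string passed where a list of column names is expected.

```python
def build_mixed_scales_constraint(
    segment_column_names, mixed_scale_column_name, table_name=None
):
    """Build a MixedScales constraint dict in the layout shown on this page.

    Hypothetical convenience helper; it only assembles and checks the dict.
    """
    if isinstance(segment_column_names, str):
        raise TypeError("segment_column_names must be a list, not a string")
    segment_column_names = list(segment_column_names)
    if not segment_column_names:
        raise ValueError("segment_column_names needs at least one column")

    constraint = {
        "constraint_class": "MixedScales",
        "constraint_parameters": {
            "segment_column_names": segment_column_names,
            "mixed_scale_column_name": mixed_scale_column_name,
        },
    }
    # table_name is only needed for multi table synthesizers
    if table_name is not None:
        constraint["table_name"] = table_name
    return constraint


my_constraint = build_mixed_scales_constraint(
    segment_column_names=["test_type", "units"],
    mixed_scale_column_name="test_results",
    table_name="patient_test_results",
)
```

The resulting my_constraint can then be passed to add_constraints in the same way as the hand-written dictionary.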

FAQs

What if I only have one categorical column that determines the scale?

In this case, please provide a list of just one column name as your segment_column_names:

my_constraint = {
    'constraint_class': 'MixedScales',
    'table_name': 'patient_test_results', # for multi table synthesizers
    'constraint_parameters': {
        'segment_column_names': ['test_type'],
        'mixed_scale_column_name': 'test_results'
    }
}

Why can't I input the scale min and max for each segment?

In this constraint, you will only define the column names. SDV synthesizers will use this information to automatically learn the scales (min/max values and distribution) for each different segment of the real data. There is no need to manually input this data.
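If you want to inspect the result yourself, one rough sanity check is to compare the synthetic values against the per-segment ranges observed in the real data. The sketch below is an informal check under assumed, made-up data; out_of_range_rows is a hypothetical helper and not SDV's own validation, and a flagged row is a prompt for a closer look rather than proof of a problem.

```python
def out_of_range_rows(real_rows, synthetic_rows):
    """Flag synthetic (segment, value) pairs outside the real per-segment range.

    Rows are (segment, value) pairs, where segment is a tuple of the
    categorical values, for example ("height", "inches").
    """
    # Learn the observed min and max of each segment from the real rows
    ranges = {}
    for segment, value in real_rows:
        low, high = ranges.get(segment, (value, value))
        ranges[segment] = (min(low, value), max(high, value))

    flagged = []
    for segment, value in synthetic_rows:
        if segment not in ranges:
            flagged.append((segment, value))  # segment never seen in real data
            continue
        low, high = ranges[segment]
        if not low <= value <= high:
            flagged.append((segment, value))
    return flagged


# Hypothetical data for illustration only
real = [
    (("height", "inches"), 64.0),
    (("height", "inches"), 70.5),
    (("height", "cm"), 162.0),
    (("height", "cm"), 180.0),
]
synthetic = [
    (("height", "inches"), 66.2),
    (("height", "cm"), 68.0),
    (("weight", "lbs"), 150.0),
]
flagged = out_of_range_rows(real, synthetic)
```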

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.