Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


❖ MixedScales



The MixedScales constraint enforces that the value of a categorical column (or a combination of categorical columns) determines the scale (the range and distribution of values) of a numerical column.

Constraint API

Create a MixedScales constraint.

Parameters

  • (required) segment_column_names: A list of one or more categorical columns that segment the data into groups of rows. Each unique combination of values in these columns defines one segment, and each segment can have its own scale.

  • (required) mixed_scale_column_name: A numerical column whose scale depends on the segment.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.

from sdv.cag import MixedScales

my_constraint = MixedScales(
    segment_column_names=['test_type', 'units'],
    mixed_scale_column_name='test_results'
)

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

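from sdv.single_table import GaussianCopulaSynthesizer

# metadata and data are assumed to be loaded already
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

# Learn from the real data, then generate synthetic rows
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_rows=100)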
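
After sampling, a quick way to sanity-check the result is to compare each synthetic value against the range observed for its segment in the real data. The sketch below uses hypothetical helpers (`segment_ranges`, `out_of_scale_rows`) that are not part of the SDV API. It works on lists of row dictionaries (for pandas DataFrames such as `data` and `synthetic_data`, `DataFrame.to_dict(orient='records')` produces this shape), and it assumes the real per-segment min/max is the expected range, which is a simplification. Treat flagged rows as candidates for review rather than proof of a violation.

```python
from collections import defaultdict


def segment_ranges(rows, segment_column_names, value_column):
    # Map each combination of segment values to the (min, max) of value_column.
    values = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in segment_column_names)
        values[key].append(row[value_column])
    return {key: (min(vals), max(vals)) for key, vals in values.items()}


def out_of_scale_rows(real_rows, synthetic_rows, segment_column_names, value_column):
    """Return synthetic rows that fall outside the real range of their segment,
    or whose segment combination never appears in the real rows."""
    ranges = segment_ranges(real_rows, segment_column_names, value_column)
    flagged = []
    for row in synthetic_rows:
        key = tuple(row[col] for col in segment_column_names)
        if key not in ranges:
            flagged.append(row)  # unseen segment: flag for review
            continue
        low, high = ranges[key]
        if not low <= row[value_column] <= high:
            flagged.append(row)  # value outside the segment's observed range
    return flagged


# Hypothetical example rows (assumed data, for illustration only)
real_rows = [
    {'test_type': 'height', 'units': 'inches', 'test_results': 62.0},
    {'test_type': 'height', 'units': 'inches', 'test_results': 71.5},
    {'test_type': 'height', 'units': 'cm', 'test_results': 158.0},
    {'test_type': 'height', 'units': 'cm', 'test_results': 182.0},
]
synthetic_rows = [
    {'test_type': 'height', 'units': 'inches', 'test_results': 65.0},
    {'test_type': 'height', 'units': 'inches', 'test_results': 170.0},
    {'test_type': 'height', 'units': 'cm', 'test_results': 170.0},
]

flagged = out_of_scale_rows(real_rows, synthetic_rows, ['test_type', 'units'], 'test_results')
print(flagged)
# [{'test_type': 'height', 'units': 'inches', 'test_results': 170.0}]
```

A value of 170.0 is plausible for heights in centimeters but not in inches, so only the inches row is flagged. For a single segment column, pass a one-element list such as `['test_type']`.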

FAQs

What if I only have one categorical column that determines the scale?

In this case, please provide a list of just one column name as your segment_column_names:

my_constraint = MixedScales(
    table_name='patient_test_results',  # only needed for multi-table synthesizers
    segment_column_names=['test_type'],
    mixed_scale_column_name='test_results'
)

Why can't I input the scale min and max for each segment?

In this constraint, you only define the column names. SDV synthesizers use this information to automatically learn the scale (min/max values and distribution) of each segment of the real data, so there is no need to input these values manually.

For example, if the combined values of the categorical columns test_type and units determine the scale of the numerical column test_results, then all rows where test_type is 'height' and units is 'inches' form one segment, and the synthesizer learns a scale specifically for that segment.
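
This page does not describe how SDV learns each segment's scale internally. As a purely illustrative sketch (an assumption, not SDV's implementation), one generic way to give each segment its own scale is per-segment standardization: learn a mean and standard deviation for every combination of segment values, map values onto a common unitless scale, and map them back using the parameters of a specific segment. The helper names `fit_segment_scales`, `to_common_scale`, and `from_common_scale` and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_segment_scales(rows, segment_column_names, value_column):
    # Learn (mean, standard deviation) of value_column separately per segment.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in segment_column_names)
        groups[key].append(row[value_column])
    scales = {}
    for key, vals in groups.items():
        spread = pstdev(vals)
        # A constant segment has zero spread; use 1.0 to avoid dividing by zero.
        scales[key] = (mean(vals), spread if spread > 0 else 1.0)
    return scales


def to_common_scale(value, segment, scales):
    # Standardize a value using its own segment's parameters.
    center, spread = scales[segment]
    return (value - center) / spread


def from_common_scale(z, segment, scales):
    # Map a standardized value back onto a specific segment's scale.
    center, spread = scales[segment]
    return center + z * spread


# Hypothetical real data: heights recorded in two different units.
rows = [
    {'test_type': 'height', 'units': 'inches', 'test_results': 60.0},
    {'test_type': 'height', 'units': 'inches', 'test_results': 70.0},
    {'test_type': 'height', 'units': 'cm', 'test_results': 150.0},
    {'test_type': 'height', 'units': 'cm', 'test_results': 180.0},
]
scales = fit_segment_scales(rows, ['test_type', 'units'], 'test_results')

z = to_common_scale(70.0, ('height', 'inches'), scales)  # 1.0
print(from_common_scale(z, ('height', 'inches'), scales))  # 70.0
print(from_common_scale(z, ('height', 'cm'), scales))      # 180.0
```

The same standardized value (1.0 here) lands at 70.0 on the inches scale and at 180.0 on the centimeters scale, which is the kind of segment-specific behavior the constraint is meant to capture.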

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.

❖ SDV Enterprise Bundle. This feature is available as part of the CAG Bundle, an optional add-on to SDV Enterprise. For more information, please visit the CAG Bundle page.