Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


Range

The Range constraint enforces that, in every row, the value of one column is bounded by the values of two other columns: the middle column must lie between the low column and the high column.

Constraint API

Create a Range constraint.

  • (required) low_column_name: The name of the column that contains the lowest value. This must be a numerical or datetime column.

  • (required) middle_column_name: The name of the column that must be between the low and the high columns. This must be a numerical or datetime column.

  • (required) high_column_name: The name of the column that contains the highest value. This must be a numerical or datetime column.

  • strict_boundaries: Whether the comparisons are strict.

    • (default) True: The middle column must be strictly greater than the low column and strictly less than the high column.

    • False: The middle column must be greater than or equal to the low column and less than or equal to the high column.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.
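To make the row-level rule concrete, here is a minimal sketch in plain pandas of the check described above. It is an illustration, not SDV's internal code: `range_rule_holds` is a hypothetical helper, and the sketch assumes numeric columns with no missing values.

```python
import pandas as pd


def range_rule_holds(data, low, middle, high, strict_boundaries=True):
    """Hypothetical helper (not part of SDV): return a boolean Series that is
    True where low < middle < high, or low <= middle <= high when
    strict_boundaries is False."""
    if strict_boundaries:
        return (data[low] < data[middle]) & (data[middle] < data[high])
    return (data[low] <= data[middle]) & (data[middle] <= data[high])


ages = pd.DataFrame({
    'child_age': [5, 10, 20],
    'parent_age': [30, 10, 45],
    'grandparent_age': [60, 40, 45],
})

print(range_rule_holds(ages, 'child_age', 'parent_age', 'grandparent_age').tolist())
# [True, False, False]
print(range_rule_holds(ages, 'child_age', 'parent_age', 'grandparent_age',
                       strict_boundaries=False).tolist())
# [True, True, True]
```

Rows 1 and 2 each contain a tie (parent_age equals child_age, and parent_age equals grandparent_age, respectively), so they fail only under strict boundaries.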

from sdv.cag import Range

# Require child_age < parent_age < grandparent_age in every row
my_constraint = Range(
    low_column_name='child_age',
    middle_column_name='parent_age',
    high_column_name='grandparent_age',
    strict_boundaries=True
)

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

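from sdv.single_table import GaussianCopulaSynthesizer

# assumes `metadata` and `data` have already been loaded
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_rows=1000)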

FAQs

What happens to missing values?

This constraint ignores missing values. The constraint is considered valid as long as the non-missing values follow the logic.
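One reading of this rule, sketched in pandas: each pair of columns is compared only in rows where both values are present, so a missing value never counts as a violation on its own. `valid_ignoring_missing` is a hypothetical helper, and including the direct low-versus-high comparison is an assumption about what "follow the logic" means when the middle value is missing; SDV's own implementation may differ.

```python
import numpy as np
import pandas as pd


def valid_ignoring_missing(data, low, middle, high, strict_boundaries=True):
    """Hypothetical helper (not part of SDV): a pair of columns is checked
    only where both values are present; missing values never cause a failure."""
    def ordered(a, b):
        # Compare column a to column b; rows where either side is missing pass
        cmp = data[a] < data[b] if strict_boundaries else data[a] <= data[b]
        return cmp | data[a].isna() | data[b].isna()

    # Assumption: low vs. high is also checked, which matters when middle is missing
    return ordered(low, middle) & ordered(middle, high) & ordered(low, high)


ages = pd.DataFrame({
    'child_age': [5, np.nan, 50],
    'parent_age': [30, 35, np.nan],
    'grandparent_age': [60, 70, 40],
})

print(valid_ignoring_missing(ages, 'child_age', 'parent_age', 'grandparent_age').tolist())
# [True, True, False]
```

The last row fails because, with parent_age missing, the two remaining values (50 and 40) are out of order.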

What if I want to compare a column to fixed values?


Many of our SDV synthesizers are already designed to learn the min/max values in every column and replicate those ranges in the synthetic data. This parameter is often called enforce_min_max_values, and it applies to all numerical and datetime columns. For more information, check your synthesizer's API guide.

You can also control the enforcement on a per-column basis. Turn the enforcement on or off for individual columns by accessing and updating the transformers. For more information, see the Preprocessing guide.

Both of these options allow you to fix the range (as observed in the real data) or expand it (by not enforcing it). If you'd like to restrict the range further, we encourage you to model the data as-is and use conditional sampling to get the range you need.
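If you do model the data as-is, a quick pandas check shows how much of the synthetic output already falls inside the fixed range you want. The `age` column, its values, and the bounds 18 and 65 below are hypothetical. The final filtering step is a plain rejection filter, shown as a simple alternative; it is not SDV's conditional sampling, and it reduces the number of rows you end up with.

```python
import pandas as pd

# Hypothetical synthetic output; in practice this would be the DataFrame
# returned by the synthesizer's sample method
synthetic_data = pd.DataFrame({'age': [12, 18, 40, 65, 70]})

# Fixed bounds chosen for illustration only
LOW, HIGH = 18, 65

# Series.between is inclusive on both ends by default
in_range = synthetic_data['age'].between(LOW, HIGH)
print(f'{in_range.mean():.0%} of rows fall within [{LOW}, {HIGH}]')
# 60% of rows fall within [18, 65]

# Simple rejection filter: keep only the in-range rows
filtered = synthetic_data[in_range].reset_index(drop=True)
print(filtered['age'].tolist())
# [18, 40, 65]
```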

For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.