Synthetic Data Vault
GitHubSlackDataCebo
  • Welcome to the SDV!
  • Tutorials
  • Explore SDV
    • SDV Community
    • SDV Enterprise
      • ⭐Compare Features
    • SDV Bundles
      • ❖ AI Connectors
      • ❖ CAG
      • ❖ Differential Privacy
      • ❖ XSynthesizers
  • Single Table Data
    • Data Preparation
      • Loading Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • GaussianCopulaSynthesizer
        • CTGANSynthesizer
        • TVAESynthesizer
        • ❖ XGCSynthesizer
        • ❖ BootstrapSynthesizer
        • ❖ SegmentSynthesizer
        • * DayZSynthesizer
        • ❖ DPGCSynthesizer
        • ❖ DPGCFlexSynthesizer
        • CopulaGANSynthesizer
      • Customizations
        • Constraints
        • Preprocessing
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
      • Privacy
        • Empirical Differential Privacy
        • SDMetrics: Privacy Metrics
  • Multi Table Data
    • Data Preparation
      • Loading Data
        • Demo Data
        • CSV
        • Excel
        • ❖ AlloyDB
        • ❖ BigQuery
        • ❖ MSSQL
        • ❖ Oracle
        • ❖ Spanner
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • Synthesizers
        • * DayZSynthesizer
        • * IndependentSynthesizer
        • HMASynthesizer
        • * HSASynthesizer
      • Customizations
        • Constraints
        • Preprocessing
      • * Performance Estimates
    • Sampling
    • Evaluation
      • Diagnostic
      • Data Quality
      • Visualization
  • Sequential Data
    • Data Preparation
      • Loading Data
      • Cleaning Your Data
      • Creating Metadata
    • Modeling
      • PARSynthesizer
      • Customizations
    • Sampling
      • Sample Realistic Data
      • Conditional Sampling
    • Evaluation
  • Concepts
    • Metadata
      • Sdtypes
      • Metadata API
      • Metadata JSON
    • Constraint-Augmented Generation (CAG)
      • Predefined Constraints
        • FixedCombinations
        • FixedIncrements
        • Inequality
        • OneHotEncoding
        • Range
        • ❖ CarryOverColumns
        • * ChainedInequality
        • ❖ CompositeKey
        • ❖ FixedNullCombinations
        • ❖ ForeignToForeignKey
        • ❖ ForeignToPrimaryKeySubset
        • ❖ MixedScales
        • ❖ PrimaryToPrimaryKey
        • ❖ PrimaryToPrimaryKeySubset
        • ❖ ReferenceTable
        • ❖ SelfReferentialHierarchy
        • ❖ UniqueBridgeTable
      • Program Your Own Constraint
      • Constraints API
  • Support
    • Troubleshooting
      • Help with Installation
      • Help with SDV
    • Versioning & Backwards Compatibility Policy
Powered by GitBook

Copyright (c) 2023, DataCebo, Inc.

On this page
  • Constraint API
  • Usage
  • FAQs
  1. Concepts
  2. Constraint-Augmented Generation (CAG)
  3. Predefined Constraints

Inequality

The Inequality constraint enforces an inequality relationship between a pair of columns. For every row, the value in one column must be greater than a value in another.

Constraint API

Parameters

  • (required) low_column_name: The name of the column whose values must be lower. Only numerical and datetime columns are allowed.

  • (required) high_column_name: The name of the column whose values must be greater. Only numerical and datetime columns are allowed.

  • strict_boundaries: Whether the high column must be strictly greater than the low column

    • (default) True: The value in the high column must be strictly greater than the value in the low column

    • False: The value in the high column must be greater than or equal to the value in the low column.

  • table_name: A string with the name of the table to apply this to. Required if you have a multi-table dataset.

from sdv.cag import Inequality

my_constraint = Inequality(
    low_column_name='checkin_date',
    high_column_name='checkout_date',
    strict_boundaries=True
)

Usage

Apply the constraint to any SDV synthesizer. Then fit and sample as usual.

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([my_constraint])

synthesizer.fit(data)
synthetic_data = synthesizer.sample()

FAQs

What happens to missing values?

This constraint ignores missing values. The constraint considered is valid as long as the numerical values (non-missing values) follow the inequality.

What if I want to compare a column to a single, fixed value?
PreviousFixedIncrementsNextOneHotEncoding

Last updated 23 hours ago

Many of our SDV synthesizers are already designed to learned the min/max values in every column and replicate the ranges in the synthetic data. This parameter is often called enforce_min_max_values and it applies to all numerical/datetime columns. For more information, check your .

You can also control the enforcement on a per-column basis. Turn on/off the enforcement on individual columns by accessing and updating the transformers. For more information, see the .

Both of these options will allow you to fix the range (as observed in the real data) or expand it (by not enforcing it). If you'd like to further restrict the range, we encourage you to model the data as-is and use to get the range you need.

synthesizer's API guide
Preprocessing guide
conditional sampling

For more information about using predefined constraints, please see the .

Constraint-Augmented Generation tutorial