
BoundaryAdherence

This metric measures whether a synthetic column respects the minimum and maximum values of the real column. It returns the percentage of synthetic rows that adhere to the real boundaries.

Data Compatibility

  • Numerical: This metric is meant for numerical data

  • Datetime: This metric converts datetime values into numerical values (see the sketch below)

If the real data contains missing values, then the metric also considers missing values in the synthetic data valid. Otherwise, missing synthetic values are marked as out-of-bounds.
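
Because datetime values are converted to numbers, the boundary check on a datetime column behaves just like the numerical case. Below is a minimal sketch of that equivalence in plain pandas; it is an illustration, not the library's internal code:

import pandas as pd

real = pd.Series(pd.to_datetime(['2020-01-01', '2020-06-15', '2020-12-31']))
synthetic = pd.Series(pd.to_datetime(['2020-03-01', '2021-02-01']))

# Timestamps are ordered like numbers, so min/max boundaries apply directly
low, high = real.min(), real.max()
synthetic.between(low, high).mean()  # 0.5: 2021-02-01 is out of bounds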

Score

(best) 1.0: All values in the synthetic data respect the min/max boundaries of the real data

(worst) 0.0: No value in the synthetic data is in between the min and max value of the real data

The graph below shows an example of some fictional real and synthetic data (black and green, respectively) with BoundaryAdherence=0.912.

[Figure] The real data is in range [37.0, 97.7], shown by vertical dotted lines. Around 8.8% of the synthetic data is outside of the min/max bounds of the real data, so the overall BoundaryAdherence score is 91.2%.

How does it work?

This metric computes the min and max values of the real column. Then, it computes the frequency of synthetic values that are in the [min, max] range.
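
For intuition, here is a minimal sketch of that logic in plain pandas. The helper boundary_adherence is hypothetical and omits the missing-value and datetime handling described above; it is not the library's internal implementation.

import pandas as pd

def boundary_adherence(real, synthetic):
    # Find the real column's boundaries
    low, high = real.min(), real.max()
    # Fraction of synthetic values inside [low, high] (inclusive on both ends)
    return synthetic.between(low, high).mean()

real = pd.Series([37.0, 50.2, 64.1, 97.7])
synthetic = pd.Series([40.0, 36.5, 80.3, 99.0, 55.5])
boundary_adherence(real, synthetic)  # 0.6: 36.5 and 99.0 fall outside [37.0, 97.7]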

Usage

To manually apply this metric, access the single_column module and use the compute method.

from sdmetrics.single_column import BoundaryAdherence

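# Compare the real and synthetic versions of a single column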
BoundaryAdherence.compute(
    real_data=real_table['column_name'],
    synthetic_data=synthetic_table['column_name']
)
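
The compute method returns a single float between 0.0 and 1.0: the proportion of synthetic values that fall within the real bounds.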

Parameters

  • (required) real_data: A pandas.Series object with the column of real data

  • (required) synthetic_data: A pandas.Series object with the column of synthetic data

Recommended Usage: The Diagnostic Report applies this metric to applicable columns.

FAQs

Is there an equivalent metric for discrete data?

For discrete datasets, only a select number of values are possible. Use the CategoryAdherence metric to ensure they have the correct values.

What kind of scores should I expect to see?

The scores that you see depend on the model that you used to create the synthetic data. Some models are designed to learn the min/max boundaries from the real data. For such models, the score should always be 1.0.

Other models do not learn the boundaries. In these cases, you can expect the score to be close to, but not exactly, 1.0. Note that this may be a desirable feature if it is risky to leak the true min/max values in the synthetic data.

What if my synthetic data is only covering a small subset of the real values?

This metric only quantifies cases where the synthetic data goes out of bounds. If you're interested in knowing whether the synthetic data covers the full range of real values, use the RangeCoverage metric.