CardinalityBoundaryAdherence


Last updated 1 month ago

If there are two connected tables, the cardinality refers to the number of child rows connected to each parent row. This metric measures whether the cardinality of the synthetic data stays within the min/max values determined by the real data.

Data Compatibility

  • Foreign Key : This metric is meant for foreign keys

  • Primary Key : This metric validates that the foreign key values are found in the primary key

This metric ignores missing values in the foreign key.

Score

  • (best) 1.0: The cardinality of the synthetic data is always in the min/max bounds as determined by the real data.

  • (worst) 0.0: The cardinality of the synthetic data is never within the min/max bounds as determined by the real data.

The example below shows a distribution of cardinality values for real and synthetic data (black and green, respectively). The real data has a min cardinality of 0 and a max of 4. Since the synthetic data is contained within these bounds, the score is 1.0.
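For contrast, here is a worked example with invented numbers (they are not taken from the figure). Suppose the real bounds are again 0 and 4, and the synthetic data has 10 parent rows: 9 with a cardinality between 0 and 4, and 1 parent with 5 children. Only that one parent falls outside the bounds, so the score would be:

```latex
\text{score} = \frac{\text{synthetic parents within } [0, 4]}{\text{all synthetic parents}} = \frac{9}{10} = 0.9
```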

How does it work?

In a multi table setup, there is a parent and child table. The parent contains a primary key that uniquely identifies every row while the child contains a foreign key that refers to a parent row. The foreign keys may repeat, as multiple children can reference the same parent.

This metric computes the cardinality [1] of each parent row. That is, it counts the number of children that each parent row has, so that each parent row is associated with an integer ≥ 0. This yields a set of cardinality values for the real data, r, and for the synthetic data, s. The score is the proportion of values in s that fall within the min/max boundary of r.
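The computation described here can be sketched in plain pandas. This is an illustrative re-implementation under stated assumptions, not the library's actual code: the helper names `cardinality_per_parent` and `boundary_adherence` and the toy keys are hypothetical.

```python
import pandas as pd


def cardinality_per_parent(primary_key: pd.Series, foreign_key: pd.Series) -> pd.Series:
    """Count the child rows that reference each parent row (0 if none)."""
    # value_counts() skips missing foreign keys by default (dropna=True).
    child_counts = foreign_key.value_counts()
    # Look up each parent key in the counts; parents without children get 0.
    return primary_key.map(child_counts).fillna(0).astype(int)


def boundary_adherence(real: tuple, synthetic: tuple) -> float:
    """Proportion of synthetic parents whose cardinality lies in the real min/max."""
    r = cardinality_per_parent(*real)
    s = cardinality_per_parent(*synthetic)
    in_bounds = s.between(r.min(), r.max())  # inclusive on both ends
    return float(in_bounds.mean())


# Hypothetical toy data: (primary key, foreign key) pairs.
real = (
    pd.Series(['User_00', 'User_01', 'User_02']),
    pd.Series(['User_00', 'User_01', 'User_01', None]),  # None is ignored
)
synthetic = (
    pd.Series(['A', 'B', 'C', 'D']),
    pd.Series(['A', 'A', 'A', 'B', 'C', 'C']),
)

real_cardinality = cardinality_per_parent(*real).tolist()
score = boundary_adherence(real, synthetic)
print(real_cardinality, score)
```

In this toy data the real cardinalities are 1, 2 and 0, so the bounds are 0 and 2. The synthetic cardinalities are 3, 1, 2 and 0, and only parent `A` (3 children) falls outside the bounds, giving 3 out of 4.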

Usage

To manually apply this metric, access the column_pairs module and use the compute method.

from sdmetrics.column_pairs import CardinalityBoundaryAdherence

CardinalityBoundaryAdherence.compute(
    real_data=(real_table['primary_key'], real_table['foreign_key']),
    synthetic_data=(synthetic_table['primary_key'], synthetic_table['foreign_key'])
)

Parameters

  • (required) real_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the real data and the second represents the foreign key.

  • (required) synthetic_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the synthetic data and the second represents the foreign key.
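As a minimal sketch of how the two tuples might be assembled, the example below builds small parent and child tables and passes their key columns to the metric. The table names (`real_users`, `real_sessions`, and so on) and the `user_id` column are hypothetical, and the import is guarded so the snippet still runs when sdmetrics is not installed.

```python
import pandas as pd

# Hypothetical toy tables; names and values are made up for illustration.
real_users = pd.DataFrame({'user_id': ['User_00', 'User_01', 'User_02']})
real_sessions = pd.DataFrame({'user_id': ['User_00', 'User_01', 'User_01']})

synthetic_users = pd.DataFrame({'user_id': ['User_00', 'User_01', 'User_02', 'User_03']})
synthetic_sessions = pd.DataFrame(
    {'user_id': ['User_00'] * 3 + ['User_01'] + ['User_02'] * 2}
)

# Each argument is a (primary key, foreign key) tuple of pandas.Series.
real_data = (real_users['user_id'], real_sessions['user_id'])
synthetic_data = (synthetic_users['user_id'], synthetic_sessions['user_id'])

try:
    from sdmetrics.column_pairs import CardinalityBoundaryAdherence
except ImportError:
    score = None  # sdmetrics is not available in this environment
else:
    score = CardinalityBoundaryAdherence.compute(
        real_data=real_data,
        synthetic_data=synthetic_data,
    )

print(score)
```

Here the real parents have 1, 2 and 0 children, so the bounds are 0 and 2. Three of the four synthetic parents (with 1, 2 and 0 children) fall within those bounds, while `User_00` with 3 children does not.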

The score described in "How does it work?" is the proportion of synthetic cardinality values that fall within the bounds of the real data:

$$score = \frac{|\, s,\ s \ge \min(r) \text{ and } s \le \max(r) \,|}{|s|}$$

Recommended Usage: The Diagnostic Report applies this metric to applicable columns.

Illustration for "How does it work?": The parent table contains primary keys while the child table has foreign keys that refer to them. Each parent row has a different number of children based on the references. For example, User_00 has 1 child row, User_01 has 2, User_02 has 0, and so on.

References

[1] https://en.wikipedia.org/wiki/Cardinality_(data_modeling)