When two tables are connected, cardinality refers to the number of child rows linked to each parent row. This metric measures whether the cardinality of the synthetic data stays within the min/max bounds determined by the real data.

## Data Compatibility

• Foreign Key : This metric is meant for foreign keys

• Primary Key : This metric validates that the foreign key values are found in the primary key

This metric ignores missing values in the foreign key.

## Score

• (best) 1.0: The cardinality of the synthetic data is always in the min/max bounds as determined by the real data.

• (worst) 0.0: The cardinality of the synthetic data is never within the min/max bounds.

The example below shows a distribution of cardinality values for real and synthetic data (black and green, respectively). The real data has a min cardinality of 0 and a max of 4. Since the synthetic data is contained within these bounds, the score is 1.0.

## How does it work?

In a multi-table setup, there is a parent table and a child table. The parent contains a primary key that uniquely identifies every row, while the child contains a foreign key that refers to a parent row. Foreign key values may repeat, as multiple children can reference the same parent.

This metric computes the cardinality [1] of each parent row. That is, it counts the number of children that each parent row has, so that each parent row is associated with an integer ≥ 0. This yields a set of cardinality values for both the real data, r, and the synthetic data, s. The score is the proportion of values in s that fall within the min/max bounds of r.

$score = \frac{|\{x \in s : \min(r) \le x \le \max(r)\}|}{|s|}$
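The computation described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the SDMetrics implementation: the helper names `cardinality` and `boundary_adherence` and the toy tables are hypothetical, and missing foreign keys are simply dropped before counting.

```python
import pandas as pd


def cardinality(primary_key: pd.Series, foreign_key: pd.Series) -> pd.Series:
    """Count the children of each parent row (hypothetical helper)."""
    # Ignore missing foreign keys, then count how often each parent is referenced.
    counts = foreign_key.dropna().value_counts()
    # Parents that are never referenced get a cardinality of 0.
    return counts.reindex(pd.Index(primary_key), fill_value=0)


def boundary_adherence(real, synthetic) -> float:
    """Proportion of synthetic cardinalities within the real min/max bounds."""
    r = cardinality(*real)
    s = cardinality(*synthetic)
    within = (s >= r.min()) & (s <= r.max())
    return float(within.mean())


# Toy data (assumed for illustration): real cardinalities are a=2, b=1, c=0.
real = (pd.Series(['a', 'b', 'c']), pd.Series(['a', 'a', 'b', None]))
# Synthetic cardinalities are x=3, y=1, z=2, w=0; only x exceeds the real max of 2.
synthetic = (
    pd.Series(['x', 'y', 'z', 'w']),
    pd.Series(['x', 'x', 'x', 'y', 'z', 'z']),
)

score = boundary_adherence(real, synthetic)
print(score)  # 0.75
```

Here three of the four synthetic parent rows have a cardinality inside the real bounds of 0 and 2, so the score is 3/4.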

## Usage

Recommended Usage: The Diagnostic Report applies this metric to applicable columns.

To manually apply this metric, access the column_pairs module and use the compute method.

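```python
from sdmetrics.column_pairs import CardinalityBoundaryAdherence

# Each argument is a (primary key, foreign key) tuple of pandas.Series objects
CardinalityBoundaryAdherence.compute(
    real_data=(real_table['primary_key'], real_table['foreign_key']),
    synthetic_data=(synthetic_table['primary_key'], synthetic_table['foreign_key'])
)
```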

Parameters

• (required) real_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the real data and the second represents the foreign key.

• (required) synthetic_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the synthetic data and the second represents the foreign key.
