# CardinalityBoundaryAdherence

If two tables are connected, the *cardinality* refers to the number of child rows connected to each parent row. This metric measures whether the cardinality of the synthetic data stays within the min/max values determined by the real data.

## Data Compatibility

**Foreign Key**: This metric is meant for foreign keys.

**Primary Key**: This metric validates that the foreign key values are found in the primary key.

This metric ignores missing values in the foreign key.

## Score

**(best) 1.0**: The cardinality of the synthetic data is always within the min/max bounds determined by the real data.

**(worst) 0.0**: The cardinality of the synthetic data is never within the min/max bounds.

The example below shows a distribution of cardinality values for real and synthetic data (black and green, respectively). The real data has a min cardinality of 0 and a max of 4. Since the synthetic data is contained within these bounds, the score is 1.0.

## How does it work?

In a multi-table setup, there is a parent table and a child table. The parent contains a primary key that uniquely identifies every row, while the child contains a foreign key that refers to a parent row. The foreign key values may repeat, as multiple children can reference the same parent.
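The parent/child relationship described above can be illustrated with a minimal pandas sketch. The table and column names (`users`, `sessions`, `user_id`, `session_id`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical parent table: "user_id" is the primary key.
users = pd.DataFrame({"user_id": [1, 2, 3]})

# Hypothetical child table: "user_id" is a foreign key that refers to users.
sessions = pd.DataFrame({
    "session_id": [10, 11, 12, 13],
    "user_id": [1, 1, 2, 1],
})

# The primary key uniquely identifies every parent row.
primary_key_is_unique = users["user_id"].is_unique

# Foreign key values may repeat: user 1 is referenced by three sessions.
foreign_key_repeats = sessions["user_id"].duplicated().any()

# Every non-missing foreign key value should be found in the primary key.
foreign_keys_valid = sessions["user_id"].dropna().isin(users["user_id"]).all()
```

Note that user 3 has no sessions at all, so its cardinality is 0.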

This metric computes the cardinality [1] of each parent row. That is, it counts the number of children that each parent row has, so that each parent row is associated with an integer ≥ 0. This yields a set of cardinality values for both the real data, *r*, and the synthetic data, *s*. The score is the proportion of values in *s* that fall within the min/max boundary of *r*.

$score = \frac{|\{x \in s : \min(r) \le x \le \max(r)\}|}{|s|}$
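The computation above can be sketched in a few lines of pandas. This is an illustrative sketch under stated assumptions, not the library's own implementation: the helper names `cardinality` and `boundary_adherence` are hypothetical, and the sketch assumes every foreign key value appears in the primary key.

```python
import pandas as pd


def cardinality(primary_key: pd.Series, foreign_key: pd.Series) -> pd.Series:
    """Count the children of each parent row; parents without children get 0."""
    # Missing foreign key values are ignored.
    counts = foreign_key.dropna().value_counts()
    # Align the counts to the parent rows, filling in 0 for childless parents.
    return counts.reindex(primary_key, fill_value=0)


def boundary_adherence(real_data, synthetic_data) -> float:
    """Proportion of synthetic cardinalities inside the real min/max bounds."""
    r = cardinality(*real_data)
    s = cardinality(*synthetic_data)
    in_bounds = (s >= r.min()) & (s <= r.max())
    return float(in_bounds.mean())


# Real data: parents 1..4 have 2, 3, 1, and 0 children (min 0, max 3).
real = (pd.Series([1, 2, 3, 4]), pd.Series([1, 1, 2, 2, 2, 3]))

# Synthetic data: parents 1..4 have 5, 1, 1, and 0 children.
synthetic = (pd.Series([1, 2, 3, 4]), pd.Series([1, 1, 1, 1, 1, 2, 3]))

score = boundary_adherence(real, synthetic)
```

Here three of the four synthetic parents fall within the real bounds of 0 to 3, so the score is 0.75. Only the parent with 5 children is out of bounds.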

**Recommended Usage:** The Diagnostic Report applies this metric to applicable columns.

To manually apply this metric, access the `column_pairs` module and use the `compute` method.

**Parameters**

(required) `real_data`: A tuple of 2 pandas.Series objects. The first represents the primary key of the real data and the second represents the foreign key.

(required) `synthetic_data`: A tuple of 2 pandas.Series objects. The first represents the primary key of the synthetic data and the second represents the foreign key.

## References

[1] https://en.wikipedia.org/wiki/Cardinality_(data_modeling)