ReferentialIntegrity
This metric measures the integrity of the connection between a foreign key and a primary key. Every value in the foreign key column must be found in the primary key column.
Data Compatibility
Foreign Key: This metric is meant for foreign keys
Primary Key: This metric validates that the foreign key values are found in the primary key
This metric counts missing values as valid foreign keys.
Score
(best) 1.0: All the foreign key values are found in the primary key
(worst) 0.0: None of the foreign key values are found in the primary key. This indicates that the dataset has orphan children, which is invalid in most database systems.
How does it work?
In a multi-table setup, there is a parent table and a child table. The parent contains a primary key that uniquely identifies every row, while the child contains a foreign key that refers to a parent row. The foreign key values may repeat, as multiple children can reference the same parent.
If s represents the synthetic data, then this metric identifies whether the foreign key values (FK) in s match a value in the primary key (PK) of s. The score is the proportion of foreign key values that are found in the primary key column.
Note that if a foreign key value is missing, this metric counts it as valid, meaning that it will be included in the numerator.
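The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration of the description, not the library's own implementation: the function names referential_integrity_score and is_missing are hypothetical, and returning NaN for an empty foreign key column is an assumption made here.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def referential_integrity_score(primary_keys, foreign_keys):
    """Proportion of foreign key values that are missing or found in the primary key."""
    foreign_keys = list(foreign_keys)
    if not foreign_keys:
        # Assumption: with no foreign key values there is nothing to score.
        return float("nan")
    known_parents = set(primary_keys)
    # Missing foreign keys count as valid, so they are included in the numerator.
    valid = sum(1 for fk in foreign_keys if is_missing(fk) or fk in known_parents)
    return valid / len(foreign_keys)


parent_ids = [1, 2, 3]
child_parent_ids = [1, 1, 2, None, 7]  # 7 has no matching parent (an orphan)
score = referential_integrity_score(parent_ids, child_parent_ids)
print(score)  # 0.8
```

In this toy example, four of the five foreign key values are either missing or present in the parent, so the score is 4/5.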
Usage
Recommended Usage: The Diagnostic Report applies this metric to applicable columns.
To manually apply this metric, access the column_pairs module and use the compute method.
Parameters
(required) real_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the real data and the second represents the foreign key.
(required) synthetic_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the synthetic data and the second represents the foreign key.