ReferentialIntegrity
This metric measures the integrity of a connection between a foreign key and primary key. Every value in the foreign key column must be found in the primary key.
Foreign Key: This metric is meant for foreign keys
Primary Key: This metric validates that the foreign key values are found in the primary key
This metric counts missing values as valid foreign keys.
(best) 1.0: All the foreign key values are found in the primary key
(worst) 0.0: None of the foreign key values are found in the primary key. This indicates that the dataset has orphan children, which is invalid in most database systems.
In a multi-table setup, there is a parent table and a child table. The parent contains a primary key that uniquely identifies every row, while the child contains a foreign key that refers to a parent row. Foreign key values may repeat, as multiple children can reference the same parent.
If s represents the synthetic data, then this metric identifies whether the foreign key values (FK) in s match a value in the primary key (PK) of s. The score is the proportion of foreign key values that are found in the primary key column.
Note that if a foreign key value is missing, this metric counts it as valid, meaning that it is included in the numerator.
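The score described above can be sketched in a few lines of plain Python. This is only an illustration of the definition, not the library's implementation: the helper names `is_missing` and `referential_integrity_score` are hypothetical, and the sketch assumes that `None` and `NaN` are the only forms a missing value takes.

```python
import math


def is_missing(value):
    """Assumption for this sketch: None and float NaN count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def referential_integrity_score(primary_keys, foreign_keys):
    """Proportion of foreign key values that are missing or found in the primary key."""
    foreign_keys = list(foreign_keys)
    if not foreign_keys:
        raise ValueError("foreign_keys must not be empty")
    parents = set(primary_keys)
    # Missing foreign keys are counted as valid, so they add to the numerator.
    valid = sum(1 for fk in foreign_keys if is_missing(fk) or fk in parents)
    return valid / len(foreign_keys)


# Hypothetical example data: "z" has no parent, None is a missing foreign key.
score = referential_integrity_score(["a", "b", "c"], ["a", "a", "b", "z", None])
```

In this example, four of the five foreign key values are either missing or present in the primary key, so only the orphan value `"z"` lowers the score.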
Recommended Usage: The Diagnostic Report applies this metric to applicable columns.
To manually apply this metric, access the column_pairs module and use the compute method.
Parameters
(required) real_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the real data and the second represents the foreign key.
(required) synthetic_data: A tuple of 2 pandas.Series objects. The first represents the primary key of the synthetic data and the second represents the foreign key.