KeyUniqueness
This metric measures whether the keys in a particular dataset are unique. We expect that certain types of keys, such as primary keys, are always unique in order to be valid.
ID: This metric is meant for ID data
Other: This metric can work with any other type of semantic data that is used in place of an ID, such as a natural key like an email address
(best) 1.0: All of the key values in the synthetic data are unique
(worst) 0.0: None of the key values in the synthetic data are unique
This metric counts how many values in the synthetic data, s, are duplicates, meaning that another value in s is exactly the same. Call this set of duplicates D_s. The score is the proportion of values that are not duplicates: score = 1 - |D_s| / |s|.
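The definition above can be sketched in plain pandas. This is an illustration of the description only, not the library's own code: the function name `key_uniqueness_sketch` and the sample values are hypothetical, and the library may count duplicates differently (for example, by keeping the first occurrence of each repeated value).

```python
import pandas as pd


def key_uniqueness_sketch(synthetic_keys: pd.Series) -> float:
    """Proportion of key values that have no exact duplicate (hypothetical helper)."""
    if len(synthetic_keys) == 0:
        raise ValueError("synthetic_keys must not be empty")
    # keep=False flags every member of a repeated group, following the text:
    # a value is a duplicate if another value is exactly the same.
    in_d_s = synthetic_keys.duplicated(keep=False)
    return float(1.0 - in_d_s.sum() / len(synthetic_keys))


# Made-up sample: "id_2" appears twice, so 2 of the 5 values are in D_s.
synthetic_keys = pd.Series(["id_1", "id_2", "id_2", "id_3", "id_4"])
score = key_uniqueness_sketch(synthetic_keys)
print(score)  # 1 - 2/5
```

Under this reading, a column with no repeated values scores 1.0 and a column in which every value is repeated scores 0.0, which matches the best and worst cases listed above.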
Recommended Usage: The Diagnostic Report applies this metric to applicable keys (primary and alternate keys).
To manually run this metric, access the single_column module and use the compute method.
Parameters
(required) real_data: A pandas.Series object with the column of real data
(required) synthetic_data: A pandas.Series object with the column of synthetic data
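A manual run might look like the sketch below. It assumes the metric class is exposed as `KeyUniqueness` in `sdmetrics.single_column` and that `compute` accepts the two parameters above as keyword arguments. The tables, the `user_id` column, and the wrapper `run_key_uniqueness` are made up for illustration.

```python
import pandas as pd

# Hypothetical tables; "user_id" stands in for your own key column.
real_table = pd.DataFrame({"user_id": ["u1", "u2", "u3", "u4"]})
synthetic_table = pd.DataFrame({"user_id": ["s1", "s2", "s2", "s3"]})


def run_key_uniqueness(real_keys: pd.Series, synthetic_keys: pd.Series) -> float:
    """Run the metric on one key column (hypothetical wrapper)."""
    # Imported here so that sdmetrics is only required when the metric runs.
    from sdmetrics.single_column import KeyUniqueness

    return KeyUniqueness.compute(
        real_data=real_keys,
        synthetic_data=synthetic_keys,
    )
```

Calling `run_key_uniqueness(real_table["user_id"], synthetic_table["user_id"])` should return a score between 0.0 and 1.0, with a higher score meaning fewer duplicate keys.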