CategoryCoverage
This metric measures whether a synthetic column covers all the possible categories that are present in a real column.
Categorical: This metric is meant for discrete, categorical data
Boolean: This metric is meant for boolean data
This metric ignores missing values.
(best) 1.0: The synthetic column contains all the unique categories present in the real column
(worst) 0.0: The synthetic column contains none of the categories present in the real column
The plot below shows some fictitious real and synthetic data (black and green respectively) with CategoryCoverage=0.6.
This metric first computes the number of unique categories, c, that are present in the real column r. Then it counts how many of those categories are also present in the synthetic column s. It returns the proportion of real categories that appear in the synthetic data.
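The computation described above can be sketched in plain pandas. This is an illustrative re-implementation, not the library's own code: the function name category_coverage is hypothetical, the choice of which three categories appear in the synthetic column is made up for the example, and returning NaN for an empty real column is an assumption.

```python
import pandas as pd


def category_coverage(real: pd.Series, synthetic: pd.Series) -> float:
    """Proportion of real categories that also appear in the synthetic column."""
    # Missing values are ignored, as the metric description states.
    real_categories = set(real.dropna().unique())
    synthetic_categories = set(synthetic.dropna().unique())

    # Assumption: with no real categories the proportion is undefined.
    if not real_categories:
        return float("nan")

    covered = real_categories & synthetic_categories
    return len(covered) / len(real_categories)


# Fictitious data: 5 real categories, 3 of which show up in the synthetic column.
real = pd.Series(
    ["Science", "Fine Arts", "Arts", "Business Administration", "Other", None]
)
synthetic = pd.Series(["Science", "Arts", "Other", "Science", None])

score = category_coverage(real, synthetic)
print(score)  # 0.6
```

Because the score is a proportion of real categories, extra categories that appear only in the synthetic column do not change it.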
To manually apply this metric, access the single_column module and use the compute method.
Parameters
(required) real_data: A pandas.Series object with the column of real data
(required) synthetic_data: A pandas.Series object with the column of synthetic data
In the example plot, the real data has 5 categories: Science, Fine Arts, Arts, Business Administration and Other. However, the synthetic data includes only 3 of those categories, so the category coverage is 3/5 = 0.6.