CategoryCoverage
This metric measures whether a synthetic column covers all the categories that are present in a real column.
Data Compatibility
Categorical: This metric is meant for discrete, categorical data
Boolean: This metric is meant for boolean data
This metric ignores missing values.
Score
(best) 1.0: The synthetic column contains all the unique categories present in the real column
(worst) 0.0: The synthetic column contains none of the categories present in the real column
The plot below shows some fictitious real and synthetic data (black and green respectively) with CategoryCoverage=0.6.

The real data contains 5 categories: Science, Fine Arts, Arts, Business Administration and Other. However, the synthetic data includes only 3 of those categories, so the category coverage is 3/5 = 0.6.
How does it work?
This metric first computes the number of unique categories, c, present in the real column r. It then counts how many of those c categories also appear in the synthetic column s. The score is that count divided by c, which is the proportion of real categories that appear in the synthetic data.
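The computation described above can be sketched in a few lines of pandas. This is a minimal illustration, not the SDMetrics source: the function name category_coverage is hypothetical, and returning NaN when the real column has no non-missing values is an assumption, since that case is not described here. Missing values are dropped before counting, matching the note that the metric ignores them.

```python
import pandas as pd


def category_coverage(real_data: pd.Series, synthetic_data: pd.Series) -> float:
    """Hypothetical sketch of the CategoryCoverage score.

    Returns (number of real categories also found in the synthetic column) / c,
    where c is the number of unique non-missing categories in the real column.
    """
    # Ignore missing values in both columns.
    real_categories = set(real_data.dropna().unique())
    synthetic_categories = set(synthetic_data.dropna().unique())

    c = len(real_categories)
    if c == 0:
        # Assumption: the score is undefined if the real column has no categories.
        return float("nan")

    # Only real categories count; categories that exist solely in the
    # synthetic column neither raise nor lower the score.
    covered = len(real_categories & synthetic_categories)
    return covered / c


# Fictitious example mirroring the one above: 5 real categories, 3 covered.
real = pd.Series(["Science", "Fine Arts", "Arts", "Business Administration", "Other", None])
synthetic = pd.Series(["Science", "Arts", "Other", "Arts", None])
print(category_coverage(real, synthetic))  # 0.6
```

Because the sketch works on sets of unique values, it applies equally to categorical and boolean columns, and repeated values in either column do not affect the score.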
Usage
To manually apply this metric, import CategoryCoverage from the sdmetrics.single_column module and call its compute method.
Parameters
(required) real_data: A pandas.Series object with the column of real data
(required) synthetic_data: A pandas.Series object with the column of synthetic data
FAQs
