GMLikelihood
Data Likelihood is a family of metrics that estimate how likely it is that the synthetic data belongs to the real data. This metric makes that estimate using Gaussian Mixture Models.
Data Compatibility
Numerical: This metric is meant for continuous, numerical data.
This metric ignores any incompatible column types.
This metric does not accept missing values.
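The compatibility rules above amount to two preprocessing steps: drop non-numerical columns, and refuse to continue if any missing values remain. The sketch below shows one way to express them with pandas. The helper name `select_numerical_columns` is hypothetical, and whether the metric checks for missing values before or after dropping incompatible columns is an assumption.

```python
import pandas as pd


def select_numerical_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep numerical columns and reject missing values."""
    # Incompatible (non-numerical) column types are ignored, not rejected.
    numerical = data.select_dtypes(include="number")
    # Missing values are not accepted, so fail loudly instead of imputing.
    if numerical.isna().any().any():
        raise ValueError("missing values are not supported")
    return numerical


table = pd.DataFrame({
    "age": [23, 41, 35],
    "income": [52000.0, 61000.5, 58000.0],
    "city": ["Lyon", "Oslo", "Lima"],
})
print(list(select_numerical_columns(table).columns))  # ['age', 'income']
```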
Score
(highest) +∞: According to the algorithm used, the synthetic data has the highest possible likelihood of belonging to the real data
(lowest) -∞: According to the algorithm used, the synthetic data has the lowest possible likelihood of belonging to the real data
The score has more than one interpretation. A high score can indicate high synthetic data quality but also low privacy, while a low score can indicate low synthetic data quality but also high privacy.
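Assuming the reported likelihood is an average log-likelihood (the ±∞ bounds only make sense on a log scale), the score for m synthetic rows, under a mixture of K Gaussians with weights π_k, means μ_k and covariances Σ_k fitted to the real data, can be written as:

```latex
\text{score} = \frac{1}{m} \sum_{j=1}^{m} \log p(\tilde{x}_j),
\qquad
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)
```

Here \tilde{x}_j is the j-th synthetic row. Because p is a density rather than a probability, it can exceed 1 near a very narrow component (giving a positive log) or approach 0 for a row far from every component (giving a log that tends to -∞), which is why the score is unbounded in both directions.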
How does it work?
This metric fits multiple Gaussian mixture models [1] to learn the distribution of the real data. The fitted model produces a likelihood estimate for every row, ranging from -∞ to +∞: values toward -∞ mean the row is unlikely to belong to the real data, and values toward +∞ mean it is likely to.
We apply the model to all the synthetic data and return the average likelihood score.
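The per-row scoring and averaging step can be illustrated with a one-dimensional mixture whose parameters are simply written down. This is a minimal sketch assuming the likelihood is a log-likelihood; the mixture parameters are hypothetical stand-ins for a model fitted to real data, and the metric itself works on all numerical columns at once rather than a single value.

```python
import math


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log-density of a 1-D normal distribution."""
    return -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)


def mixture_log_likelihood(x, weights, means, stds):
    """Log-likelihood of one row under a 1-D Gaussian mixture.

    Uses the log-sum-exp trick so very negative terms do not underflow.
    """
    terms = [math.log(w) + gaussian_log_pdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical mixture "learned" from real data: two narrow clusters.
weights, means, stds = [0.5, 0.5], [0.0, 10.0], [0.1, 0.1]

# Two rows sit inside a cluster; the third falls between the clusters.
synthetic_rows = [0.0, 10.05, 5.0]
row_scores = [mixture_log_likelihood(x, weights, means, stds) for x in synthetic_rows]

# The metric's output: the average of the per-row scores.
average_score = sum(row_scores) / len(row_scores)
print(row_scores, average_score)
```

The first two rows get positive scores because the narrow components have densities above 1, while the row between the clusters gets a large negative score and pulls the average down.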
Usage
Access this metric from the single_table module and use the compute method.
Parameters
(required) real_data: A pandas.DataFrame containing the real data
(required) synthetic_data: A pandas.DataFrame containing the synthetic data, with the same columns as the real data
n_components: Number of components to use for the mixture model
(default) (1, 30): Search for the optimal number of components between 1 and 30
(<low integer>, <high integer>): Search for the optimal number of components between the low and high integer
<integer>: Use exactly the integer number of components provided
iterations: Number of times that each number of components should be evaluated before averaging the scores. Defaults to 3.
retries: Number of times that each iteration will be retried if the mixture model crashes during fit. Defaults to 3.
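To make the interplay of n_components, iterations and retries concrete, here is a hedged sketch built on scikit-learn's GaussianMixture. It is not the library's compute method: the function name gm_log_likelihood is hypothetical, and choosing the "optimal" number of components by the lowest average BIC on the real data is an assumption, since the parameter descriptions do not name the selection criterion.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gm_log_likelihood(real, synthetic, n_components=(1, 30), iterations=3, retries=3):
    """Hypothetical sketch of the parameter semantics described above.

    Assumption: the "optimal" component count is the one with the lowest
    average BIC on the real data.
    """
    # An int means "use exactly this many"; a tuple is an inclusive search range.
    if isinstance(n_components, int):
        low, high = n_components, n_components
    else:
        low, high = n_components

    best_bic, best_score = np.inf, None
    for k in range(low, high + 1):
        bics, scores = [], []
        for _ in range(iterations):
            # One initial attempt plus up to `retries` retries.
            for _attempt in range(1 + retries):
                try:
                    model = GaussianMixture(n_components=k).fit(real)
                except ValueError:  # sklearn raises this e.g. for ill-defined covariance
                    continue
                bics.append(model.bic(real))
                # score() returns the mean per-row log-likelihood.
                scores.append(model.score(synthetic))
                break
        # Average the iterations for this k and keep it if its BIC is the lowest.
        if bics and np.mean(bics) < best_bic:
            best_bic, best_score = np.mean(bics), float(np.mean(scores))
    return best_score


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 2))
close = rng.normal(size=(200, 2))           # same distribution as the real data
far = rng.normal(loc=5.0, size=(200, 2))    # shifted away from the real data

score_close = gm_log_likelihood(real, close, n_components=(1, 3))
score_far = gm_log_likelihood(real, far, n_components=(1, 3))
print(score_close, score_far)
```

Synthetic data drawn from the same distribution as the real data scores higher than the shifted data, matching the interpretation given under Score. A narrow search range is used here only to keep the example fast.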
References