Quality Metrics
Last updated
Quality metrics measure the statistical similarity between real and synthetic data. When the synthetic data is statistically similar to the real data, we consider it to be high quality.
The quality metrics are intended to be aspirational. It may not always be possible to achieve a 100% score on every metric, but improving them can still benefit your downstream use of the synthetic data.
Measure the quality of your entire dataset. The is designed to capture quality measurements across multiple tables and columns. It determines which metrics to apply based on each column's type and combines the results into a single consolidated score.
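As a rough illustration of how a consolidated score can be built from per-column metrics, the sketch below picks a metric based on each column's type and averages the results. It is a minimal sketch under stated assumptions, not the report's actual implementation: tables are assumed to be plain dicts mapping column names to lists, a column is treated as numerical when every value is an `int` or `float`, and two common column-shape measures are assumed (1 minus the two-sample Kolmogorov-Smirnov statistic for numerical columns, 1 minus the total variation distance for categorical ones). All function names are hypothetical.

```python
import bisect

# Hypothetical helpers for illustration; not a real library API.

def ks_complement(real, synthetic):
    """1 minus the two-sample Kolmogorov-Smirnov statistic (largest gap between empirical CDFs)."""
    real_sorted, syn_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    # The largest CDF gap always occurs at one of the observed values.
    for x in real_sorted + syn_sorted:
        cdf_real = bisect.bisect_right(real_sorted, x) / len(real_sorted)
        cdf_syn = bisect.bisect_right(syn_sorted, x) / len(syn_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_syn))
    return 1.0 - max_gap


def tv_complement(real, synthetic):
    """1 minus the total variation distance between category frequencies."""
    categories = set(real) | set(synthetic)
    tvd = 0.5 * sum(
        abs(real.count(c) / len(real) - synthetic.count(c) / len(synthetic))
        for c in categories
    )
    return 1.0 - tvd


def column_shapes_score(real_table, synthetic_table):
    """Choose a metric per column from its type (assumed rule), then average the scores."""
    scores = {}
    for column, real_values in real_table.items():
        syn_values = synthetic_table[column]
        if all(isinstance(v, (int, float)) for v in real_values):
            scores[column] = ks_complement(real_values, syn_values)
        else:
            scores[column] = tv_complement(real_values, syn_values)
    return sum(scores.values()) / len(scores), scores


real = {"age": [23, 35, 41, 52], "plan": ["free", "pro", "free", "team"]}
synthetic = {"age": [25, 33, 44, 60], "plan": ["free", "free", "pro", "team"]}
overall, per_column = column_shapes_score(real, synthetic)
print(per_column, overall)  # {'age': 0.75, 'plan': 1.0} 0.875
```

The plain average weights every column equally; a production report may group and weight its measurements differently.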
Apply these metrics to individual columns and tables in your data:
, : compare column shapes (also known as marginal distributions or histograms)
, : compare 2D distributions and pairwise correlations
: compare the frequency of parent/child connections (also known as cardinality)
, : measure whether the synthetic data covers the full range of values and categories present in the real data
, : compare real and synthetic data that represents sequential information
, : compare individual summary statistics of the real and synthetic data
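For numerical column pairs, one common way to compare pairwise correlations is to compute a correlation coefficient on the real data and on the synthetic data and turn their difference into a score. The sketch below assumes Pearson correlation and maps |r_real - r_syn|, which can range from 0 to 2, onto a score between 0 and 1. The function names are hypothetical, and the library's own metrics may use a different coefficient or formula.

```python
import math

# Hypothetical helpers for illustration; not a real library API.

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (undefined for constant columns)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_similarity(real_x, real_y, syn_x, syn_y):
    """Map the correlation gap, which lies in [0, 2], onto a score in [0, 1]."""
    r_real = pearson(real_x, real_y)
    r_syn = pearson(syn_x, syn_y)
    return 1.0 - abs(r_real - r_syn) / 2.0


real_x, real_y = [1, 2, 3, 4], [2, 4, 6, 8]   # perfectly positively correlated
syn_x, syn_y = [1, 2, 3, 4], [8, 6, 4, 2]     # perfectly negatively correlated
print(correlation_similarity(real_x, real_y, real_x, real_y))  # 1.0
print(correlation_similarity(real_x, real_y, syn_x, syn_y))    # 0.0
```

Comparing coefficients only makes sense for numerical pairs; categorical pairs would instead need a comparison of their joint (2D) frequency tables.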
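Comparing parent/child cardinality, and comparing sequence lengths for sequential data, can both be reduced to comparing the distribution of row counts per key. The sketch below counts child rows per parent (including parents with no children) and scores the two count distributions with 1 minus the total variation distance, treating each distinct count as a category. The data layout (plain lists of keys), the choice of total variation distance, and all function names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical helpers for illustration; not a real library API.

def counts_per_key(child_keys, parent_keys=()):
    """Rows per key; parents listed in parent_keys but never referenced count as 0."""
    counts = Counter(child_keys)
    for key in parent_keys:
        counts.setdefault(key, 0)
    return list(counts.values())


def tv_complement(real, synthetic):
    """1 minus the total variation distance, treating each distinct value as a category."""
    real_freq, syn_freq = Counter(real), Counter(synthetic)
    tvd = 0.5 * sum(
        abs(real_freq[v] / len(real) - syn_freq[v] / len(synthetic))
        for v in real_freq.keys() | syn_freq.keys()
    )
    return 1.0 - tvd


# Parent/child cardinality: how many orders each user has.
real_users, real_order_fks = ["u1", "u2", "u3"], ["u1", "u1", "u2"]
syn_users, syn_order_fks = ["s1", "s2", "s3"], ["s1", "s2", "s3"]
cardinality_score = tv_complement(
    counts_per_key(real_order_fks, real_users),
    counts_per_key(syn_order_fks, syn_users),
)

# Sequence lengths: how many rows belong to each sequence key.
real_sequence_keys = ["a", "a", "a", "b", "b"]
syn_sequence_keys = ["x", "x", "y", "y", "y"]
length_score = tv_complement(
    counts_per_key(real_sequence_keys), counts_per_key(syn_sequence_keys)
)
print(round(cardinality_score, 3), length_score)  # 0.333 1.0
```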
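Coverage can be illustrated for the two basic column types: for a numerical column, how much of the real value range the synthetic values reach; for a categorical column, what share of the real categories appear in the synthetic data. Both functions below are hypothetical sketches of that idea, not the library's implementation. Synthetic values outside the real range are simply ignored here, which is one of several reasonable choices.

```python
# Hypothetical helpers for illustration; not a real library API.

def range_coverage(real, synthetic):
    """Fraction of the real [min, max] interval also spanned by the synthetic values."""
    real_min, real_max = min(real), max(real)
    # Assumes real_max > real_min; a constant real column would divide by zero.
    overlap = min(real_max, max(synthetic)) - max(real_min, min(synthetic))
    return max(0.0, overlap / (real_max - real_min))


def category_coverage(real, synthetic):
    """Fraction of real categories that appear at least once in the synthetic data."""
    real_categories = set(real)
    return len(real_categories & set(synthetic)) / len(real_categories)


print(range_coverage([20, 30, 60], [25, 40, 50]))  # 0.625
print(range_coverage([0, 10], [20, 30]))           # 0.0 (no overlap)
print(category_coverage(["free", "pro", "team"], ["free", "free", "pro"]))  # 2 of 3 categories
```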
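To compare an individual statistic, one option is to compute it on both the real and the synthetic column and scale the difference by the real column's range, so that the score falls between 0 and 1. The range normalization, the clipping at 0, and the function name are assumptions made for this sketch; any callable that maps a list of numbers to a single value (here, functions from Python's `statistics` module) can be plugged in.

```python
import statistics

# Hypothetical helper for illustration; not a real library API.

def statistic_similarity(real, synthetic, statistic=statistics.mean):
    """1 minus the difference in a summary statistic, scaled by the real data's range."""
    real_range = max(real) - min(real)  # assumes a non-constant real column
    gap = abs(statistic(real) - statistic(synthetic))
    return max(0.0, 1.0 - gap / real_range)


real = [10, 20, 30, 40]
synthetic = [12, 22, 32, 42]  # same spread, shifted up by 2
for stat in (statistics.mean, statistics.median, statistics.stdev):
    print(stat.__name__, round(statistic_similarity(real, synthetic, stat), 3))
```

Because the synthetic column is a shifted copy, the mean and median scores drop slightly while the standard deviation score stays at 1.0, which shows why checking more than one statistic can be informative.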