Quality Metrics
Quality metrics capture the statistical similarity between real data and synthetic data. If the synthetic and real data are statistically similar, we refer to the synthetic data as being high quality.
We intend the quality metrics to be aspirational. While it may not always be possible to achieve 100% quality on all metrics, optimizing them can benefit your downstream synthetic data use case.
Measure the quality of your entire dataset. The Quality Report is designed to capture quality measurements across multiple tables and columns. It chooses which metrics to apply based on each column's type and combines the results into a single consolidated score.
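To make the idea of type-based metric selection concrete, here is a minimal, standard-library sketch. It is not the Quality Report's implementation: the function names, the `column_types` mapping, and the plain average used for the consolidated score are all assumptions made for illustration, and it only covers column shapes. It assumes KSComplement is 1 minus the two-sample Kolmogorov-Smirnov statistic and TVComplement is 1 minus the total variation distance between category frequencies.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synthetic):
    """1 minus the largest gap between the two empirical CDFs (numerical columns)."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    for x in sorted(set(real_sorted) | set(synth_sorted)):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


def tv_complement(real, synthetic):
    """1 minus the total variation distance between category frequencies."""
    real_freq, synth_freq = Counter(real), Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - distance


def column_shapes_score(real_table, synthetic_table, column_types):
    """Hypothetical helper: pick a metric per column type, then average the scores.

    Tables are dicts mapping column name -> list of values; column_types maps
    column name -> 'numerical' or 'categorical' (an assumed, simplified schema).
    """
    metric_for_type = {"numerical": ks_complement, "categorical": tv_complement}
    per_column = {
        name: metric_for_type[kind](real_table[name], synthetic_table[name])
        for name, kind in column_types.items()
    }
    overall = sum(per_column.values()) / len(per_column)
    return per_column, overall


real_table = {"age": [1, 2, 3, 4], "color": ["a", "a", "b", "b"]}
synthetic_table = {"age": [3, 4, 5, 6], "color": ["a", "a", "a", "b"]}
column_types = {"age": "numerical", "color": "categorical"}

per_column, overall = column_shapes_score(real_table, synthetic_table, column_types)
print(per_column, overall)
```

Every score lies between 0 and 1, with 1 meaning the two columns have matching distributions under that metric.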
Browse
Apply these metrics to individual columns and tables in your data:
KSComplement, TVComplement: compare column shapes (aka marginal distributions, histograms)
ContingencySimilarity, CorrelationSimilarity: compare 2D distributions & pairwise correlations
CardinalityShapeSimilarity: compare the frequency of parent/child connections (aka cardinality)
CategoryCoverage, RangeCoverage: measure whether the synthetic data covers the full set of categories or range of values found in the real data
SequenceLengthSimilarity, StatisticMSAS: compare real and synthetic data that represent sequential information
MissingValueSimilarity, StatisticSimilarity: compare individual statistics of the data
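As an illustration of what the coverage and single-statistic metrics above measure, the following standard-library sketch computes three of them by hand. It is a simplified reading of the metric definitions, not the library's code: the function names are hypothetical, missing values are represented as `None`, and returning `nan` for a constant real column is a choice made for this sketch.

```python
def category_coverage(real, synthetic):
    """Fraction of the real column's categories that also appear in the synthetic column."""
    real_categories = set(real)
    return len(real_categories & set(synthetic)) / len(real_categories)


def range_coverage(real, synthetic):
    """Fraction of the real [min, max] range spanned by the synthetic values.

    Assumes both columns are numeric and contain no missing values.
    """
    real_min, real_max = min(real), max(real)
    span = real_max - real_min
    if span == 0:
        return float("nan")  # sketch choice: undefined when the real column is constant
    low_miss = max((min(synthetic) - real_min) / span, 0.0)
    high_miss = max((real_max - max(synthetic)) / span, 0.0)
    return max(1.0 - (low_miss + high_miss), 0.0)


def missing_value_similarity(real, synthetic):
    """1 minus the absolute difference in the proportion of missing (None) values."""
    real_missing = sum(v is None for v in real) / len(real)
    synth_missing = sum(v is None for v in synthetic) / len(synthetic)
    return 1.0 - abs(real_missing - synth_missing)


print(category_coverage(["a", "b", "c", "a"], ["a", "b", "b"]))
print(range_coverage([0, 10], [2, 8]))
print(missing_value_similarity([1, None, 3, 4], [1, 2, 3, 4]))
```

In the first call the synthetic column misses category `c`, in the second it leaves out the lowest and highest 20% of the real range, and in the third it contains no missing values while a quarter of the real values are missing.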