
What's included?


Last updated 2 months ago

The Quality Report captures the Column Shapes and Column Pair Trends properties and, for multi table datasets, the Cardinality and Intertable Trends properties. This guide contains technical details about each property.

Column Shapes

Does the synthetic data capture the shape of each column?

The shape of a column describes its overall distribution. The higher the score, the more similar the distributions of real and synthetic data.

Methodology

This property applies metrics based on the column types.

Column Type | Metric
numerical | KSComplement
datetime | KSComplement
boolean | TVComplement
categorical | TVComplement

This yields a separate score for every column. The final Column Shapes score is the average of all columns.
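As a rough illustration of the per-column scoring and averaging, the sketch below reimplements the idea in plain Python. It assumes that KSComplement is one minus the two-sample Kolmogorov-Smirnov statistic and that TVComplement is one minus the total variation distance (the metric pages hold the authoritative definitions). The function names and the `column_types` mapping are hypothetical and not part of the SDMetrics API.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synthetic):
    """Assumed definition: 1 minus the largest gap between the two empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    for value in set(real_sorted) | set(synth_sorted):
        real_cdf = bisect_right(real_sorted, value) / len(real_sorted)
        synth_cdf = bisect_right(synth_sorted, value) / len(synth_sorted)
        max_gap = max(max_gap, abs(real_cdf - synth_cdf))
    return 1.0 - max_gap


def tv_complement(real, synthetic):
    """Assumed definition: 1 minus the total variation distance of category frequencies."""
    real_freq, synth_freq = Counter(real), Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - distance


# Mirrors the Column Type / Metric table above.
METRIC_FOR_TYPE = {
    "numerical": ks_complement,
    "datetime": ks_complement,
    "boolean": tv_complement,
    "categorical": tv_complement,
}


def column_shapes_score(real_table, synthetic_table, column_types):
    """Return the per-column scores and their average (hypothetical helper)."""
    per_column = {
        name: METRIC_FOR_TYPE[col_type](real_table[name], synthetic_table[name])
        for name, col_type in column_types.items()
    }
    return per_column, sum(per_column.values()) / len(per_column)
```

Here the tables are plain dictionaries mapping a column name to a list of values. Datetime columns work as long as their values are mutually comparable.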

You may notice that column shape quality is better for discrete columns (categorical, boolean) as opposed to continuous columns (numerical, datetime). Generally, we've found that it's much easier to create synthetic data for a small number of known categories than large ranges of numerical values.

Column Pair Trends

Does the synthetic data capture trends between pairs of columns?

The trend between two columns describes how they vary in relation to each other, for example the correlation. The higher the score, the more the trends are alike.

Methodology

This property applies a different metric based on the types of the two columns.

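Column Types | Metric
numerical (or datetime) with another numerical (or datetime) | CorrelationSimilarity
categorical (or boolean) with another categorical (or boolean) | ContingencySimilarity
numerical (or datetime) with a categorical (or boolean) | Discretize the numerical columns into bins, then apply ContingencySimilarity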

This yields a score between every pair of columns. The Column Pair Trends score is the average of all the scores.
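The categorical and mixed-type cases can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes ContingencySimilarity is one minus the total variation distance between the joint category frequencies, and it assumes equal-width bins computed from the range of the real data. The bin count of 10 and all function names are hypothetical.

```python
from collections import Counter


def discretize(values, low, high, bins=10):
    """Map numerical values to equal-width bin indices over [low, high]."""
    width = (high - low) / bins
    if width == 0:
        return [0 for _ in values]
    indices = []
    for value in values:
        index = int((value - low) / width)
        # Clamp so that values at or beyond the edges fall in the end bins.
        indices.append(min(max(index, 0), bins - 1))
    return indices


def contingency_similarity(real_a, real_b, synth_a, synth_b):
    """Assumed definition: 1 minus the total variation distance of joint frequencies."""
    real_joint = Counter(zip(real_a, real_b))
    synth_joint = Counter(zip(synth_a, synth_b))
    combos = set(real_joint) | set(synth_joint)
    distance = 0.5 * sum(
        abs(real_joint[c] / len(real_a) - synth_joint[c] / len(synth_a))
        for c in combos
    )
    return 1.0 - distance


def mixed_pair_similarity(real_num, real_cat, synth_num, synth_cat, bins=10):
    """Bin the numerical column, then treat the pair as two categorical columns."""
    low, high = min(real_num), max(real_num)  # assumption: bin edges come from the real data
    real_binned = discretize(real_num, low, high, bins)
    synth_binned = discretize(synth_num, low, high, bins)
    return contingency_similarity(real_binned, real_cat, synth_binned, synth_cat)
```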

The CorrelationSimilarity metric works by computing a separate correlation value for the real data and for the synthetic data, then comparing the two. The Quality Report shows a side-by-side visualization for real vs. synthetic data when applicable.
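A minimal sketch of this "compute separately, then compare" idea is shown below. It assumes the Pearson coefficient and the normalization 1 - |r_real - r_synthetic| / 2, which maps the result to the range [0, 1]. The CorrelationSimilarity page is the authoritative source for the exact formula and the available coefficients. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numerical sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    denominator = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def correlation_similarity(real_x, real_y, synth_x, synth_y):
    """Compare the real and synthetic correlations (assumed normalization)."""
    real_corr = pearson(real_x, real_y)
    synth_corr = pearson(synth_x, synth_y)
    # Correlations lie in [-1, 1], so their gap is at most 2.
    return 1.0 - abs(real_corr - synth_corr) / 2.0
```

A score of 1 means both datasets show the same correlation. A score of 0 means the correlations are opposite extremes (+1 in one dataset, -1 in the other).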

Cardinality

This property is only available for multi table datasets. (In older versions of SDMetrics, it was known as "Table Relationships".)

Does the synthetic data capture the number of connections between parent and child tables? This is also known as the cardinality of the tables.

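Methodology

This property applies the CardinalityShapeSimilarity metric for every set of connected tables: parent table and child table.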
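To make the idea of comparing cardinality concrete, the sketch below counts the children of each parent row and compares the two resulting distributions. It assumes the comparison is one minus the two-sample Kolmogorov-Smirnov statistic over the children-per-parent counts. Treat this as an illustration only and see the CardinalityShapeSimilarity page for the actual definition. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def children_per_parent(parent_keys, child_foreign_keys):
    """Count the child rows for each parent key, including parents with none."""
    counts = Counter(child_foreign_keys)
    return [counts[key] for key in parent_keys]


def ks_complement(real, synthetic):
    """1 minus the largest gap between the two empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    for value in set(real_sorted) | set(synth_sorted):
        real_cdf = bisect_right(real_sorted, value) / len(real_sorted)
        synth_cdf = bisect_right(synth_sorted, value) / len(synth_sorted)
        max_gap = max(max_gap, abs(real_cdf - synth_cdf))
    return 1.0 - max_gap


def cardinality_similarity(real_parents, real_child_fks, synth_parents, synth_child_fks):
    """Compare how many children each parent has in real vs. synthetic data."""
    real_counts = children_per_parent(real_parents, real_child_fks)
    synth_counts = children_per_parent(synth_parents, synth_child_fks)
    return ks_complement(real_counts, synth_counts)
```

In a dataset with several parent/child relationships, a helper like this would run once per relationship.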

Intertable Trends

This property is only available for multi table datasets.

Does the synthetic data capture trends between columns across different tables?

This is similar to the Column Pair Trends property, but it is applied across parent/child tables. For example, a column in a parent table might be correlated with a column in the child.

Methodology

This property denormalizes the parent and child table into a single, flat table. Then, it applies the same metrics as the Column Pair Trends property.

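Column Types | Metric
numerical (or datetime) with another numerical (or datetime) | CorrelationSimilarity
categorical (or boolean) with another categorical (or boolean) | ContingencySimilarity
numerical (or datetime) with a categorical (or boolean) | Discretize the numerical columns into bins, then apply ContingencySimilarity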

This yields a score between every pair of columns. The Intertable Trends score is the average of all the scores.
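The denormalization step described in this section can be pictured with the plain Python sketch below, which joins each child row to its parent row to produce one flat table. The `parent.` and `child.` column prefixes and the function name are hypothetical choices made for the illustration, and SDMetrics may name the flattened columns differently.

```python
def denormalize(parent_rows, child_rows, parent_key, foreign_key):
    """Join every child row with its parent row into one flat row."""
    parents_by_key = {row[parent_key]: row for row in parent_rows}
    flat_rows = []
    for child in child_rows:
        parent = parents_by_key[child[foreign_key]]
        # Prefix the column names so parent and child columns cannot collide.
        merged = {f"parent.{name}": value for name, value in parent.items()}
        merged.update({f"child.{name}": value for name, value in child.items()})
        flat_rows.append(merged)
    return flat_rows


users = [{"id": 1, "country": "US"}, {"id": 2, "country": "FR"}]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 5.0},
    {"order_id": 11, "user_id": 1, "amount": 7.5},
    {"order_id": 12, "user_id": 2, "amount": 3.0},
]
flat = denormalize(users, orders, parent_key="id", foreign_key="user_id")
```

Each flat row now holds both parent and child columns, so a pair such as `parent.country` and `child.amount` can be scored with the same pairwise metrics. With pandas, the equivalent join is roughly a `DataFrame.merge` of the child table onto the parent table.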

FAQs

Can this report check for similarity in higher orders?

Higher order distributions of 3 or more columns are not included in the Quality Report. We have found that very high order similarity may have an adverse effect on the usability of the synthetic data; after a certain point, it indicates that the synthetic data is just a copy of the real data. (For more information, see the NewRowSynthesis metric.)

If higher order similarity is a requirement, you likely have a targeted use case for synthetic data (e.g. machine learning efficacy). Until we add these reports, you may want to explore other metrics.
