What's included?

Last updated 1 year ago

The diagnostic report captures three properties: Data Validity, Data Structure and Relationship Validity. This guide contains some technical details about each property.

The diagnostic score should be close to 100%. The diagnostic report checks for basic data validity and data structure issues. If you want to create synthetic data that looks and feels similar to the real data, you should expect the score to be close to perfect. If you are using any of the default SDV synthesizers, the score should always be 1.0 (that is, 100%).

Data Validity

Does each column in the data contain valid data?

Methodology

This property applies metrics based on the column types.

| Column Type | Metric | Validity Check |
|---|---|---|
| primary keys | KeyUniqueness | Primary keys must always be unique and non-null. |
| numerical, datetime | BoundaryAdherence | Continuous values in the synthetic data must adhere to the min/max range in the real data. |
| categorical, boolean | CategoryAdherence | Discrete values in the synthetic data must adhere to the same categories as the real data. |

This yields a separate score for every column. The final Data Validity score is the average of all the column scores.
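As a rough illustration of how per-column checks roll up into one Data Validity score, here is a minimal sketch in plain Python. It is not the SDMetrics implementation: the helper functions, the toy columns, and the handling of missing values and duplicates are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical helpers; the real metrics may treat edge cases differently.

def key_uniqueness(synthetic):
    """Fraction of key values that are non-null and not a repeat of an earlier value."""
    seen = set()
    valid = 0
    for value in synthetic:
        if value is not None and value not in seen:
            valid += 1
        seen.add(value)
    return valid / len(synthetic)

def boundary_adherence(real, synthetic):
    """Fraction of synthetic values inside the real [min, max] range (None is ignored)."""
    real_values = [v for v in real if v is not None]
    low, high = min(real_values), max(real_values)
    checked = [v for v in synthetic if v is not None]
    return sum(low <= v <= high for v in checked) / len(checked)

def category_adherence(real, synthetic):
    """Fraction of synthetic values that also appear as a category in the real data."""
    allowed = set(real)
    return sum(v in allowed for v in synthetic) / len(synthetic)

# Toy data: one primary key, one numerical column, one categorical column.
real = {'id': [1, 2, 3, 4], 'age': [20, 30, 40, 50], 'plan': ['a', 'b', 'a', 'c']}
synthetic = {'id': [1, 2, 2, 4], 'age': [25, 55, 35, 45], 'plan': ['a', 'b', 'd', 'c']}

column_scores = {
    'id': key_uniqueness(synthetic['id']),                     # one duplicated key
    'age': boundary_adherence(real['age'], synthetic['age']),  # 55 is above the real max
    'plan': category_adherence(real['plan'], synthetic['plan']),  # 'd' is a new category
}

# The property score is the plain average of the per-column scores.
data_validity = mean(column_scores.values())
print(column_scores, data_validity)
```

In this toy case each column has exactly one invalid value out of four, so every column scores 0.75 and so does the property.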

Data Structure

Does each table have the same overall structure as the real data? The structure includes the column names.
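The column-name comparison can be sketched as a set overlap. This is a minimal illustration that assumes the score is the number of shared column names divided by the number of distinct column names across both tables. The function name is hypothetical, and the actual TableStructure metric may take more than column names into account.

```python
def table_structure_score(real_columns, synthetic_columns):
    """Overlap of column names: shared names divided by all distinct names."""
    real, synthetic = set(real_columns), set(synthetic_columns)
    return len(real & synthetic) / len(real | synthetic)

# Same columns in a different order: the structure matches.
same = table_structure_score(['id', 'age', 'plan'], ['plan', 'age', 'id'])

# 'plan' is missing and 'region' is extra: 2 shared names out of 4 distinct names.
different = table_structure_score(['id', 'age', 'plan'], ['id', 'age', 'region'])

print(same, different)
```

For a multi table dataset, the same comparison would be repeated for every table.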

Methodology

This property applies the TableStructure metric to each table of the dataset. It checks that the synthetic data has the same set of column names as the real data.

Relationship Validity

This property is only available for multi table datasets.

Does the synthetic data contain valid relationships between different tables?

Methodology

Every relationship in your dataset is determined by a primary/foreign key connection. This property applies two metrics to each relationship to determine its validity:

  • ReferentialIntegrity: Does each foreign key refer to an existing primary key? If a foreign key refers to a non-existent primary key, it is known as an orphaned child, which is invalid in most databases.
  • CardinalityBoundaryAdherence: Does each primary key have the correct number of children? The correct number is based on the min/max bounds that are present in the real data.

The final Relationship Validity score is the average of all the sub scores.
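The two relationship checks (ReferentialIntegrity and CardinalityBoundaryAdherence) can be illustrated for a single parent/child relationship with a short plain-Python sketch. The function names and toy keys below are hypothetical, missing foreign keys are not handled, and the real metrics may differ in such details.

```python
from collections import Counter
from statistics import mean

def referential_integrity(parent_keys, foreign_keys):
    """Fraction of foreign keys that point to an existing primary key."""
    existing = set(parent_keys)
    return sum(fk in existing for fk in foreign_keys) / len(foreign_keys)

def children_per_parent(parent_keys, foreign_keys):
    """Number of child rows for each parent key (0 if a parent has no children)."""
    counts = Counter(foreign_keys)
    return [counts[pk] for pk in parent_keys]

def cardinality_boundary_adherence(real_parent, real_fk, synth_parent, synth_fk):
    """Fraction of synthetic parents whose child count is within the real min/max."""
    real_counts = children_per_parent(real_parent, real_fk)
    low, high = min(real_counts), max(real_counts)
    synth_counts = children_per_parent(synth_parent, synth_fk)
    return sum(low <= c <= high for c in synth_counts) / len(synth_counts)

# Real data: every parent has 1 or 2 children.
real_parent = [1, 2, 3]
real_fk = [1, 1, 2, 3, 3]

# Synthetic data: key 99 is an orphaned child, and parent 10 has 3 children.
synth_parent = [10, 11, 12, 13]
synth_fk = [10, 10, 10, 11, 12, 13, 13, 99]

ri = referential_integrity(synth_parent, synth_fk)
cba = cardinality_boundary_adherence(real_parent, real_fk, synth_parent, synth_fk)

# With one relationship, the property score is the average of its two sub scores.
relationship_validity = mean([ri, cba])
print(ri, cba, relationship_validity)
```

Here 7 of the 8 foreign keys are valid (0.875) and 3 of the 4 parents have an in-range number of children (0.75), giving an average of 0.8125.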