
TableStructure


Last updated 5 months ago

This metric measures whether the synthetic data captures the same table structure as the real data. We expect the synthetic data to have the same column names as the real data, and for those columns to have the same data storage type (ints, strings, etc.).

Data Compatibility

  • Any data: This metric looks only at column names and their storage types, so it applies to every column regardless of the values it contains

Score

(best) 1.0: The synthetic data has exactly the same column names (with matching data storage types) as the real data

(worst) 0.0: There is no overlap in columns between the real and synthetic data

How does it work?

This metric identifies all the column names in the real data (r) and the synthetic data (s). The final score is the number of columns the two datasets share, divided by the total number of distinct columns that appear in either dataset.

$$score = \frac{|r \cap s|}{|r \cup s|}$$
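
As a minimal sketch of this ratio, the snippet below computes the overlap on column names alone using plain Python sets. The function name `column_overlap_score` and the example column lists are hypothetical and are not part of SDMetrics. The handling of two empty column lists is also an assumption made only so the sketch does not divide by zero.

```python
def column_overlap_score(real_columns, synthetic_columns):
    """Return |r ∩ s| / |r ∪ s| for two collections of column names."""
    r = set(real_columns)
    s = set(synthetic_columns)
    union = r | s
    if not union:
        # Assumption for this sketch only: two empty tables count as identical.
        return 1.0
    return len(r & s) / len(union)


# Hypothetical example: two shared names out of four distinct names overall.
real_columns = ["id", "age", "city"]
synthetic_columns = ["id", "age", "country"]
score = column_overlap_score(real_columns, synthetic_columns)
print(score)  # 2 / 4 = 0.5
```

Here `id` and `age` appear in both lists, while `city` and `country` each appear in only one, so the score is 2 / 4.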

Starting from SDV 0.16.0: In the numerator, we count a column as overlapping only if it has the same name and the same pandas dtype in both datasets. In the denominator, we count all distinct combinations of (column name, dtype) that appear across the real and synthetic data.
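
The dtype-aware variant can be sketched with pandas by treating each column as a (column name, dtype) pair. This is an illustration under stated assumptions, not the library's actual implementation: the helper `table_structure_sketch` is hypothetical, and comparing dtypes through their string form is a simplification chosen for this example.

```python
import pandas as pd


def table_structure_sketch(real_data, synthetic_data):
    """Overlap of (column name, dtype) pairs between two DataFrames."""
    # Assumption: dtypes are compared via their string form, e.g. "int64".
    r = set(zip(real_data.columns, real_data.dtypes.astype(str)))
    s = set(zip(synthetic_data.columns, synthetic_data.dtypes.astype(str)))
    return len(r & s) / len(r | s)


# Hypothetical tables: same column names, but "age" is stored as float
# in the synthetic table and as int in the real one.
real_table = pd.DataFrame({"id": [1, 2], "age": [30, 41], "city": ["NYC", "LA"]})
synthetic_table = pd.DataFrame({"id": [7, 8], "age": [29.5, 40.0], "city": ["SF", "LA"]})

sketch_score = table_structure_sketch(real_table, synthetic_table)
print(sketch_score)  # 2 matching pairs / 4 distinct pairs = 0.5
```

Because the `age` column differs in dtype, it contributes two distinct pairs to the denominator and none to the numerator, so a table whose column names all match can still score below 1.0.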

Usage

Access this metric from the single_table module and use the compute method.

from sdmetrics.single_table import TableStructure

TableStructure.compute(
    real_data=real_table,
    synthetic_data=synthetic_table
)

Parameters

  • (required) real_data: A pandas.DataFrame containing real columns

  • (required) synthetic_data: A similar pandas.DataFrame containing synthetic columns