TableStructure

This metric measures whether the synthetic data captures the same table structure as the real data. We expect the synthetic data to have the same column names as the real data, and for those columns to have the same data storage type (ints, strings, etc.).

Data Compatibility

Any data: This metric considers the name and storage type of every column, regardless of what kind of data the column contains

Score

(best) 1.0: The synthetic data has the same column names as the real data

(worst) 0.0: There is no overlap in columns between the real and synthetic data

How does it work?

This metric identifies the set of column names in the real data (r) and in the synthetic data (s). The final score is the number of columns the two datasets have in common, divided by the total number of distinct columns that appear in either dataset.

score = \frac{|r \cap s|}{|r \cup s|}
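
As a small worked illustration of the formula, consider two tables whose column names are invented here for the example (they do not come from any real dataset). Two of the four distinct names appear in both tables, so the score is 0.5.

```latex
r = \{\text{id}, \text{age}, \text{city}\}, \quad s = \{\text{age}, \text{city}, \text{zip}\}

|r \cap s| = 2, \quad |r \cup s| = 4

score = \frac{|r \cap s|}{|r \cup s|} = \frac{2}{4} = 0.5
```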

Starting from SDV 0.16.0: In the numerator, a column counts as overlapping only if it has the same name and the same pandas dtype in both datasets. In the denominator, we count every distinct combination of (column name, dtype) that appears in either the real or the synthetic data.
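
The (column name, dtype) comparison described above can be sketched in plain Python. This is a minimal illustration of the computation, not the library's own implementation. The function name table_structure_score and the example columns are hypothetical, and the handling of two empty tables is an assumption made only so the sketch never divides by zero.

```python
def table_structure_score(real_dtypes, synthetic_dtypes):
    """Overlap of (column name, dtype) pairs between two tables.

    Both arguments are hypothetical inputs for this sketch: dictionaries
    that map each column name to the name of its dtype.
    """
    real_pairs = set(real_dtypes.items())
    synthetic_pairs = set(synthetic_dtypes.items())

    union = real_pairs | synthetic_pairs
    if not union:
        # Assumption for this sketch: two tables with no columns match.
        return 1.0

    # A pair counts as overlapping only if name and dtype both match.
    return len(real_pairs & synthetic_pairs) / len(union)


# Illustrative columns: 'salary' has a different dtype in each table.
real = {'age': 'int64', 'name': 'object', 'salary': 'float64'}
synthetic = {'age': 'int64', 'name': 'object', 'salary': 'int64'}

score = table_structure_score(real, synthetic)
print(score)  # 0.5: 2 matching pairs out of 4 distinct pairs
```

Because the mismatched column contributes two different pairs to the denominator, a single dtype difference costs more than a single missing column would. For a pandas.DataFrame named df, a dictionary of this shape can be built with df.dtypes.astype(str).to_dict().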

Usage

Access this metric from the single_table module and use the compute method.

from sdmetrics.single_table import TableStructure

TableStructure.compute(
    real_data=real_table,
    synthetic_data=synthetic_table
)

Parameters

(required) real_data: A pandas.DataFrame containing real columns
(required) synthetic_data: A similar pandas.DataFrame containing synthetic columns

