TableStructure

This metric measures whether the synthetic data captures the same table structure as the real data. We expect the synthetic data to have the same column names as the real data.

Data Compatibility

  • Any data: This metric looks only at column names, so it applies to every column regardless of its contents

Score

(best) 1.0: The synthetic data has the same column names as the real data

(worst) 0.0: The real and synthetic data have no column names in common

How does it work?

This metric identifies all the column names in the real data (r) and the synthetic data (s). The final score is the number of column names the two datasets share, divided by the number of distinct column names across both.

score = \frac{|r \cap s|}{|r \cup s|}
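
To make the formula concrete, here is a minimal sketch that computes the same overlap ratio directly with pandas. The helper name column_overlap_score and the example tables are hypothetical and are not part of SDMetrics. How the sketch handles two tables with no columns at all is also an assumption, since this page does not cover that case.

```python
import pandas as pd


def column_overlap_score(real_data: pd.DataFrame, synthetic_data: pd.DataFrame) -> float:
    """Overlap of column names: |r ∩ s| / |r ∪ s| (illustrative helper, not the SDMetrics code)."""
    r = set(real_data.columns)       # column names in the real data
    s = set(synthetic_data.columns)  # column names in the synthetic data
    union = r | s
    if not union:
        # Assumption: two tables without any columns count as a perfect match.
        return 1.0
    return len(r & s) / len(union)


# Hypothetical example tables: 'age' and 'city' are shared,
# while 'income' and 'salary' each appear in only one table.
real_table = pd.DataFrame({
    'age': [34, 51],
    'city': ['Lyon', 'Oslo'],
    'income': [42000, 58000],
})
synthetic_table = pd.DataFrame({
    'age': [29, 47],
    'city': ['Rome', 'Bonn'],
    'salary': [39000, 61000],
})

score = column_overlap_score(real_table, synthetic_table)
print(score)  # 2 shared names out of 4 distinct names -> 0.5
```

Because the score is a ratio over the union of names, both a missing column and an extra column in the synthetic data lower it.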

Usage

Access this metric from the single_table module and use the compute method.

from sdmetrics.single_table import TableStructure

TableStructure.compute(
    real_data=real_table,
    synthetic_data=synthetic_table
)

Parameters

  • (required) real_data: A pandas.DataFrame containing real columns

  • (required) synthetic_data: A similar pandas.DataFrame containing synthetic columns
