TableStructure
This metric measures whether the synthetic data captures the same table structure as the real data. We expect the synthetic data to have the same column names as the real data, and for those columns to have the same data storage type (ints, strings, etc.).
Any data: This metric works on every column, regardless of the kind of data it contains
(best) 1.0: The synthetic data has the same columns as the real data, with matching names and data types
(worst) 0.0: There is no overlap in columns between the real and synthetic data
This metric identifies all the column names in the real data (r) and the synthetic data (s). The final score is the number of overlapping columns divided by the number of distinct columns across both datasets.
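Reading the numerator as the columns shared by r and s and the denominator as every distinct column that appears in either (as the SDV 0.16.0 note describes), the score can be written as the ratio below, which is the Jaccard index of the two sets. The column names in the worked example are hypothetical.

```latex
\text{score} = \frac{|r \cap s|}{|r \cup s|}

r = \{\text{age}, \text{name}, \text{city}\}, \quad
s = \{\text{age}, \text{name}, \text{zip}\}
\;\Rightarrow\;
\text{score} = \frac{|\{\text{age}, \text{name}\}|}{|\{\text{age}, \text{name}, \text{city}, \text{zip}\}|} = \frac{2}{4} = 0.5
```

Identical column sets give 1.0 and disjoint sets give 0.0, matching the best and worst scores listed above.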
Starting from SDV 0.16.0: In the numerator, we consider a column as overlapping if it has the same name and the same pandas dtype. In the denominator, we consider all combinations of (column name, dtype) that appear across the real and synthetic data.
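A minimal sketch of this (column name, dtype) scoring rule in plain Python, assuming each table's structure is given as a dict from column name to dtype name (with pandas you could build one using `df.dtypes.astype(str).to_dict()`). `table_structure_score` is a hypothetical helper for illustration, not the SDMetrics implementation, and its handling of two empty tables is an assumption.

```python
from typing import Mapping


def table_structure_score(real: Mapping[str, str], synthetic: Mapping[str, str]) -> float:
    """Jaccard overlap of (column name, dtype) pairs.

    Hypothetical helper, not part of SDMetrics. Each argument maps a
    column name to its dtype name, e.g. {"age": "int64"}.
    """
    real_pairs = set(real.items())
    synthetic_pairs = set(synthetic.items())

    # Numerator: pairs whose name AND dtype match in both tables.
    overlap = real_pairs & synthetic_pairs
    # Denominator: every distinct (name, dtype) pair seen in either table.
    union = real_pairs | synthetic_pairs

    if not union:
        # Assumption (not taken from SDMetrics): two tables with no
        # columns at all are treated as structurally identical.
        return 1.0
    return len(overlap) / len(union)


if __name__ == "__main__":
    real = {"age": "int64", "name": "object", "city": "object"}
    # Same column names, but "age" was synthesized as floats.
    synthetic = {"age": "float64", "name": "object", "city": "object"}
    print(table_structure_score(real, synthetic))  # 0.5
```

Because a column whose dtype differs contributes two distinct pairs to the union (one per dtype) and none to the overlap, changing the dtype of one of three columns lowers the score to 0.5 rather than 2/3.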
Access this metric from the single_table module and use the compute method.
Parameters
(required) real_data: A pandas.DataFrame containing the real data
(required) synthetic_data: A pandas.DataFrame containing the synthetic data, with columns similar to those of the real data