TableStructure
This metric measures whether the synthetic data captures the same table structure as the real data. We expect the synthetic data to have the same column names as the real data, and for those columns to have the same data storage type (ints, strings, etc.).
Data Compatibility
Any data: This metric looks only at the name and storage type of each column, so it applies to every column regardless of its contents
Score
(best) 1.0: The synthetic data has the same column names, with the same data storage types, as the real data
(worst) 0.0: There is no overlap in columns between the real and synthetic data
How does it work?
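This metric identifies all the column names in the real data (r) and the synthetic data (s). The final score is the overlap between the columns of the two datasets: the number of columns they share divided by the number of columns that appear in either one, score = |r ∩ s| / |r ∪ s|.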
Starting from SDV 0.16.0: In the numerator, we consider a column as overlapping if it has the same name and the same pandas dtype. In the denominator, we will consider all combinations of (column name, dtype) that appear across the real and synthetic data.
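The overlap rule described above can be sketched in plain Python. This is an illustration of the description in the text, not the library's actual implementation: the function name table_structure_score and the dict inputs (column name mapped to a dtype string) are hypothetical stand-ins for the real and synthetic tables.

```python
def table_structure_score(real_dtypes, synthetic_dtypes):
    """Overlap of (column name, dtype) pairs between two tables.

    Both arguments are hypothetical stand-ins for real tables: plain dicts
    mapping a column name to a dtype string.
    """
    real_pairs = set(real_dtypes.items())
    synthetic_pairs = set(synthetic_dtypes.items())

    # Denominator: every (name, dtype) pair seen in either table
    all_pairs = real_pairs | synthetic_pairs
    if not all_pairs:
        raise ValueError("at least one table must have columns")

    # Numerator: pairs where both the name and the dtype match
    return len(real_pairs & synthetic_pairs) / len(all_pairs)


real = {"age": "int64", "city": "object", "income": "float64"}
synthetic = {"age": "int64", "city": "object", "income": "int64"}

score = table_structure_score(real, synthetic)
print(score)  # 0.5: two matching pairs out of four distinct pairs
```

Here the income column has the same name in both tables but a different dtype, so it contributes two distinct pairs to the denominator and none to the numerator.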
Usage
Access this metric from the single_table module and use the compute method.
Parameters
(required) real_data: A pandas.DataFrame containing real columns
(required) synthetic_data: A similar pandas.DataFrame containing synthetic columns