* SmoothnessSimilarity
*SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, visit our website.
This metric measures whether the synthetic data is as smooth as the real data. It evaluates a failure mode where the synthetic data is too smooth compared to the real data.
Numerical: This metric is meant for numerical data.
Datetime: This metric converts datetime values into numerical values.
This metric ignores missing values.
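The preprocessing described above can be illustrated with a small sketch. It is not the metric's internal code: the helper name `to_numeric` is hypothetical, and converting datetimes to Unix timestamps is an assumption, since this page does not say which numerical representation the metric uses.

```python
import math
from datetime import datetime, timezone

def to_numeric(values):
    """Hypothetical helper: drop missing values and turn datetimes into floats."""
    cleaned = []
    for v in values:
        # Skip missing values (None or NaN), mirroring "ignores missing values".
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        # Assumption: datetimes are represented as Unix timestamps (seconds).
        if isinstance(v, datetime):
            v = v.timestamp()
        cleaned.append(float(v))
    return cleaned

dates = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    None,
    datetime(2024, 1, 2, tzinfo=timezone.utc),
]
numeric_dates = to_numeric(dates)
numeric_values = to_numeric([1.5, float("nan"), 3])
print(numeric_dates)
print(numeric_values)
```

Timezone-aware datetimes are used here so that the conversion does not depend on the local timezone of the machine running the sketch.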
(best) 1.0: The smoothness of the real data and the synthetic data is exactly the same
(worst) 0.0: The smoothness of the real data and the synthetic data is as different as can be
As a proxy for smoothness, this metric computes the full width at half maximum (FWHM) [1] for both the real and synthetic data. The larger the FWHM value, the smoother we can consider the data.
This metric normalizes the differences in FWHM and inverts the score so that 1 means the values are similar (best possible score). In mathematical terms:
where R is the real distribution and S is the synthetic distribution.
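To make the idea concrete, here is a minimal sketch of how an FWHM comparison could be scored. It is an illustration under stated assumptions, not the metric's actual implementation: the FWHM is estimated from a simple histogram, and the normalization `1 - |FWHM(R) - FWHM(S)| / max(FWHM(R), FWHM(S))` is one plausible choice rather than the formula used by SDV Enterprise. The function names `fwhm` and `smoothness_score` are hypothetical.

```python
import random

def fwhm(values, bins=20):
    """Crude histogram-based estimate of the full width at half maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: the distribution has no width
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    half_max = max(counts) / 2
    above = [i for i, c in enumerate(counts) if c >= half_max]
    # Span from the left edge of the first bin at or above half maximum
    # to the right edge of the last such bin.
    return (above[-1] - above[0] + 1) * width

def smoothness_score(real, synthetic, bins=20):
    """Assumed normalization: 1 - |FWHM(R) - FWHM(S)| / max of the two."""
    f_real, f_synth = fwhm(real, bins), fwhm(synthetic, bins)
    largest = max(f_real, f_synth)
    if largest == 0:
        return 1.0  # both distributions are degenerate, so they match
    return 1.0 - abs(f_real - f_synth) / largest

rng = random.Random(0)
real = [rng.gauss(0, 1) for _ in range(5000)]
too_smooth = [rng.gauss(0, 3) for _ in range(5000)]

same_score = smoothness_score(real, real)          # identical inputs
wide_score = smoothness_score(real, too_smooth)    # synthetic data is wider
print(same_score, wide_score)
```

In this sketch, identical inputs score 1.0, and a synthetic column that is much more spread out than the real one scores noticeably lower, which matches the failure mode the metric is meant to catch.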
To apply this metric, access the single_column module and use the compute method.
Parameters
(required) real_data: A pandas.Series object with the column of real data
(required) synthetic_data: A pandas.Series object with the column of synthetic data
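A call might look like the sketch below. The class name is taken from this page's title and the import path from the single_column module mentioned above; both are assumptions about how the licensed package exposes the metric. Because this is an SDV Enterprise feature, the import is expected to fail on the public library, so the sketch guards it. The column values are made up for illustration.

```python
import pandas as pd

# Illustrative columns; the missing value is ignored by the metric.
real_data = pd.Series([1.0, 2.5, 2.7, 3.1, None, 4.8])
synthetic_data = pd.Series([1.2, 2.4, 2.9, 3.0, 3.3, 4.5])

try:
    # Assumed import path; only available with an SDV Enterprise license.
    from sdmetrics.single_column import SmoothnessSimilarity

    score = SmoothnessSimilarity.compute(
        real_data=real_data,
        synthetic_data=synthetic_data,
    )
except ImportError:
    score = None  # metric not available in this installation

print(score)
```

When the metric is available, the returned score lies between 0.0 (worst) and 1.0 (best).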