＊SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, get in touch with us.
This metric measures whether the synthetic data is as smooth as the real data. It evaluates a failure mode where the synthetic data is too smooth compared to the real data.
- Numerical: This metric is meant for numerical data.
- Datetime: This metric converts datetime values into numerical values.
This metric ignores missing values.
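The preprocessing described above can be sketched in plain Python. This is only an illustration: the helper name to_numeric is hypothetical, and the choice of seconds since the Unix epoch as the numerical form of a datetime is an assumption, since the exact conversion the metric applies is not stated here.

```python
from datetime import datetime, timezone

def to_numeric(values):
    """Hypothetical helper: drop missing values and convert datetimes to numbers.

    Assumption: a datetime is represented as seconds since the Unix epoch.
    """
    numeric = []
    for value in values:
        if value is None:
            continue  # missing value, ignored
        if isinstance(value, datetime):
            numeric.append(value.timestamp())
            continue
        if value != value:
            continue  # NaN is the only value not equal to itself, ignored
        numeric.append(float(value))
    return numeric

# Timezone-aware datetimes keep the conversion independent of the local clock.
dates = [
    datetime(1970, 1, 1, tzinfo=timezone.utc),
    None,
    datetime(1970, 1, 2, tzinfo=timezone.utc),
]
print(to_numeric(dates))  # [0.0, 86400.0]
```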
(best) 1.0: The smoothness of the real data and the synthetic data is exactly the same
(worst) 0.0: The smoothness of the real data and the synthetic data is as different as can be
As a proxy for smoothness, this metric computes the full width at half maximum (FWHM) for both the real and synthetic data. The larger the FWHM value, the smoother we can consider the data, as shown below.
An illustration of the FWHM for a real distribution (R, denoted in black) versus a synthetic distribution (S, denoted in green). The synthetic data has a larger FWHM, so we can consider it to be a smoother distribution than the real data.
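To make the idea of FWHM concrete, here is a minimal sketch that estimates it from an equal-width histogram: find the tallest bin, then measure the span of bins whose counts reach at least half of that height. The function name fwhm_from_histogram and the bins parameter are hypothetical, and the histogram approach itself is an assumption chosen for simplicity. The metric may estimate the distribution differently.

```python
def fwhm_from_histogram(values, bins=20):
    """Hypothetical helper: estimate the FWHM of a 1-D sample.

    Assumption: the distribution is approximated by an equal-width histogram.
    """
    data = [v for v in values if v is not None and v == v]  # drop None and NaN
    if not data:
        raise ValueError("no non-missing values to evaluate")
    low, high = min(data), max(data)
    if low == high:
        return 0.0  # a constant column has no width
    bin_width = (high - low) / bins
    counts = [0] * bins
    for v in data:
        # The maximum value falls into the last bin.
        index = min(int((v - low) / bin_width), bins - 1)
        counts[index] += 1
    half_max = max(counts) / 2
    at_or_above = [i for i, count in enumerate(counts) if count >= half_max]
    # Span from the left edge of the first qualifying bin
    # to the right edge of the last qualifying bin.
    return (at_or_above[-1] - at_or_above[0] + 1) * bin_width

peaked = [0] + [5] * 10 + [10]   # most values sit at a single point
spread = list(range(11))         # values spread evenly from 0 to 10
print(fwhm_from_histogram(peaked, bins=10))  # 1.0
print(fwhm_from_histogram(spread, bins=10))  # 10.0
```

The evenly spread sample yields the larger FWHM, which matches the reading that a larger FWHM corresponds to a smoother distribution.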
This metric normalizes the differences in FWHM and inverts the score so that 1 means the values are similar (best possible score). In mathematical terms:
where R is the real distribution and S is the synthetic distribution.
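As a rough sketch of how a normalized and inverted FWHM difference behaves, the example below divides the absolute difference by the larger of the two FWHM values and subtracts the result from 1. This particular normalization is an assumption made for illustration and is not confirmed as the one the metric uses. The function name smoothness_score is hypothetical.

```python
def smoothness_score(fwhm_real, fwhm_synthetic):
    """Hypothetical helper: score in [0, 1], where 1 means identical FWHM.

    Assumption: the difference is normalized by the larger FWHM value.
    """
    if fwhm_real < 0 or fwhm_synthetic < 0:
        raise ValueError("FWHM values must be non-negative")
    largest = max(fwhm_real, fwhm_synthetic)
    if largest == 0:
        return 1.0  # both widths are zero, so they are identical
    return 1.0 - abs(fwhm_real - fwhm_synthetic) / largest

print(smoothness_score(2.0, 2.0))  # 1.0
print(smoothness_score(2.0, 4.0))  # 0.5
print(smoothness_score(0.0, 3.0))  # 0.0
```

Under this assumed normalization the score is symmetric: a synthetic distribution twice as wide as the real one scores the same as one half as wide.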
To apply this metric, access the single_column module and use the compute method.
from sdmetrics.single_column import SmoothnessSimilarity
real_data: A pandas.Series object with the column of real data
synthetic_data: A pandas.Series object with the column of synthetic data