＊ SmoothnessSimilarity

＊SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, visit our website.

This metric measures whether the synthetic data is as smooth as the real data. It evaluates a failure mode where the synthetic data is too smooth compared to the real data.

Data Compatibility

Numerical: This metric is meant for numerical data
Datetime: This metric converts datetime values into numerical values

This metric ignores missing values.

Score

(best) 1.0: The smoothness of the real data and the synthetic data is exactly the same

(worst) 0.0: The smoothness of the real data and the synthetic data is as different as can be

How does it work?

As a proxy for smoothness, this metric computes the full width at half maximum (FWHM) [1] for both the real and synthetic data. The larger the FWHM value, the smoother we can consider the data.
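
To make the FWHM idea concrete, here is a minimal sketch that measures the width of a single-peaked curve at half of its maximum height. This page does not describe how the metric estimates the underlying curve from a column of data, so the helper `fwhm_from_curve` and its inputs (a grid `xs` and curve heights `ys`) are hypothetical and for illustration only. For curves with several peaks, this sketch uses the outermost half-maximum crossings.

```python
import math


def fwhm_from_curve(xs, ys):
    """Estimate the full width at half maximum of a curve.

    xs: increasing grid positions; ys: curve heights at those positions.
    Hypothetical helper for illustration, not the SDMetrics implementation.
    """
    half = max(ys) / 2.0
    # Indices of grid points at or above half of the peak height.
    above = [i for i, y in enumerate(ys) if y >= half]
    left, right = above[0], above[-1]

    def crossing(i_out, i_in):
        # Linearly interpolate where the curve crosses the half-maximum level
        # between a point below it (i_out) and a point at or above it (i_in).
        x0, y0, x1, y1 = xs[i_out], ys[i_out], xs[i_in], ys[i_in]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    x_left = crossing(left - 1, left) if left > 0 else xs[left]
    x_right = crossing(right + 1, right) if right < len(xs) - 1 else xs[right]
    return x_right - x_left


def gaussian(x, sigma):
    # Unnormalized bell curve; the FWHM does not depend on the overall scale.
    return math.exp(-0.5 * (x / sigma) ** 2)


xs = [i / 100.0 for i in range(-1000, 1001)]
narrow = fwhm_from_curve(xs, [gaussian(x, 1.0) for x in xs])
wide = fwhm_from_curve(xs, [gaussian(x, 2.0) for x in xs])

print(round(narrow, 3))  # approximately 2.355, i.e. 2 * sqrt(2 * ln 2) * sigma
print(round(wide, 3))    # approximately 4.710, twice as wide
```

The wider bell curve has the larger FWHM, which matches the intuition that a more spread-out, flatter distribution is smoother.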

This metric normalizes the differences in FWHM and inverts the score so that 1 means the values are similar (best possible score). In mathematical terms:

\text{score} = 1 - \frac{|\text{FWHM}(R) - \text{FWHM}(S)|}{\max(\text{FWHM}(R), \text{FWHM}(S))}

where R is the real distribution and S is the synthetic distribution.
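
The formula can be sketched in a few lines of Python. The function name `smoothness_similarity_score` is hypothetical, and the inputs are assumed to be FWHM values that have already been computed. The handling of two zero-width values is an assumption made here to avoid division by zero; this page does not say how the metric treats that case.

```python
def smoothness_similarity_score(fwhm_real, fwhm_synthetic):
    """Normalize the FWHM difference and invert it so that 1.0 is best.

    Hypothetical helper for illustration, not the SDMetrics implementation.
    """
    largest = max(fwhm_real, fwhm_synthetic)
    if largest == 0:
        # Assumption: two zero-width values are treated as identical.
        return 1.0
    return 1.0 - abs(fwhm_real - fwhm_synthetic) / largest


print(smoothness_similarity_score(2.0, 2.0))  # 1.0 (same smoothness)
print(smoothness_similarity_score(2.0, 4.0))  # 0.5 (synthetic is twice as wide)
print(smoothness_similarity_score(4.0, 2.0))  # 0.5 (the score is symmetric)
```

Because the difference is divided by the larger of the two FWHM values, the score always falls between 0.0 and 1.0 and does not depend on the units of the column.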

Usage

To apply this metric, access the single_column module and use the compute method.

from sdmetrics.single_column import SmoothnessSimilarity

SmoothnessSimilarity.compute(
    real_data=real_table['column_name'],
    synthetic_data=synthetic_table['column_name']
)

Parameters

(required) real_data: A pandas.Series object with the column of real data
(required) synthetic_data: A pandas.Series object with the column of synthetic data

References

[1] Full width at half maximum

