Privacy Metrics
Privacy metrics broadly capture how much safety synthetic data provides, especially when you'd like to disclose the synthetic data rather than the real data. Safety can be defined in many ways, depending on what type of information you need to protect and your assumptions about how it may be leaked.
Privacy metrics are based on statistical patterns across your real and synthetic datasets. You can decide whether divulging the pattern (via synthetic data) is worth it for your project.
Browse
Apply these metrics to evaluate the privacy of your data tables:
DisclosureProtection, DisclosureProtectionEstimate: Measure the risk of disclosing information about specific sensitive columns in your dataset.
[Coming soon!] DCROverfittingProtection: Measure the distance between the real and synthetic data to check that your synthetic data doesn't match the real data too closely. (Here, overfitting means that your synthesizer has fit the real data so closely that it reproduces it.)
[Coming soon!] DCRBaselineProtection: Measure the distance between the real and synthetic data, comparing it against random data as a baseline. (Random data provides the highest privacy.)
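To make the distance-based idea concrete, here is a minimal sketch of a baseline comparison in plain Python. It is not the library's implementation: the function names are hypothetical, rows are assumed to be tuples of numeric values, Euclidean distance is assumed as the distance measure, and the scoring rule (median synthetic distance divided by median random distance, capped at 1) is one plausible choice for illustration.

```python
import math
import random
import statistics


def closest_distance(row, reference_rows):
    """Euclidean distance from one row to its nearest row in reference_rows."""
    return min(math.dist(row, ref) for ref in reference_rows)


def median_dcr(candidate_rows, real_rows):
    """Median distance-to-closest-record of candidate rows, measured against the real rows."""
    return statistics.median(closest_distance(row, real_rows) for row in candidate_rows)


def baseline_protection_sketch(real_rows, synthetic_rows, random_rows):
    """Hypothetical score in [0, 1]: how far the synthetic data sits from the
    real data, relative to how far random data sits from the real data."""
    random_median = median_dcr(random_rows, real_rows)
    if random_median == 0:
        raise ValueError("Random baseline coincides with the real data.")
    synthetic_median = median_dcr(synthetic_rows, real_rows)
    return min(synthetic_median / random_median, 1.0)


# Toy data (illustrative only): three real rows with two numeric columns.
real_rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

# Random baseline: values drawn uniformly from each column's observed range.
rng = random.Random(0)
random_rows = [(rng.uniform(0.0, 2.0), rng.uniform(0.0, 2.0)) for _ in range(50)]

# Synthetic data that copies the real rows exactly gets the lowest score.
copied_score = baseline_protection_sketch(real_rows, list(real_rows), random_rows)
print(copied_score)  # 0.0
```

In this sketch, a score near 0 means the synthetic rows sit almost on top of real rows, while a score of 1 means they are at least as far from the real data as random rows are.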