❖ Differential Privacy
The Differential Privacy bundle lets you create private synthetic data using methods backed by mathematically rigorous results. The differential privacy framework enforces a limit on how much any one individual record can affect the synthesizer and, ultimately, leak into the synthetic data.
Share your synthetic data broadly. Our differential privacy synthesizers guarantee that a single row of data will not unduly affect the patterns that the synthesizer learns. We use ε-differential privacy, which allows you to provide a privacy loss budget, ε (epsilon). This budget allows you to control the privacy/quality tradeoffs.
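For reference, the standard textbook definition of ε-differential privacy can be written as below. The symbols are assumptions introduced here for illustration and do not appear elsewhere in this page: $\mathcal{M}$ is the randomized algorithm (for example, fitting a synthesizer), $D$ and $D'$ are any two datasets that differ in a single row, and $S$ is any set of possible outputs.

```latex
\Pr\left[\mathcal{M}(D) \in S\right] \;\le\; e^{\varepsilon} \cdot \Pr\left[\mathcal{M}(D') \in S\right]
```

A smaller ε forces the two probabilities closer together, which means stronger privacy but typically lower data quality. A larger ε relaxes the bound in exchange for higher quality.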
Upscale your synthetic data. Once you've fit your synthesizer, use it to create differentially private synthetic data of any size, even 10x or 100x the original size. The privacy guarantees apply to all data your synthesizer creates.
Synthesizers for generating differentially-private data.
The DPGCSynthesizer creates differentially private data using the GaussianCopula method.
The experimental DPGCFlexSynthesizer runs a similar method, but offers more flexibility in the data preprocessing that you can use.
Preprocessing methods for generating differentially private columns.
Under the hood, the synthesizers use preprocessing techniques to generate differentially private columns of data. You can also apply these transformers on their own.
Noise the column using differential privacy: DPLaplaceNoiser, DPTimestampLaplaceNoiser, DPResponseRandomizer, DPWeightedResponseRandomizer
Normalize the column into numerical data of a specific shape, using differential privacy: DPECDFNormalizer, DPDiscreteECDFNormalizer
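To give a feel for what noising a column can involve, here is a minimal sketch of two generic textbook techniques: the Laplace mechanism for numerical values and k-ary randomized response for categorical values. This is not the implementation or API of the transformers listed above. The function names `laplace_noise_column` and `randomized_response`, their parameters, and the per-value application of ε are all assumptions made for illustration.

```python
import numpy as np


def laplace_noise_column(values, epsilon, lower, upper, rng=None):
    """Hypothetical helper: clip values to [lower, upper], then add Laplace noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    # After clipping, one value can differ from another by at most the range.
    sensitivity = upper - lower
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=clipped.shape)
    return clipped + noise


def randomized_response(values, categories, epsilon, rng=None):
    """Hypothetical helper: k-ary randomized response over a fixed category list."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    categories = list(categories)
    k = len(categories)
    # Probability of reporting the true category.
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    output = []
    for value in values:
        if rng.random() < p_keep:
            output.append(value)
        else:
            # Otherwise report one of the other categories, chosen uniformly.
            others = [c for c in categories if c != value]
            output.append(others[rng.integers(len(others))])
    return output


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    ages = [23, 35, 61, 150]  # 150 is clipped to the upper bound before noising
    print(laplace_noise_column(ages, epsilon=1.0, lower=0, upper=100, rng=rng))
    print(randomized_response(["yes", "no", "yes"], ["yes", "no"], epsilon=1.0, rng=rng))
```

In both helpers, a smaller `epsilon` produces noisier output. The bounds and the category list are supplied by the caller here so that they do not depend on the sensitive data itself.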
Verify the differential privacy.
Use the differential privacy evaluation tool to empirically measure the differential privacy of a synthesizer algorithm on a given dataset. Use this with any SDV single-table synthesizer.
Purchase the Differential Privacy bundle and install it separately.
This command prompts you for your SDV Enterprise credentials.
Save and share your synthesizer. Save and load your synthesizer to sample more synthetic data at any time. No real data or sensitive statistics are saved in the synthesizer, so you can share it without worry.