
Copyright (c) 2023, DataCebo, Inc.


❖ Differential Privacy



The Differential Privacy bundle allows you to create synthetic data that is private according to mathematically rigorous methods. The differential privacy framework limits how much any one individual record can affect what the synthesizer learns, and therefore how much information about that record can leak into the synthetic data.

Share your synthetic data broadly. Our differential privacy synthesizers guarantee that a single row of data cannot unduly affect the patterns that the synthesizer learns. We use ε-differential privacy, which lets you set a privacy loss budget, ε (epsilon). This budget controls the tradeoff between privacy and quality: a smaller ε gives a stronger privacy guarantee, while a larger ε allows higher-quality synthetic data.
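To make the role of ε concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single counting query. It is a generic illustration, not the bundle's implementation; the helper names `laplace_noise` and `private_count` are hypothetical. A count has sensitivity 1 (adding or removing one row changes it by at most 1), so adding Laplace noise with scale 1/ε makes the count ε-differentially private.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    # expovariate takes a rate (1 / mean), so each draw has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Hypothetical helper: an epsilon-DP count of rows matching `predicate`.

    One row can change the true count by at most 1 (sensitivity = 1),
    so Laplace noise with scale 1 / epsilon is sufficient.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 52, 67, 29, 38, 45]
    for eps in (0.1, 1.0, 10.0):
        # Smaller epsilon -> more noise -> stronger privacy, lower accuracy.
        print(eps, round(private_count(ages, lambda a: a > 40, eps, rng), 2))
```

The expected absolute error of each noisy count equals the noise scale 1/ε, so moving from ε = 10 to ε = 0.1 makes the answer about 100 times noisier.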

Upscale your synthetic data. Once you've fit your synthesizer, use it to create any amount of differentially private synthetic data, even 10x or 100x the original size. The privacy guarantee applies to all data your synthesizer creates, because sampling only uses what the synthesizer has already learned and never touches the real data again.
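Why upscaling costs no extra privacy can be shown with a toy "synthesizer" for one categorical column, sketched below. This is a simplified stand-in, not how the DP synthesizers work internally: `fit_dp_histogram` and `sample_rows` are hypothetical names, and the list of categories is assumed to be public knowledge rather than read from the data. Only the fit step touches the real rows; sampling reads the noisy model alone, so by the post-processing property of differential privacy any number of sampled rows keeps the same ε guarantee.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def fit_dp_histogram(values, categories, epsilon, rng):
    """Hypothetical 'fit' step: an epsilon-DP histogram over public categories.

    Adding or removing one row changes exactly one bin by 1, so Laplace
    noise with scale 1 / epsilon on every bin is epsilon-DP. Clamping and
    normalizing afterwards are post-processing.
    """
    counts = Counter(values)
    noisy = {c: max(0.0, counts[c] + laplace_noise(1.0 / epsilon, rng)) for c in categories}
    total = sum(noisy.values())
    if total == 0:  # every bin was clamped to zero: fall back to uniform
        return {c: 1.0 / len(categories) for c in categories}
    return {c: n / total for c, n in noisy.items()}


def sample_rows(model, num_rows, rng):
    """'Sample' step: reads only the noisy model, never the real data."""
    cats = list(model)
    return rng.choices(cats, weights=[model[c] for c in cats], k=num_rows)


if __name__ == "__main__":
    rng = random.Random(7)
    real = ["red"] * 60 + ["blue"] * 30 + ["green"] * 10
    model = fit_dp_histogram(real, ["red", "blue", "green"], 1.0, rng)
    synthetic = sample_rows(model, 100 * len(real), rng)  # 100x the original size
    print(Counter(synthetic))
```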

Included Features

Synthesizers for generating differentially-private data.

  • The DPGCSynthesizer creates differentially private data using the GaussianCopula method.

  • The experimental DPGCFlexSynthesizer uses a similar method but offers more flexibility in how you preprocess the data.

Preprocessing methods for generating differentially private columns.

Under the hood, the synthesizers use preprocessing techniques to generate differentially private columns of data. You can also apply these transformers on their own.

  • Noise the column using differential privacy: DPLaplaceNoiser, DPTimestampLaplaceNoiser, DPResponseRandomizer, DPWeightedResponseRandomizer

  • Normalize the column into numerical data of a specific shape, using differential privacy: DPECDFNormalizer, DPDiscreteECDFNormalizer
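The sketch below illustrates two standard techniques of the kinds listed above: k-ary randomized response for a categorical column, and a differentially private empirical CDF that maps numbers to standard-normal scores. It is an assumption-based illustration of the general ideas, not the code behind DPResponseRandomizer or DPECDFNormalizer, whose exact behavior may differ. The function names are hypothetical, and the numeric bounds `low`/`high` are assumed to be public rather than computed from the data.

```python
import math
import random
from statistics import NormalDist


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def randomized_response(value, categories, epsilon, rng):
    """k-ary randomized response, applied independently to each value.

    Keeps the true value with probability e^eps / (e^eps + k - 1) and otherwise
    reports one of the other k - 1 categories uniformly at random.
    """
    k = len(categories)
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < keep_prob:
        return value
    return rng.choice([c for c in categories if c != value])


def fit_dp_ecdf_normalizer(values, low, high, num_bins, epsilon, rng):
    """Build a DP empirical CDF over public bounds; return a normalizing function.

    Only the noisy histogram touches the data; the CDF and the mapping to a
    standard-normal score are post-processing.
    """
    width = (high - low) / num_bins

    def bin_index(v):
        clipped = min(max(v, low), high)  # clip to the public bounds
        return min(int((clipped - low) / width), num_bins - 1)

    counts = [0] * num_bins
    for v in values:
        counts[bin_index(v)] += 1
    noisy = [max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in counts]
    if sum(noisy) == 0:
        noisy = [1.0] * num_bins  # degenerate case: fall back to uniform
    total = sum(noisy)

    # CDF value at the middle of each bin, kept strictly inside (0, 1).
    mid_cdf, running = [], 0.0
    for n in noisy:
        mid_cdf.append(min(max((running + n / 2) / total, 1e-6), 1 - 1e-6))
        running += n

    std_normal = NormalDist()
    return lambda v: std_normal.inv_cdf(mid_cdf[bin_index(v)])


if __name__ == "__main__":
    rng = random.Random(3)
    colors = ["red", "blue", "green"]
    print([randomized_response(c, colors, 1.0, rng) for c in ["red"] * 5])
    ages = [rng.uniform(18, 90) for _ in range(1000)]
    normalize = fit_dp_ecdf_normalizer(ages, 0, 100, 20, 1.0, rng)
    print(round(normalize(30), 2), round(normalize(60), 2))
```

In randomized response, the probability of reporting any given category changes by at most a factor of e^ε when the true value changes, which is exactly the ε-differential privacy condition for a single value.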

Verify the differential privacy.

Use the differential privacy evaluation tool to empirically measure the differential privacy of a synthesizer algorithm on a given dataset. You can use it with any SDV single-table synthesizer.
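One common way to measure differential privacy empirically is to run a randomized algorithm many times on two neighboring datasets (identical except for one row) and compare how often each output occurs. The sketch below applies that idea to a simple noisy-count mechanism. It is a generic illustration under stated assumptions, not the bundle's evaluation tool: `empirical_epsilon` and `noisy_count_mechanism` are hypothetical names, and binning the outputs is one simple choice among many. The estimate is the largest observed |log(P_a / P_b)| over well-populated output bins, which should land near the ε the mechanism was built with.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count_mechanism(dataset, epsilon, rng):
    """Mechanism under test: a Laplace-noised row count (sensitivity 1)."""
    return len(dataset) + laplace_noise(1.0 / epsilon, rng)


def empirical_epsilon(mechanism, data_a, data_b, trials, bin_width, min_count, rng):
    """Estimate epsilon as the largest |log(P_a / P_b)| over output bins.

    data_a and data_b are neighbors (they differ by one row). Bins seen fewer
    than min_count times in either run are skipped to limit sampling noise.
    """
    def histogram(data):
        return Counter(math.floor(mechanism(data, rng) / bin_width) for _ in range(trials))

    hist_a, hist_b = histogram(data_a), histogram(data_b)
    ratios = [
        abs(math.log(hist_a[b] / hist_b[b]))
        for b in hist_a.keys() & hist_b.keys()
        if hist_a[b] >= min_count and hist_b[b] >= min_count
    ]
    return max(ratios, default=0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    real = list(range(100))
    neighbor = real[:-1]  # the same data with one row removed
    mech = lambda data, r: noisy_count_mechanism(data, 1.0, r)
    print(empirical_epsilon(mech, real, neighbor, 200_000, 0.5, 1000, rng))
```

Because the estimate only covers outputs that were actually observed, it can miss rare events where the true privacy loss is larger; treat it as evidence about the guarantee, not a proof of it.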

Installation

Purchase the Differential Privacy bundle and install it separately.

% pip install -U bundle-differential-privacy --index-url https://pypi.datacebo.com

This command prompts you for your SDV Enterprise credentials.

Save and share your synthesizer. Save your synthesizer and load it later to sample more synthetic data at any time. No real data or sensitive statistics are saved in the synthesizer, so you can share it without worry.

[Figure: The differential privacy framework limits how much one individual record (a row of data, outlined in orange) can affect what the synthesizer learns (dots, colored in orange).]