# ❖ DPGCSynthesizer

{% hint style="info" %}
❖ **SDV Enterprise Bundle**. This feature is available as part of the **Differential Privacy Bundle**, an optional add-on to SDV Enterprise. For more information, please visit the [Differential Privacy Bundle](https://docs.sdv.dev/sdv/explore/sdv-bundles/differential-privacy) page.
{% endhint %}

The DPGCSynthesizer creates synthetic data that is differentially private. It is based on the classical statistical methods of the Gaussian Copula synthesizer, with added privacy guarantees. DPGC stands for ***Differential Privacy for Gaussian Copula***. For more information about the algorithm, refer to the [research paper](https://openproceedings.org/2014/conf/edbt/LiXJ14.pdf).

```python
from sdv.single_table import DPGCSynthesizer

synthesizer = DPGCSynthesizer(metadata, epsilon=2.5)
synthesizer.fit(data)

synthetic_data = synthesizer.sample(num_rows=10)
```

## Creating a synthesizer

When creating your synthesizer, you are required to pass in:

* A [Metadata](https://docs.sdv.dev/sdv/single-table-data/data-preparation/creating-metadata) object as the first argument
* An **`epsilon`** value as the second argument. This is a float (>0) that represents the *privacy loss budget* you're willing to accommodate. (See the parameter reference below for more information.)

All other parameters are optional. You can include them to customize the synthesizer.

```python
synthesizer = DPGCSynthesizer(
  metadata,
  epsilon=2.5, # we recommend values in the 0-10 range; 0-1 is the most conservative
  known_min_max_values={
    'age': {'min': 0, 'max': 120 },
    'salary': { 'min': 0 }
  },
  enforce_rounding=True,
  locales=['en_US'],
)
```

### Parameter Reference

(required) **`epsilon`**: A float >0 that represents the privacy loss budget you are willing to accommodate.

{% hint style="info" %}
**How should I choose my privacy loss budget (epsilon)?** The value of epsilon is a measure of how much risk you're willing to take on when it comes to privacy.

* Values in the 0-1 range indicate that you are not willing to take on too much risk. As a result, the synthetic data will have strong privacy guarantees — potentially at the expense of data quality.
* Values in the 2-10 range indicate that you're willing to accept some privacy risk in order to preserve more data quality.

Note: The smaller your epsilon value, the more data the synthesizer will require to fully enforce differential privacy. The exact amount of data required also depends on the number of columns in your dataset. For reference, a dataset with 14 columns will require at least 15K rows for an epsilon of 2.5.
{% endhint %}
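To build intuition for why a smaller epsilon costs data quality, here is a minimal sketch of the textbook Laplace mechanism, in which the noise scale equals the query's sensitivity divided by epsilon. This is a general differential privacy illustration, not the synthesizer's internal implementation; the helper names `laplace_scale` and `noisy_count` are hypothetical.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError('epsilon must be a float greater than 0')
    return sensitivity / epsilon


def noisy_count(true_count, epsilon, rng):
    """Release a count (sensitivity 1) with Laplace noise added.

    Hypothetical helper for illustration; not part of SDV.
    """
    scale = laplace_scale(sensitivity=1.0, epsilon=epsilon)
    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_count + noise


rng = random.Random(0)
for epsilon in (0.5, 2.5, 10.0):
    # Smaller epsilon -> larger noise scale -> noisier released statistic.
    print(epsilon, laplace_scale(1.0, epsilon), noisy_count(1000, epsilon, rng))
```

The scale grows as epsilon shrinks, which is one way to see why low-epsilon settings tend to need more rows to reach the same data quality.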

**`known_min_max_values`**: A dictionary that provides the already-known min/max values for any of the numerical or datetime columns. Providing these values will help to conserve the privacy budget and ultimately yield higher quality synthetic data (for the same epsilon value).

{% hint style="danger" %}
**The min/max values should represent prior knowledge of the data.** In order to enforce differential privacy, it is critical that these min/max values are prior knowledge that is *not* based on any computations of the real data.
{% endhint %}

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>There are no known min/max values. The synthesizer will use up some of your privacy loss budget to compute differentially-private min/max values.</td></tr><tr><td><code>&#x3C;dictionary></code></td><td><p>A dictionary with the known min/max values. The keys are the column names, and the value is another dictionary containing <code>'min'</code> and <code>'max'</code> keys. (You can provide one or both.)</p><p></p><p>For numerical columns, represent the min/max values as floats; for datetimes, represent them as pd.Timestamp objects.</p><pre class="language-python"><code class="lang-python">{
  'age': { 'min': 0, 'max': 120 },
  'salary': { 'min': 0 }
}
</code></pre></td></tr></tbody></table>
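As a sketch of the format described above, the dictionary below mixes numerical bounds (floats) with datetime bounds (`pd.Timestamp` objects). The column names, including `signup_date`, and the `check_bounds` helper are hypothetical and only for illustration. The bounds are written by hand from prior knowledge; they are deliberately not derived from the real data.

```python
import pandas as pd

# Hypothetical columns; every bound comes from prior knowledge,
# never from computing min()/max() on the real data.
known_min_max_values = {
    'age': {'min': 0.0, 'max': 120.0},
    'salary': {'min': 0.0},  # only one bound is known
    'signup_date': {
        'min': pd.Timestamp('2000-01-01'),
        'max': pd.Timestamp('2024-12-31'),
    },
}


def check_bounds(bounds_by_column):
    """Return a list of formatting problems (hypothetical helper, not part of SDV)."""
    problems = []
    for column, bounds in bounds_by_column.items():
        unexpected = set(bounds) - {'min', 'max'}
        if unexpected:
            problems.append(f'{column}: unexpected keys {sorted(unexpected)}')
        if 'min' in bounds and 'max' in bounds and bounds['min'] > bounds['max']:
            problems.append(f'{column}: min is greater than max')
    return problems


print(check_bounds(known_min_max_values))
```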

**`enforce_rounding`**: Control whether the synthetic data should have the same number of decimal digits as the real data.

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>True</code></td><td>The synthetic data will be rounded to the same number of decimal digits that were observed in the real data</td></tr><tr><td><code>False</code></td><td>The synthetic data may contain more decimal digits than were observed in the real data</td></tr></tbody></table>
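To illustrate what rounding to the observed number of decimal digits means, here is a minimal sketch in plain Python. It is an assumption-laden illustration of the concept, not the synthesizer's actual implementation; the helper `observed_decimal_digits` and the sample values are hypothetical.

```python
from decimal import Decimal


def observed_decimal_digits(values):
    """Largest number of digits after the decimal point among the values."""
    return max(max(0, -Decimal(str(v)).as_tuple().exponent) for v in values)


real_prices = [10.25, 3.5, 7.0]          # hypothetical real column
raw_synthetic_prices = [4.18273, 9.99612]  # hypothetical unrounded samples

digits = observed_decimal_digits(real_prices)
rounded = [round(v, digits) for v in raw_synthetic_prices]
print(digits, rounded)
```

With rounding turned off, values such as `4.18273` would be kept as they are.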

**`locales`**: A list of locale strings. Any PII columns will correspond to the locales that you provide.

<table data-header-hidden><thead><tr><th width="218"></th><th></th></tr></thead><tbody><tr><td>(default) <code>['en_US']</code></td><td>Generate PII values in English corresponding to US-based concepts (e.g., addresses and phone numbers)</td></tr><tr><td><code>&#x3C;list></code></td><td><p>Create data from the list of locales. Each locale string consists of a 2-character code for the language and a 2-character code for the country, separated by an underscore.</p><p></p><p>For example <code>["en_US", "fr_CA"]</code></p><p>For all options, see the <a href="https://faker.readthedocs.io/en/master/locales.html">Faker docs</a>.</p></td></tr></tbody></table>

### get\_parameters

Use this function to access all the parameters your synthesizer uses: those you have provided as well as the defaults.

**Parameters** None

**Output** A dictionary with the parameter names and the values

```python
synthesizer.get_parameters()
```

```python
{
    'epsilon': 2.5,
    'known_min_max_values': {
        'age': { 'min': 0, 'max': 120 },
        'salary': { 'min': 0 }
    },
    'enforce_rounding': False
}
```

{% hint style="info" %}
The returned parameters are a copy. Changing them will not affect the synthesizer.
{% endhint %}

### get\_metadata

Use this function to access the metadata object that you provided to the synthesizer.

**Parameters** None

**Output** A [Metadata](https://docs.sdv.dev/sdv/concepts/metadata) object

```python
metadata = synthesizer.get_metadata()
```

{% hint style="info" %}
The returned metadata is a copy. Changing it will not affect the synthesizer.
{% endhint %}

## Learning from your data

To train a machine learning model on your real data, use the `fit` method.

### fit

**Parameters**

* (required) `data`: A [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object containing the real data that the machine learning model will learn from

**Output** (None)

```python
synthesizer.fit(data)
```

{% hint style="info" %}
**Technical Details:** This synthesizer uses the Gaussian Copula methodology, but with modifications to ensure differential privacy. For more information about the algorithm, please refer to [this research paper](https://openproceedings.org/2014/conf/edbt/LiXJ14.pdf).
{% endhint %}

## Saving your synthesizer

Save your trained synthesizer for future use.

### save

Use this function to save your trained synthesizer as a Python pickle file.

**Parameters**

* (required) `filepath`: A string describing the filepath where you want to save your synthesizer. Make sure this ends in `.pkl`.

**Output** (None) The file will be saved at the desired location

```python
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
```
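If filepaths are built programmatically, a small check can make sure the `.pkl` ending is present before calling `save`. This is a minimal sketch using only the standard library; `with_pkl_suffix` is a hypothetical helper, not part of SDV.

```python
from pathlib import Path


def with_pkl_suffix(filepath):
    """Append '.pkl' unless the filepath already ends with it (hypothetical helper)."""
    path = Path(filepath)
    if path.suffix == '.pkl':
        return str(path)
    return str(path.with_name(path.name + '.pkl'))


print(with_pkl_suffix('my_synthesizer'))
print(with_pkl_suffix('my_synthesizer.pkl'))
```

The returned string can then be passed as the `filepath` argument.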

### load (utility function)

Use this utility function to load a trained synthesizer from a Python pickle file. After loading your synthesizer, you'll be able to sample synthetic data from it.

**Parameters**

* (required) `filepath`: A string describing the filepath of your saved synthesizer

**Output** Your synthesizer object

```python
from sdv.utils import load_synthesizer

synthesizer = load_synthesizer(
    filepath='my_synthesizer.pkl'
)
```

*This utility function works for any SDV synthesizer.*

## What's next?

{% hint style="success" %}
**Get the SDVerified stamp of approval.** Run the [differential privacy verification](https://docs.sdv.dev/sdv/single-table-data/evaluation/privacy/empirical-differential-privacy) on your synthesizer. Verify the results before you decide to sample any synthetic data or share your synthesizer.
{% endhint %}

After training your synthesizer, you can now sample synthetic data. See the [Sampling](https://docs.sdv.dev/sdv/single-table-data/sampling) section for more details.

```python
synthetic_data = synthesizer.sample(num_rows=10)
```

## FAQs

<details>

<summary>What happens if columns don't contain numerical data?</summary>

This synthesizer models non-numerical columns, including columns with missing values.

Although the Gaussian Copula algorithm is designed only for numerical data, this synthesizer converts other data types using Reversible Data Transforms (RDTs).

</details>

