# PARSynthesizer

The `PARSynthesizer` uses a deep learning method to train a model and generate synthetic data.

```python
from sdv.sequential import PARSynthesizer

synthesizer = PARSynthesizer(metadata)
synthesizer.fit(data)

synthetic_data = synthesizer.sample(num_sequences=100)
```

{% hint style="warning" %}
**Is the PARSynthesizer suited for your dataset?** The PARSynthesizer is designed to work on **multi-sequence data**, which means that there are multiple sequences (usually belonging to different entities) present within the same dataset. This means that your metadata should include a `sequence_key`. Using this information, the PARSynthesizer creates brand new entities and brand new sequences for each one.

If your dataset contains only a single sequence of data, then the PARSynthesizer is not suited for your dataset.
{% endhint %}
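
The multi-sequence requirement above can be checked before modeling by counting the distinct values of the sequence key. The following is a minimal plain-Python sketch, not part of the SDV API: the `count_sequences` helper and the `'Patient ID'` column are hypothetical names chosen for illustration. If your data is a pandas DataFrame, `data.to_dict('records')` produces rows in this shape.

```python
from collections import Counter


def count_sequences(rows, sequence_key):
    """Return a Counter mapping each sequence key value to its number of rows."""
    return Counter(row[sequence_key] for row in rows)


# Hypothetical rows: two patients, each with their own sequence of readings.
rows = [
    {'Patient ID': 'A', 'Heart Rate': 71},
    {'Patient ID': 'A', 'Heart Rate': 74},
    {'Patient ID': 'B', 'Heart Rate': 65},
]

sequence_lengths = count_sequences(rows, sequence_key='Patient ID')

# More than one distinct key value means the dataset is multi-sequence.
is_multi_sequence = len(sequence_lengths) > 1
print(dict(sequence_lengths), is_multi_sequence)
```

If only one distinct key value is found, the dataset holds a single sequence and is not a good fit for this synthesizer.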

## Creating a synthesizer

When creating your synthesizer, you are required to pass in a [Metadata](https://docs.sdv.dev/sdv/concepts/metadata) object as the first argument. All other parameters are optional. You can include them to customize the synthesizer.

```python
synthesizer = PARSynthesizer(
    metadata, # required
    enforce_min_max_values=True,
    enforce_rounding=False,
    context_columns=['Address', 'Smoker']
)
```

### Parameter Reference

**`enforce_min_max_values`**: Control whether the synthetic data should adhere to the same min/max boundaries set by the real data

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>True</code></td><td>The synthetic data will contain numerical values that are within the ranges of the real data.</td></tr><tr><td><code>False</code></td><td>The synthetic data may contain numerical values that are less than or greater than the real data. Note that you can still set the limits on individual columns using <a href="../../concepts/constraint-augmented-generation-cag/predefined-constraints">Constraints</a>.</td></tr></tbody></table>
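
To make the effect of `enforce_min_max_values=True` concrete, the sketch below clips generated values to the range observed in the real data. It only illustrates the resulting behavior and assumes simple clipping; it is not SDV's implementation, and `clip_to_observed_range` is a hypothetical helper.

```python
def clip_to_observed_range(real_values, synthetic_values):
    """Clip synthetic values to the [min, max] range seen in the real values."""
    low, high = min(real_values), max(real_values)
    return [min(max(value, low), high) for value in synthetic_values]


# Hypothetical real column and raw model output.
real_heart_rates = [58, 64, 71, 90]
raw_synthetic = [55, 70, 93]

# Values below 58 or above 90 are pulled back to the observed boundaries.
clipped = clip_to_observed_range(real_heart_rates, raw_synthetic)
print(clipped)
```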

**`enforce_rounding`**: Control whether the synthetic data should have the same number of decimal digits as the real data

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>True</code></td><td>The synthetic data will be rounded to the same number of decimal digits that were observed in the real data</td></tr><tr><td><code>False</code></td><td>The synthetic data may contain more decimal digits than were observed in the real data</td></tr></tbody></table>
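
The effect of `enforce_rounding=True` can be pictured as two steps: learn the number of decimal digits from the real data, then round the generated values to match. The sketch below is an illustration under that assumption, not SDV's implementation; `observed_decimal_digits` is a hypothetical helper.

```python
from decimal import Decimal


def observed_decimal_digits(values):
    """Return the largest number of decimal digits among the values."""
    digits = 0
    for value in values:
        # Decimal(str(x)) keeps the digits as written, e.g. 37.25 has exponent -2.
        exponent = Decimal(str(value)).as_tuple().exponent
        digits = max(digits, -exponent)
    return digits


# Hypothetical real column and raw model output.
real_temperatures = [36.6, 37.25, 38.1]
raw_synthetic = [36.84713, 37.123456]

digits = observed_decimal_digits(real_temperatures)
rounded = [round(value, digits) for value in raw_synthetic]
print(digits, rounded)
```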

**`locales`**: A list of locale strings. Any PII columns will correspond to the locales that you provide.

<table data-header-hidden><thead><tr><th width="218"></th><th></th></tr></thead><tbody><tr><td>(default) <code>['en_US']</code></td><td>Generate PII values in English corresponding to US-based concepts (e.g. addresses, phone numbers, etc.)</td></tr><tr><td><code>&#x3C;list></code></td><td><p>Create data from the list of locales. Each locale string consists of a 2-character code for the language and a 2-character code for the country, separated by an underscore.</p><p></p><p>For example, <code>[</code><a href="https://faker.readthedocs.io/en/master/locales/en_US.html"><code>"en_US"</code></a><code>,</code> <a href="https://faker.readthedocs.io/en/master/locales/fr_CA.html"><code>"fr_CA"</code></a><code>]</code>. </p><p>For all options, see the <a href="https://faker.readthedocs.io/en/master/locales.html">Faker docs</a>.</p></td></tr></tbody></table>
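
The locale format described above (two characters, an underscore, two characters) can be checked with a short pattern before passing the list to the synthesizer. This is a minimal sketch: `is_well_formed_locale` is a hypothetical helper, and the lowercase-language, uppercase-country convention is an assumption taken from the examples. It checks the shape of the string only, not whether Faker actually supports the locale.

```python
import re

# Two lowercase letters, an underscore, two uppercase letters (e.g. "fr_CA").
LOCALE_PATTERN = re.compile(r'^[a-z]{2}_[A-Z]{2}$')


def is_well_formed_locale(locale):
    """Return True if the string matches the language_COUNTRY shape."""
    return bool(LOCALE_PATTERN.match(locale))


candidate_locales = ['en_US', 'fr_CA', 'en-US', 'english']
well_formed = [loc for loc in candidate_locales if is_well_formed_locale(loc)]
print(well_formed)
```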

**`context_columns`**: Provide a list of strings that represent the names of the context columns. Context columns do not vary inside of a sequence. For example, a user's `'Address'` may not vary within a sequence while other columns such as `'Heart Rate'` would. Defaults to an empty list.
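
One way to find candidate context columns is to look for columns whose value never changes within any sequence. The sketch below is plain Python and not part of the SDV API: `find_constant_columns` and the column names are hypothetical. A column that happens to be constant in your sample is only a candidate, so confirm the choice against what the column means.

```python
from collections import defaultdict


def find_constant_columns(rows, sequence_key):
    """Return columns whose value never changes within any single sequence."""
    # seen[column][sequence id] -> set of distinct values observed
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for column, value in row.items():
            if column != sequence_key:
                seen[column][row[sequence_key]].add(value)
    return sorted(
        column
        for column, per_sequence in seen.items()
        if all(len(values) == 1 for values in per_sequence.values())
    )


# Hypothetical rows: 'Address' and 'Smoker' stay fixed per patient, 'Heart Rate' varies.
rows = [
    {'Patient ID': 'A', 'Address': '12 Oak St', 'Smoker': False, 'Heart Rate': 71},
    {'Patient ID': 'A', 'Address': '12 Oak St', 'Smoker': False, 'Heart Rate': 74},
    {'Patient ID': 'B', 'Address': '9 Elm Ave', 'Smoker': True, 'Heart Rate': 65},
    {'Patient ID': 'B', 'Address': '9 Elm Ave', 'Smoker': True, 'Heart Rate': 68},
]

context_candidates = find_constant_columns(rows, sequence_key='Patient ID')
print(context_candidates)
```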

**`epochs`**: Number of times to run the full dataset through the neural network during training. Each new epoch can improve the model.

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>128</code></td><td>Run all the data through the neural network 128 times during training</td></tr><tr><td><code>&#x3C;number></code></td><td>Train for a different number of epochs. Note that larger numbers will increase the modeling time.</td></tr></tbody></table>

**`verbose`**: Control whether to print out the results of each epoch. You can use this to track the training time as well as the improvements per epoch.

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>False</code></td><td>Do not print out any results</td></tr><tr><td><code>True</code></td><td>Print out the loss value per epoch. The loss values indicate how well the neural network is currently performing, lower values indicating higher quality.</td></tr></tbody></table>

**`enable_gpu`**: Whether to enable GPU usage when training the synthesizer. This may speed up the modeling time.

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>True</code></td><td>If available, use the GPU to speed up modeling time. Currently, this will look for <a href="https://developer.nvidia.com/how-to-cuda-python">CUDA</a>, which is available on Linux/Windows machines. If this is not available, then the GPU will not be used.  </td></tr><tr><td><code>False</code></td><td>Do not use the GPU to speed up modeling time.</td></tr></tbody></table>

{% hint style="info" %}
**Is my synthesizer using the GPU?** After calling `fit`, you may notice that the GPU is not immediately used. This is because the GPU is not used for the initial data preprocessing step. After data preprocessing is complete, you should see the GPU being used for the neural network training. For more information about data preprocessing, see [this guide](https://docs.sdv.dev/sdv/single-table-data/modeling/customizations/preprocessing).
{% endhint %}

*(deprecated) `cuda`: Please use the `enable_gpu` option to use CUDA, if it's available on your platform.*

{% hint style="info" %}
**Looking for more customizations?** Other settings are available to fine-tune the architecture of the underlying neural network used to model the data. Click the section below to expand.
{% endhint %}

<details>

<summary>Click to expand additional neural network customization options</summary>

These settings are specific to the neural network. Use these settings if you want to optimize the technical architecture and modeling.

**`sample_size`**: The number of times to sample (before choosing and returning the sample which maximizes the likelihood). Defaults to `1`.
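
The idea behind `sample_size` is a best-of-n selection: draw several candidates and keep the most likely one. The sketch below illustrates that idea with a toy standard normal distribution standing in for the model. `best_of_n`, `draw_candidate`, and `toy_log_likelihood` are hypothetical names, and nothing here reflects the PAR network itself.

```python
import random


def best_of_n(draw_candidate, log_likelihood, sample_size, rng):
    """Draw `sample_size` candidates and return the one with the highest likelihood."""
    candidates = [draw_candidate(rng) for _ in range(sample_size)]
    return max(candidates, key=log_likelihood)


def draw_candidate(rng):
    """Toy stand-in for the model: one draw from a standard normal distribution."""
    return rng.gauss(0.0, 1.0)


def toy_log_likelihood(value):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * value * value


rng = random.Random(0)
best = best_of_n(draw_candidate, toy_log_likelihood, sample_size=5, rng=rng)
print(best)
```

With `sample_size=1` the single draw is returned as is. Larger values trade extra sampling time for a more likely result.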

**`segment_size`**: Cut each training sequence into several segments by using this parameter. For example, if `segment_size=10`, then each segment contains 10 data points. Defaults to `None`, which means the sequences are not cut into any segments.
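
The segmenting behavior can be sketched as fixed-size slicing of each sequence. This is a minimal illustration, not SDV's implementation: `cut_into_segments` is a hypothetical helper, and letting the final segment be shorter than `segment_size` is an assumption.

```python
def cut_into_segments(sequence, segment_size=None):
    """Cut a sequence into consecutive segments of at most `segment_size` points."""
    if segment_size is None:
        # No segmenting: the whole sequence is kept as one piece.
        return [sequence]
    return [
        sequence[start:start + segment_size]
        for start in range(0, len(sequence), segment_size)
    ]


# Hypothetical sequence of 25 data points.
sequence = list(range(25))
segments = cut_into_segments(sequence, segment_size=10)
print([len(segment) for segment in segments])
```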

</details>

### get\_parameters

Use this function to access the custom parameters you have included for the synthesizer.

**Parameters** None

**Output** A dictionary with the parameter names and the values

```python
synthesizer.get_parameters()
```

```python
{
    'enforce_min_max_values': True,
    'enforce_rounding': False,
    'context_columns': ['Address', 'Smoker']
}
```

{% hint style="info" %}
The returned parameters are a copy. Changing them will not affect the synthesizer.
{% endhint %}

### get\_metadata

Use this function to access the metadata object that you have included for the synthesizer.

**Parameters** None

**Output** A [Metadata](https://docs.sdv.dev/sdv/concepts/metadata) object

```python
metadata = synthesizer.get_metadata()
```

{% hint style="info" %}
The returned metadata is a copy. Changing it will not affect the synthesizer.
{% endhint %}

## Learning from your data

To learn a machine learning model based on your real data, use the `fit` method.

### fit

**Parameters**

* (required) `data`: A [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object containing the real data that the machine learning model will learn from

**Output** (None)

```python
synthesizer.fit(data)
```

{% hint style="info" %}
**Technical Details**: PAR is a Probabilistic Auto-Regressive model that is based on neural networks. It learns how to create brand new sequences of multi-dimensional data by conditioning on the unchanging context values.

For more details, see [Sequential Models in the Synthetic Data Vault](https://arxiv.org/pdf/2207.14406.pdf), a preprint from June 2022 that describes the PAR model.
{% endhint %}

## Saving your synthesizer

Save your trained synthesizer for future use.

### save

Use this function to save your trained synthesizer as a Python pickle file.

**Parameters**

* (required) `filepath`: A string describing the filepath where you want to save your synthesizer. Make sure this ends in `.pkl`

**Output** (None) The file will be saved at the desired location

```python
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
```

### load (utility function)

Use this utility function to load a trained synthesizer from a Python pickle file. After loading your synthesizer, you'll be able to sample synthetic data from it.

**Parameters**

* (required) `filepath`: A string describing the filepath of your saved synthesizer

**Output** Your synthesizer object

```python
from sdv.utils import load_synthesizer

synthesizer = load_synthesizer(
    filepath='my_synthesizer.pkl'
)
```

*This utility function works for any SDV synthesizer.*

## What's next?

After training your synthesizer, you can now sample synthetic data. See the [Sampling](https://docs.sdv.dev/sdv/sequential-data/sampling) section for more details.

{% hint style="info" %}
**Want to improve your synthesizer?** Customize the transformations used for pre- and post-processing the data. For more details, see [Advanced Features](https://docs.sdv.dev/sdv/sequential-data/modeling/customizations).
{% endhint %}

## FAQs

<details>

<summary>How do I cite PAR?</summary>

*Kevin Zhang, Kalyan Veeramachaneni, Neha Patki.* **Sequential Models in the Synthetic Data Vault.** Preprint, June 2022.

```
@unpublished{par,
   title={Sequential Models in the Synthetic Data Vault},
   author={Zhang, Kevin and Veeramachaneni, Kalyan and Patki, Neha},
   year={2022}
}
```

</details>

<details>

<summary>What happens if columns don't contain numerical data?</summary>

This synthesizer models non-numerical columns, including columns with missing values.

Although the PAR algorithm is designed for only numerical data, this synthesizer converts other data types using Reversible Data Transforms (RDTs). To access and modify the transformations, see [Advanced Features](https://docs.sdv.dev/sdv/sequential-data/modeling/customizations).

</details>

<details>

<summary>Can I call <code>fit</code> again even if I've previously fit some data?</summary>

Yes, even if you've previously fit data, you can call the `fit` method again.

If you do this, the synthesizer will **start over from scratch** and fit the new data that you provide it. This is the equivalent of creating a new synthesizer and fitting it with new data.

</details>
