# ＊ HSASynthesizer

{% hint style="info" %}
**＊SDV Enterprise Feature.** This feature is only available for licensed, enterprise users. For more information, visit our page to [Compare SDV Features](https://docs.sdv.dev/sdv/explore/sdv-enterprise/compare-features).
{% endhint %}

The HSA Synthesizer uses a segment-based algorithm to learn from your real data and generate synthetic data. This synthesizer offers fast performance for unlimited tables.

```python
from sdv.multi_table import HSASynthesizer

synthesizer = HSASynthesizer(metadata)
synthesizer.fit(data)

synthetic_data = synthesizer.sample()
```

## Creating a synthesizer

When creating your synthesizer, you are required to pass in a [Metadata](https://docs.sdv.dev/sdv/multi-table-data/data-preparation/creating-metadata) object as the first argument.

```python
synthesizer = HSASynthesizer(metadata)
```

All other parameters are optional. You can include them to customize the synthesizer.

### Parameter Reference

**`locales`**: A list of locale strings. Any PII columns will correspond to the locales that you provide.

<table data-header-hidden><thead><tr><th width="218"></th><th></th></tr></thead><tbody><tr><td>(default) <code>['en_US']</code></td><td>Generate PII values in English corresponding to US-based concepts (eg. addresses, phone numbers, etc.)</td></tr><tr><td><code>&#x3C;list></code></td><td><p>Create data from the list of locales. Each locale string consists of a 2-character code for the language and 2-character code for the country, separated by an underscore.</p><p></p><p>For example <code>[</code><a href="https://faker.readthedocs.io/en/master/locales/en_US.html"><code>"en_US"</code></a><code>,</code> <a href="https://faker.readthedocs.io/en/master/locales/fr_CA.html"><code>"fr_CA"</code></a><code>]</code>. </p><p>For all options, see the <a href="https://faker.readthedocs.io/en/master/locales.html">Faker docs</a>.</p></td></tr></tbody></table>

```python
synthesizer = HSASynthesizer(
    metadata,
    locales=['en_US', 'en_CA', 'fr_CA']
)
```
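As a rough sketch, the locale format described above (a 2-character language code, an underscore, and a 2-character country code) can be checked before the list is passed to the synthesizer. The helper name `invalid_locales` is hypothetical and not part of SDV, and the lowercase/uppercase convention is an assumption based on the examples on this page. This only checks the shape of each string; it does not confirm that Faker supports the locale.

```python
import re

# Assumption: lowercase language code, underscore, uppercase country code,
# matching the examples shown in this page (e.g. 'en_US', 'fr_CA').
LOCALE_PATTERN = re.compile(r'[a-z]{2}_[A-Z]{2}')


def invalid_locales(locales):
    """Return the entries that do not look like '<language>_<COUNTRY>'."""
    return [locale for locale in locales if not LOCALE_PATTERN.fullmatch(locale)]


# Hyphenated or wrongly cased entries are flagged; well-formed ones pass.
print(invalid_locales(['en_US', 'fr-CA', 'EN_us', 'fr_CA']))
```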

**`default_num_clusters`**: The number of clusters to segment each table into.

<table data-header-hidden><thead><tr><th width="218"></th><th></th></tr></thead><tbody><tr><td>(default) 3</td><td>Split each table into 3 clusters for the purposes of capturing correlations between parent and child tables</td></tr><tr><td><code>&#x3C;integer></code></td><td>Split each table into the desired number of clusters. A smaller number of clusters makes fitting and sampling more efficient, but generally makes the synthetic data noisier. A larger number of clusters allows the model to learn more specific correlations across parent/child tables.</td></tr></tbody></table>
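A minimal sketch of setting the cluster count when creating the synthesizer. The helper `make_synthesizer` is hypothetical (not part of SDV); it takes the synthesizer class as an argument and passes the value through the documented `default_num_clusters` parameter. The check that the value is a positive integer is an assumption about what a sensible cluster count looks like, not documented SDV behavior.

```python
def make_synthesizer(synthesizer_class, metadata, num_clusters=3):
    """Create a synthesizer with an explicit cluster count (hypothetical helper)."""
    # Assumption: a cluster count only makes sense as a positive integer.
    is_int = isinstance(num_clusters, int) and not isinstance(num_clusters, bool)
    if not is_int or num_clusters < 1:
        raise ValueError(f'num_clusters must be a positive integer, got {num_clusters!r}')
    # default_num_clusters is the parameter name documented above.
    return synthesizer_class(metadata, default_num_clusters=num_clusters)
```

With SDV Enterprise installed, this could be called as `make_synthesizer(HSASynthesizer, metadata, num_clusters=5)`.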

### set\_table\_parameters

The HSA Synthesizer is a multi-table algorithm that models each individual table as well as the connections between them. You can get and set the parameters for each individual table.

**Parameters**

* (required) `table_name`: A string describing the name of the table
* `table_synthesizer`: The single table synthesizer to use for modeling the table
  * (default) `'GaussianCopulaSynthesizer'`: Use the [GaussianCopulaSynthesizer](https://docs.sdv.dev/sdv/single-table-data/modeling/synthesizers/gaussiancopulasynthesizer) to model the single table
  * Other available options: `'GaussianCopulaSynthesizer'`, `'CTGANSynthesizer'`, `'TVAESynthesizer'`, `'CopulaGANSynthesizer'`. For more information, see [Single Table Synthesizers](https://docs.sdv.dev/sdv/single-table-data/modeling/synthesizers).
* `table_parameters`: A dictionary mapping the name of the parameter (string) to the value of the parameter (various). These parameters are different for each synthesizer. For more information, see [Single Table Synthesizers](https://docs.sdv.dev/sdv/single-table-data/modeling/synthesizers).

**Output** (None)

```python
synthesizer.set_table_parameters(
    table_name='guests',
    table_synthesizer='GaussianCopulaSynthesizer',
    table_parameters={
        'enforce_min_max_values': True,
        'default_distribution': 'truncnorm',
        'numerical_distributions': { 
            'checkin_date': 'uniform',
            'amenities_fee': 'beta' 
        }
    }
)
```
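When several tables need their own settings, the calls can be driven from one dictionary. The sketch below assumes a hypothetical helper, `apply_table_settings`, which only uses the documented `set_table_parameters` method and its three keyword arguments. The `'hotels'` table name is illustrative.

```python
def apply_table_settings(synthesizer, settings):
    """Call set_table_parameters once for each table listed in `settings`."""
    for table_name, config in settings.items():
        synthesizer.set_table_parameters(
            table_name=table_name,
            table_synthesizer=config['table_synthesizer'],
            table_parameters=config['table_parameters'],
        )


# Illustrative settings: one entry per table, keyed by table name.
settings = {
    'guests': {
        'table_synthesizer': 'GaussianCopulaSynthesizer',
        'table_parameters': {'default_distribution': 'truncnorm'},
    },
    'hotels': {
        'table_synthesizer': 'CTGANSynthesizer',
        'table_parameters': {'epochs': 100},
    },
}
```

Calling `apply_table_settings(synthesizer, settings)` before `fit` would then configure both tables in one pass.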

### get\_parameters

Use this function to access all the parameters your synthesizer uses -- those you have provided as well as the default ones.

**Parameters** (None)

**Output** A dictionary with the parameter names and values of the multi-table synthesizer.

{% hint style="info" %}
These parameters are only for the multi-table synthesizer. To get individual table-level parameters, use the `get_table_parameters` function.

The returned parameters are a copy. Changing them will not affect the synthesizer.
{% endhint %}

```python
synthesizer.get_parameters()
```

```python
{
    'locales': ['en_US', 'fr_CA'],
    ...
}
```

### get\_table\_parameters

Use this function to access all the parameters a table synthesizer uses -- those you have provided as well as the default ones.

**Parameters**

* (required) `table_name`: A string describing the name of the table

**Output** A dictionary with the parameter names and the values

```python
synthesizer.get_table_parameters(table_name='users')
```

```python
{
    'synthesizer_name': 'GaussianCopulaSynthesizer',
    'synthesizer_parameters': {
        'default_distribution': 'beta',
        ...
    }
}
```
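To review the configuration of several tables at once, the call can be repeated per table. This sketch assumes a hypothetical helper, `synthesizer_by_table`, and relies on the `'synthesizer_name'` key shown in the output above. The list of table names is supplied by the caller.

```python
def synthesizer_by_table(synthesizer, table_names):
    """Map each table name to the single table synthesizer it is configured to use."""
    return {
        name: synthesizer.get_table_parameters(table_name=name)['synthesizer_name']
        for name in table_names
    }
```

For example, `synthesizer_by_table(synthesizer, ['guests', 'users'])` would return a dictionary with one synthesizer name per table.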

### get\_metadata

Use this function to access the metadata object that you have included for the synthesizer.

**Parameters** (None)

**Output** A [Metadata](https://docs.sdv.dev/sdv/multi-table-data/data-preparation/creating-metadata) object

```python
metadata = synthesizer.get_metadata()
```

{% hint style="info" %}
The returned metadata is a copy. Changing it will not affect the synthesizer.
{% endhint %}

## Learning from your data

To train a machine learning model on your real data, use the `fit` method.

### fit

**Parameters**

* (required) `data`: A dictionary mapping each table name to a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the real data that the machine learning model will learn from

**Output** (None)
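Because `data` is keyed by table name, it can help to compare those keys against the tables you expect before fitting. The sketch below uses a hypothetical helper, `compare_table_names`, that works on plain Python collections; the expected names are passed in by the caller, and the `'hotels'` table is illustrative.

```python
def compare_table_names(data, expected_tables):
    """Return (missing, unexpected) table names, each as a sorted list."""
    provided = set(data)           # keys of the data dictionary
    expected = set(expected_tables)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    return missing, unexpected


# Values stand in for DataFrames here; only the keys are inspected.
data = {'guests': None, 'hotel': None}
missing, unexpected = compare_table_names(data, ['guests', 'hotels'])
print(missing, unexpected)
```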

{% hint style="info" %}
**Technical Details:** HSA, which stands for *Hierarchical Segmentation Algorithm*, uses a segment-based approach to model the parent-child relationships of a multi-table dataset. At a base level, it can model individual tables using any [single table synthesizer](https://docs.sdv.dev/sdv/single-table-data/modeling/synthesizers).
{% endhint %}

### get\_learned\_distributions

After fitting this synthesizer, you can access the marginal distributions that were learned to estimate the shape of each column.

**Parameters**

* (required) `table_name`: A string with the name of the table

**Output** A dictionary that maps the name of each learned column to the distribution that estimates its shape

```python
synthesizer.get_learned_distributions(table_name='guests')
```

```python
{
    'amenities_fee': {
        'distribution': 'beta',
        'learned_parameters': { 'a': 2.22, 'b': 3.17, 'loc': 0.07, 'scale': 48.5 }
    },
    'checkin_date': { 
        ...
    },
    ...
}
```

For more information about the distributions and their parameters, visit the [Copulas library](https://sdv.dev/Copulas/).
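As a rough illustration of how the learned parameters can be read, the sketch below interprets the `'beta'` parameters from the example output. It assumes they follow the SciPy-style parameterization, where a standard beta variable with shapes `a` and `b` is multiplied by `scale` and shifted by `loc`. The helper names are hypothetical, and only the Python standard library is used.

```python
import random

# Values copied from the example output above.
learned = {'a': 2.22, 'b': 3.17, 'loc': 0.07, 'scale': 48.5}


def beta_mean(params):
    """Mean of a beta distribution shifted by loc and stretched by scale."""
    return params['loc'] + params['scale'] * params['a'] / (params['a'] + params['b'])


def beta_sample(params, rng):
    """Draw one value: standard beta draw, then apply scale and loc."""
    return params['loc'] + params['scale'] * rng.betavariate(params['a'], params['b'])


rng = random.Random(0)  # seeded so repeated runs give the same draws
samples = [beta_sample(learned, rng) for _ in range(5)]
print(round(beta_mean(learned), 2), samples)
```

Under this assumption, every draw falls between `loc` and `loc + scale`.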

{% hint style="info" %}
Learned parameters are only available for parametric models and distributions. For example, you will not be able to access learned distributions for GAN-based synthesizers (such as CTGAN) or the `'gaussian_kde'` technique.

In some cases, the synthesizer may not be able to fit the exact distribution shape you requested, so you may see another distribution shape (eg. `'truncnorm'` instead of `'beta'`).
{% endhint %}

### get\_loss\_values

After fitting, you can access the loss values computed during each epoch for both the generator and the discriminator.

**Parameters**

* (required) `table_name`: A string with the name of the table

**Output** A pandas.DataFrame object containing epoch number, generator loss value and discriminator loss value.

```python
synthesizer.get_loss_values(table_name='users')
```

```python
Epoch  Generator Loss  Discriminator Loss
1      1.7863          -0.3639
2      1.5484          0.2260
3      1.3633          -0.0441
...
```

{% hint style="info" %}
Loss values are only available for tables that use neural network-based models, such as CTGAN, TVAE or CopulaGAN.
{% endhint %}
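One way to inspect the loss values is to look at how much a loss changes from one epoch to the next. The sketch below assumes the rows have been converted to a list of dictionaries (for example with pandas' `to_dict('records')` on the returned DataFrame) and uses the column names shown in the output above. The helper `loss_deltas` is hypothetical.

```python
def loss_deltas(records, column='Generator Loss'):
    """Return (epoch, change from the previous epoch) pairs for one loss column."""
    ordered = sorted(records, key=lambda row: row['Epoch'])
    return [
        (current['Epoch'], current[column] - previous[column])
        for previous, current in zip(ordered, ordered[1:])
    ]


# Rows copied from the example output above.
records = [
    {'Epoch': 1, 'Generator Loss': 1.7863, 'Discriminator Loss': -0.3639},
    {'Epoch': 2, 'Generator Loss': 1.5484, 'Discriminator Loss': 0.2260},
    {'Epoch': 3, 'Generator Loss': 1.3633, 'Discriminator Loss': -0.0441},
]

for epoch, delta in loss_deltas(records):
    print(f'epoch {epoch}: {delta:+.4f}')
```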

## Saving your synthesizer

Save your trained synthesizer for future use.

### save

Use this function to save your trained synthesizer as a Python pickle file.

**Parameters**

* (required) `filepath`: A string describing the filepath where you want to save your synthesizer. Make sure this ends in `.pkl`

**Output** (None) The file will be saved at the desired location

```python
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
```
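Since the filepath should end in `.pkl`, a small helper can append the suffix when it is missing. This is a minimal sketch using only the standard library; `with_pkl_suffix` is a hypothetical name and not part of SDV.

```python
from pathlib import Path


def with_pkl_suffix(filepath):
    """Return the filepath as a string, appending '.pkl' if it is not already the suffix."""
    path = Path(filepath)
    if path.suffix != '.pkl':
        # Append rather than replace, so a name like 'model.v2' keeps its '.v2' part.
        path = path.with_name(path.name + '.pkl')
    return str(path)


print(with_pkl_suffix('my_synthesizer'))
```

The result can then be passed as the `filepath` argument of `save`.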

### load (utility function)

Use this utility function to load a trained synthesizer from a Python pickle file. After loading your synthesizer, you'll be able to sample synthetic data from it.

**Parameters**

* (required) `filepath`: A string describing the filepath of your saved synthesizer

**Output** Your synthesizer object

```python
from sdv.utils import load_synthesizer

synthesizer = load_synthesizer(
    filepath='my_synthesizer.pkl'
)
```

*This utility function works for any SDV synthesizer.*

## What's next?

After training your synthesizer, you can now sample synthetic data. See the [Sampling](https://docs.sdv.dev/sdv/multi-table-data/sampling) section for more details.

```python
synthetic_data = synthesizer.sample(scale=1.0)
```

{% hint style="info" %}
**Want to improve your synthesizer?** Input logical rules in the form of constraints, and customize the transformations used for pre- and post-processing the data.

For more details, see [Advanced Features](https://docs.sdv.dev/sdv/multi-table-data/modeling/customizations).
{% endhint %}

## FAQs

<details>

<summary>What happens if the columns don't contain numerical data?</summary>

This synthesizer models non-numerical columns, including columns with missing values.

Although the HSA algorithm is designed for only numerical data, this synthesizer converts other data types using Reversible Data Transforms (RDTs). To access and modify the transformations, see [Advanced Features](https://docs.sdv.dev/sdv/multi-table-data/modeling/customizations).

</details>


