# ❖ SegmentSynthesizer

{% hint style="info" %}
❖ **SDV Enterprise Bundle**. This feature is available as part of the **XSynthesizers Bundle**, an optional add-on to SDV Enterprise. For more information, please visit the [XSynthesizers Bundle](https://docs.sdv.dev/SDV/explore/sdv-bundles/xsynthesizers) page.
{% endhint %}

The SegmentSynthesizer splits your real data into segments and trains a separate model for each one. You can supply any single-table synthesizer to model the segments. Use it when your real data is highly segmented, with each segment following different patterns.

```python
from sdv.single_table import SegmentSynthesizer

synthesizer = SegmentSynthesizer(metadata)
synthesizer.fit(data)

synthetic_data = synthesizer.sample(num_rows=10)
```

## Creating a synthesizer

When creating your synthesizer, you are required to pass in a [Metadata](https://docs.sdv.dev/SDV/single-table-data/data-preparation/creating-metadata) object as the first argument. All other parameters are optional. You can include them to customize the synthesizer.

```python
synthesizer = SegmentSynthesizer(
    metadata, # required
    segmentation_params={
        'method': 'exact_values',
        'column_name': 'made_purchase'
    },
    per_segment_synthesizer='GaussianCopulaSynthesizer'
)
```

### Parameter Reference

**`enforce_min_max_values`**: Control whether the synthetic data should adhere to the same min/max boundaries set by the real data

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>True</code></td><td>The synthetic data will contain numerical values that are within the ranges of the real data.</td></tr><tr><td><code>False</code></td><td>The synthetic data may contain numerical values that are less than or greater than the real data.</td></tr></tbody></table>
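
To make the effect of this setting concrete, here is a minimal plain-Python sketch that clips generated values to the range observed in the real data. The function name `clip_to_observed_range` is hypothetical and only illustrates the idea. It is not a description of how SDV enforces the boundaries internally.

```python
def clip_to_observed_range(real_values, generated_values):
    """Clip each generated value into [min(real_values), max(real_values)]."""
    low, high = min(real_values), max(real_values)
    return [min(max(value, low), high) for value in generated_values]


real_ages = [18, 25, 40, 67]
generated_ages = [12, 30, 71]

# Values below 18 or above 67 are pulled back to the observed boundaries.
clipped_ages = clip_to_observed_range(real_ages, generated_ages)
print(clipped_ages)  # [18, 30, 67]
```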

**`enforce_rounding`**: Control whether the synthetic data should have the same number of decimal digits as the real data

<table data-header-hidden><thead><tr><th width="179"></th><th></th></tr></thead><tbody><tr><td>(default) <code>True</code></td><td>The synthetic data will be rounded to the same number of decimal digits that were observed in the real data</td></tr><tr><td><code>False</code></td><td>The synthetic data may contain more decimal digits than were observed in the real data</td></tr></tbody></table>
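
As a rough illustration of what "the same number of decimal digits" means, the sketch below finds the largest number of decimal digits in a list of real values and rounds generated values to match. The helper `observed_decimal_digits` is a hypothetical name, it assumes finite numbers, and it is not SDV's internal logic.

```python
from decimal import Decimal


def observed_decimal_digits(values):
    """Return the largest number of decimal digits seen in the values."""
    digits = 0
    for value in values:
        # str() gives the shortest decimal representation of the float.
        exponent = Decimal(str(value)).as_tuple().exponent
        digits = max(digits, -exponent if exponent < 0 else 0)
    return digits


real_prices = [9.99, 12.5, 3.0]
generated_prices = [7.123456, 10.98765]

digits = observed_decimal_digits(real_prices)
rounded_prices = [round(price, digits) for price in generated_prices]
print(digits, rounded_prices)  # 2 [7.12, 10.99]
```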

**`locales`**: A list of locale strings. Any PII columns will correspond to the locales that you provide.

<table data-header-hidden><thead><tr><th width="218"></th><th></th></tr></thead><tbody><tr><td>(default) <code>['en_US']</code></td><td>Generate PII values in English corresponding to US-based concepts (e.g. addresses, phone numbers, etc.)</td></tr><tr><td><code>&#x3C;list></code></td><td><p>Create data from the list of locales. Each locale string consists of a 2-character code for the language and a 2-character code for the country, separated by an underscore.</p><p></p><p>For example <code>[</code><a href="https://faker.readthedocs.io/en/master/locales/en_US.html"><code>"en_US"</code></a><code>,</code> <a href="https://faker.readthedocs.io/en/master/locales/fr_CA.html"><code>"fr_CA"</code></a><code>]</code>. </p><p>For all options, see the <a href="https://faker.readthedocs.io/en/master/locales.html">Faker docs</a>.</p></td></tr></tbody></table>

**`segmentation_params`**: A dictionary of parameters that govern how to perform the segmentation. This allows for one of two possible methods.

<table data-header-hidden><thead><tr><th width="226"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'algorithmic'</code> segmentation</td><td><p>Allow the synthesizer to algorithmically compute segments based on the data. You can optionally provide:</p><ul><li><code>n_segments</code>: The number of segments (defaults to 3)</li><li><code>column_names</code>: A list of column names to use for the algorithmic segmentation (defaults to all columns)</li></ul><pre class="language-python"><code class="lang-python">segmentation_params={
    'method': 'algorithmic', # required
    'n_segments': 5, # defaults to 3
    'column_names': ['age', 'income'] # defaults to all
}
</code></pre></td></tr><tr><td><code>'exact_values'</code> segmentation</td><td><p>Supply a categorical column that already contains the segments. The exact values from that column are used to identify the segments. </p><pre class="language-python"><code class="lang-python">segmentation_params={
    'method': 'exact_values', # required
    'column_name': 'made_purchase' # required
}
</code></pre></td></tr></tbody></table>
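
The idea behind `'exact_values'` segmentation can be pictured with a small plain-Python sketch: every distinct value in the chosen column becomes one segment. The function `split_by_exact_values` and the sample rows are hypothetical, and the sketch works on a list of dictionaries rather than a DataFrame. It is an illustration only, not the synthesizer's actual implementation.

```python
from collections import defaultdict


def split_by_exact_values(rows, column_name):
    """Group rows (dicts) by the exact value found in `column_name`."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[column_name]].append(row)
    return dict(segments)


rows = [
    {'age': 34, 'made_purchase': True},
    {'age': 51, 'made_purchase': False},
    {'age': 29, 'made_purchase': True},
]

# Two distinct values in 'made_purchase' give two segments.
segments = split_by_exact_values(rows, column_name='made_purchase')
for segment_name, segment_rows in segments.items():
    print(segment_name, len(segment_rows))
```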

**`per_segment_synthesizer`**: A string with the type of synthesizer to use for modeling each individual segment. *You can update individual segment synthesizers later using the `set_synthesizer_for_segment` method, detailed below.*

<table data-header-hidden><thead><tr><th width="325"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'GaussianCopulaSynthesizer'</code></td><td>Use the GaussianCopulaSynthesizer to model each segment.</td></tr><tr><td><code>&#x3C;synthesizer_name></code></td><td>Supply a synthesizer name from the list of <a href="">single table synthesizers</a>. For example <code>'XGCSynthesizer'</code> or <code>'CTGANSynthesizer'</code>.</td></tr></tbody></table>

**`per_segment_synthesizer_params`**: A dictionary of parameters to use for each of the per segment synthesizers.

<table data-header-hidden><thead><tr><th width="267"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Use the default parameters for the synthesizer</td></tr><tr><td><code>&#x3C;dictionary></code></td><td>Update the default parameters for the synthesizer you've chosen by providing a dictionary of key/value pairs for each parameter. This is different for each synthesizer. Refer to the <a href="">synthesizer's API</a>.<br><br>For example, for <a href="#gaussiancopulasynthesizer.load">GaussianCopulaSynthesizer</a> you can supply: <code>{'default_distribution': 'norm'}</code>.</td></tr></tbody></table>

### set\_synthesizer\_for\_segment

Use this function to set the algorithm to use for a specific segment of the data. This is most useful if you are using the `'exact_values'` segmentation, as you already know the segments that the synthesizer will use.

**Parameters**

* (required) `segment_name`: The exact categorical value that corresponds to the segment. This should be a value that appears in the column used for segmentation.
* (required) `synthesizer_name`: A string with the type of synthesizer to use for modeling this segment. For example `'GaussianCopulaSynthesizer'` or `'CTGANSynthesizer'`.
* `synthesizer_params`: A dictionary of parameters to use for the synthesizer. This is different for each synthesizer. Refer to the [synthesizer's API](https://docs.sdv.dev/SDV/single-table-data/modeling/synthesizers).

**Output**: None. The synthesizer corresponding to the segment is set.

```python
synthesizer.set_synthesizer_for_segment(
    segment_name=True, # everything labeled as True is one segment
    synthesizer_name='CTGANSynthesizer', # the name of any SDV single-table synthesizer
    synthesizer_params={
        'epochs': 100
    }
)
```
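
If you want to configure several segments at once, one option is a small wrapper that loops over a dictionary and calls `set_synthesizer_for_segment` with the parameters documented above. The helper `apply_segment_overrides` and the `overrides` dictionary are hypothetical conveniences, not part of SDV, and the synthesizer choices shown are only examples.

```python
def apply_segment_overrides(synthesizer, overrides):
    """Call set_synthesizer_for_segment once per entry in `overrides`.

    `overrides` maps each segment name to a (synthesizer_name, params) pair.
    """
    for segment_name, (synthesizer_name, synthesizer_params) in overrides.items():
        synthesizer.set_synthesizer_for_segment(
            segment_name=segment_name,
            synthesizer_name=synthesizer_name,
            synthesizer_params=synthesizer_params,
        )


# Hypothetical choices for a boolean 'made_purchase' segmentation column.
overrides = {
    True: ('CTGANSynthesizer', {'epochs': 100}),
    False: ('GaussianCopulaSynthesizer', {'default_distribution': 'norm'}),
}
```

With a SegmentSynthesizer created using `'exact_values'` segmentation, calling `apply_segment_overrides(synthesizer, overrides)` would issue one call per segment.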

### get\_parameters

Use this function to access all the parameters your synthesizer uses: those you have provided as well as the default ones.

**Parameters** None

**Output** A dictionary with the parameter names and the values

```python
synthesizer.get_parameters()
```

```python
{
    'n_segments': 5,
    'per_segment_synthesizer': 'GaussianCopulaSynthesizer',
    ...
}
```

{% hint style="info" %}
The returned parameters are a copy. Changing them will not affect the synthesizer.
{% endhint %}

### get\_metadata

Use this function to access the metadata object that you have included for the synthesizer.

**Parameters** None

**Output** A [Metadata](https://docs.sdv.dev/SDV/concepts/metadata) object

```python
metadata = synthesizer.get_metadata()
```

{% hint style="info" %}
The returned metadata is a copy. Changing it will not affect the synthesizer.
{% endhint %}

## Learning from your data

To learn a machine learning model based on your real data, use the `fit` method.

### fit

**Parameters**

* (required) `data`: A [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object containing the real data that the machine learning model will learn from

**Output** (None)

```python
synthesizer.fit(data)
```

{% hint style="info" %}
**Technical Details:** This synthesizer uses an algorithm to segment your real data into different groups. Each group may have different patterns. This synthesizer models each segment separately by calling upon other single-table synthesizers.

Since each segment is ultimately modeled separately, the overall fit time is expected to increase linearly with the number of segments.
{% endhint %}
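
The per-segment idea can be sketched in plain Python with a deliberately simple stand-in model: one mean and standard deviation per segment. Everything here is an assumption made for illustration, including the function names, the Gaussian stand-in, and the choice to sample segments in proportion to their size. The real synthesizer delegates each segment to a full single-table synthesizer instead.

```python
import random
import statistics
from collections import defaultdict


def fit_per_segment(rows, segment_column, value_column):
    """Fit one (mean, stdev) model per segment and record segment sizes."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[segment_column]].append(row[value_column])
    return {
        name: {
            'count': len(values),
            'mean': statistics.mean(values),
            'stdev': statistics.pstdev(values),
        }
        for name, values in grouped.items()
    }


def sample_rows(models, segment_column, value_column, num_rows, seed=0):
    """Pick a segment in proportion to its size, then sample from its model."""
    rng = random.Random(seed)
    names = list(models)
    weights = [models[name]['count'] for name in names]
    sampled = []
    for _ in range(num_rows):
        name = rng.choices(names, weights=weights)[0]
        model = models[name]
        value = rng.gauss(model['mean'], model['stdev'])
        sampled.append({segment_column: name, value_column: value})
    return sampled


rows = [
    {'made_purchase': True, 'basket_total': 120.0},
    {'made_purchase': True, 'basket_total': 95.0},
    {'made_purchase': False, 'basket_total': 0.0},
    {'made_purchase': False, 'basket_total': 0.0},
]
models = fit_per_segment(rows, 'made_purchase', 'basket_total')
synthetic_rows = sample_rows(models, 'made_purchase', 'basket_total', num_rows=5)
```

Because the loop in `fit_per_segment` builds one model per segment, the work grows with the number of segments, which mirrors the fit-time note above.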

## Saving your synthesizer

Save your trained synthesizer for future use.

### save

Use this function to save your trained synthesizer as a Python pickle file.

**Parameters**

* (required) `filepath`: A string describing the filepath where you want to save your synthesizer. Make sure this ends in `.pkl`.

**Output** (None) The file will be saved at the desired location

```python
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
```
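
If filepaths come from user input or configuration, a small guard can catch a missing `.pkl` extension before saving. The helper `ensure_pkl_path` below is a hypothetical convenience built on Python's standard `pathlib`. It is not part of SDV.

```python
from pathlib import Path


def ensure_pkl_path(filepath):
    """Return `filepath` as a string, raising if it does not end in .pkl."""
    path = Path(filepath)
    if path.suffix != '.pkl':
        raise ValueError(f"Expected a path ending in '.pkl', got: {filepath!r}")
    return str(path)


# The checked path can then be passed to synthesizer.save(filepath=...).
filepath = ensure_pkl_path('my_synthesizer.pkl')
```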

### load (utility function)

Use this utility function to load a trained synthesizer from a Python pickle file. After loading your synthesizer, you'll be able to sample synthetic data from it.

**Parameters**

* (required) `filepath`: A string describing the filepath of your saved synthesizer

**Output** Your synthesizer object

```python
from sdv.utils import load_synthesizer

synthesizer = load_synthesizer(
    filepath='my_synthesizer.pkl'
)
```

*This utility function works for any SDV synthesizer.*

## What's next?

After training your synthesizer, you can now sample synthetic data. See the [Sampling](https://docs.sdv.dev/SDV/single-table-data/sampling) section for more details.

```python
synthetic_data = synthesizer.sample(num_rows=10)
```

{% hint style="info" %}
**Want to improve your synthesizer?** Input logical rules in the form of constraints, and customize the transformations used for pre- and post-processing the data.

For more details, see [Customizations](https://docs.sdv.dev/SDV/single-table-data/modeling/customizations).
{% endhint %}

## FAQs

<details>

<summary>What happens if columns don't contain numerical data?</summary>

This synthesizer models non-numerical columns, including columns with missing values.

Most algorithms that you can use for the per-segment modeling are designed for numerical data. This synthesizer ensures that all segments are appropriately converted to numerical data before modeling using Reversible Data Transformers (RDTs).

*Currently, it is not possible to access and modify these transformations, though this feature is coming soon!*

</details>

<details>

<summary>Can I call <code>fit</code> again even if I've previously fit some data?</summary>

Yes, even if you've previously fit data, you should be able to call the `fit` method again.

If you do this, the synthesizer will **start over from scratch** and fit the new data that you provide it. This is the equivalent of creating a new synthesizer and fitting it with new data.

</details>

