# ❖ ECDFNormalizer

**Compatibility:** `numerical` data

{% hint style="info" %}
❖ **SDV Enterprise Bundle**. This feature is available as part of the **XSynthesizers Bundle**, an optional add-on to SDV Enterprise. For more information, please visit the [XSynthesizers Bundle](https://docs.sdv.dev/sdv/reference/explore-sdv/xsynthesizers-bundle) page.
{% endhint %}

The `ECDFNormalizer` normalizes your data into a uniform or normal shape. To do this, it estimates the [empirical distribution](https://en.wikipedia.org/wiki/Empirical_distribution_function) of the data. (On the reverse transform, this transformer brings the data back into its original shape.)

<figure><img src="/files/DEPYLChuBNzDXVNSBhDp" alt=""><figcaption></figcaption></figure>

```python
from rdt.transformers.numerical import ECDFNormalizer

# Values in this column are known to lie between 0 and 100
transformer = ECDFNormalizer(
    known_min_value=0,
    known_max_value=100,
    normalized_distribution='uniform'
)
```
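To make the idea of an empirical distribution concrete, here is a minimal, unbinned sketch in plain Python. It is not the `ECDFNormalizer` implementation, which bins the data (see `n_bins`), and the function names `fit_ecdf`, `ecdf`, and `inverse_ecdf` are hypothetical. The forward step maps each value to the fraction of fit samples at or below it, which lands in [0, 1]. The reverse step uses the empirical quantile function to map a position back to a value.

```python
import bisect
import math


def fit_ecdf(values):
    """Store the sorted sample; it fully defines the empirical CDF."""
    return sorted(values)


def ecdf(sorted_sample, x):
    """F_n(x): fraction of samples less than or equal to x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def inverse_ecdf(sorted_sample, u):
    """Empirical quantile: smallest sample value whose ECDF is at least u."""
    n = len(sorted_sample)
    index = min(max(math.ceil(u * n) - 1, 0), n - 1)
    return sorted_sample[index]


sample = fit_ecdf([50, 10, 20, 20])
uniform = [ecdf(sample, x) for x in [10, 20, 50]]
restored = [inverse_ecdf(sample, u) for u in uniform]
print(uniform)   # [0.25, 0.75, 1.0]
print(restored)  # [10, 20, 50]
```

Because this version stores every fit value, its reverse transform can only return values that were seen during fitting.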

## Parameters

**`known_min_value`**: A previously known minimum value for the data. This determines the minimum possible value the transformer can accept.

<table data-header-hidden><thead><tr><th width="210.69140625"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>There is no known minimum value for the data. The transformer computes one from the data it is fit on.</td></tr><tr><td><code>&#x3C;float></code></td><td>The transformer ensures that the data is never less than this value.</td></tr></tbody></table>

**`known_max_value`**: A previously known maximum value for the data. This determines the maximum possible value the transformer can accept.

<table data-header-hidden><thead><tr><th width="210.69140625"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>There is no known maximum value for the data. The transformer computes one from the data it is fit on.</td></tr><tr><td><code>&#x3C;float></code></td><td>The transformer ensures that the data is never greater than this value.</td></tr></tbody></table>
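The sketch below shows one way the two bounds could be resolved and enforced. It assumes that a `None` bound falls back to the minimum or maximum of the fit data, and that reverse-transformed values are clamped into the resulting range. The helpers `resolve_bounds` and `clamp` are hypothetical and are not part of the RDT API.

```python
def resolve_bounds(values, known_min_value=None, known_max_value=None):
    """Use known bounds when given; otherwise fall back to the fit data."""
    low = min(values) if known_min_value is None else known_min_value
    high = max(values) if known_max_value is None else known_max_value
    return low, high


def clamp(value, low, high):
    """Keep a reverse-transformed value inside [low, high]."""
    return min(max(value, low), high)


data = [12.5, 40.0, 87.0]
print(resolve_bounds(data))  # (12.5, 87.0)
bounds = resolve_bounds(data, known_min_value=0, known_max_value=100)
print(bounds)  # (0, 100)
print(clamp(-3.2, *bounds), clamp(104.0, *bounds))  # 0 100
```

Supplying known bounds matters when the fit data does not reach the true limits. In this example, the fitted range is 12.5 to 87.0, even though the values may legitimately span 0 to 100.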

**`n_bins`**: This parameter controls the number of bins to divide the data into when computing the empirical distribution. You can think of this as the number of bars a histogram of the data would have.

<table data-header-hidden><thead><tr><th width="203"></th><th></th></tr></thead><tbody><tr><td>(default) <code>25</code></td><td>Divide up the data into 25 bins when computing the empirical distribution</td></tr><tr><td><code>&#x3C;int></code></td><td>Divide up the data into the provided number of bins</td></tr></tbody></table>
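To illustrate how `n_bins` affects resolution, the sketch below counts values into bins and evaluates the empirical CDF at each bin edge. Equal-width binning is an assumption made for illustration, and the function names are hypothetical. Using fewer bins merges more distinct values together, so the resulting CDF is coarser.

```python
def histogram(values, n_bins=25):
    """Count values into n_bins equal-width bins spanning [min, max]."""
    low, high = min(values), max(values)
    if high <= low:
        raise ValueError('need at least two distinct values')
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin instead of a new one.
        counts[min(int((v - low) / width), n_bins - 1)] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts


def cdf_at_edges(counts):
    """Empirical CDF at each bin edge: 0.0 at the first edge, 1.0 at the last."""
    total = sum(counts)
    fractions, running = [0.0], 0
    for c in counts:
        running += c
        fractions.append(running / total)
    return fractions


data = [1, 2, 2, 3, 9, 10]
print(histogram(data, n_bins=1)[1])  # [6]
edges, counts = histogram(data, n_bins=3)
print(edges, counts)  # [1.0, 4.0, 7.0, 10.0] [4, 0, 2]
```

The empty middle bin in the three-bin case is the kind of missing range that `missing_range_encoding` controls.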

**`missing_range_encoding`**: How to encode missing ranges (histogram bins that have a frequency of 0).

<table data-header-hidden><thead><tr><th width="203"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'exclude'</code></td><td>Bins with a frequency of 0 are excluded from the CDF. This means that reverse-transformed data will never fall inside these ranges.</td></tr><tr><td><code>'low_probability'</code></td><td>Bins with a frequency of 0 are included in the CDF and assigned a low probability. This means that reverse-transformed data can fall inside missing ranges.</td></tr></tbody></table>
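The difference between the two options can be sketched with a piecewise-linear CDF. Under `'exclude'`, an empty bin adds no probability, so the inverse CDF has nothing to map into it. Under `'low_probability'`, each empty bin receives a small pseudo-count. The functions and the `pseudo_count` value are hypothetical; the exact probability that the transformer assigns to empty bins is not stated on this page.

```python
def cdf_knots(counts, missing_range_encoding='exclude', pseudo_count=0.01):
    """Cumulative probability at each bin edge."""
    if missing_range_encoding == 'low_probability':
        # Hypothetical: give every empty bin a small pseudo-count.
        counts = [c if c > 0 else pseudo_count for c in counts]
    total = sum(counts)
    knots = [0.0]
    for c in counts:
        knots.append(knots[-1] + c / total)
    return knots


def inverse_cdf(u, edges, knots):
    """Map u in [0, 1] to a value, interpolating linearly inside a bin."""
    if u <= 0:
        return edges[0]
    for i in range(len(edges) - 1):
        lo, hi = knots[i], knots[i + 1]
        if lo < u <= hi:  # never true for a bin with zero probability
            return edges[i] + (u - lo) / (hi - lo) * (edges[i + 1] - edges[i])
    return edges[-1]


edges, counts = [1.0, 4.0, 7.0, 10.0], [4, 0, 2]
grid = [i / 1000 for i in range(1, 1000)]
exclude_knots = cdf_knots(counts, 'exclude')
low_prob_knots = cdf_knots(counts, 'low_probability')
excluded = [inverse_cdf(u, edges, exclude_knots) for u in grid]
smoothed = [inverse_cdf(u, edges, low_prob_knots) for u in grid]
print(any(4.0 < x < 7.0 for x in excluded))  # False
print(any(4.0 < x < 7.0 for x in smoothed))  # True
```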

**`missing_value_encoding`**: Add this argument to control how to encode missing values in the empirical distribution. Missing values can be binned together and represented as either the lowest or the highest bin of the histogram.

<table data-header-hidden><thead><tr><th width="203"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'ecdf_low_bin'</code></td><td>Encode the missing values in the empirical CDF as the first (lowest) bin</td></tr><tr><td><code>'ecdf_high_bin'</code></td><td>Encode the missing values in the empirical CDF as the final (highest) bin</td></tr></tbody></table>
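One way to picture this option is sketched below. It assumes that missing values get their own bin whose width equals the share of missing rows. With `'ecdf_low_bin'`, that bin occupies the bottom of the [0, 1] range and the observed values are squeezed above it. With `'ecdf_high_bin'`, it occupies the top. The helper names, and the choice to place each missing value at the midpoint of its bin, are hypothetical.

```python
import math


def encode_with_missing(values, observed_cdf, missing_value_encoding='ecdf_low_bin'):
    """Map values to [0, 1], reserving one bin at either end for NaN."""
    nan_share = sum(math.isnan(v) for v in values) / len(values)
    low_bin = missing_value_encoding == 'ecdf_low_bin'
    encoded = []
    for v in values:
        if math.isnan(v):
            # Hypothetical: put missing values at the midpoint of their bin.
            encoded.append(nan_share / 2 if low_bin else 1 - nan_share / 2)
        else:
            u = observed_cdf(v) * (1 - nan_share)
            encoded.append(u + nan_share if low_bin else u)
    return encoded, nan_share


def decode_missing(u, nan_share, missing_value_encoding='ecdf_low_bin'):
    """Return NaN inside the reserved bin, else the observed-data CDF position."""
    if missing_value_encoding == 'ecdf_low_bin':
        return math.nan if u < nan_share else (u - nan_share) / (1 - nan_share)
    return math.nan if u > 1 - nan_share else u / (1 - nan_share)


observed = [10, 20, 30]


def observed_cdf(v):
    """Simple unbinned ECDF of the non-missing values."""
    return sum(o <= v for o in observed) / len(observed)


values = [10, math.nan, 30, 20]
low, share = encode_with_missing(values, observed_cdf)
high, _ = encode_with_missing(values, observed_cdf, 'ecdf_high_bin')
print(low[1], high[1])  # 0.125 0.875
print(math.isnan(decode_missing(low[1], share)))  # True
```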

**`normalized_distribution`**: Add this argument to control the shape of the transformed data. Choose whichever shape is most convenient for your downstream use case.

<table data-header-hidden><thead><tr><th width="203"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'uniform'</code></td><td>Transform the data into a uniform distribution, between 0 and 1.</td></tr><tr><td><code>'norm'</code></td><td>Transform the data into a standard normal distribution, aka a bell curve with mean of 0 and standard deviation of 1.</td></tr></tbody></table>
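The two options differ only in the final mapping step. Assuming the data has already been mapped to a uniform value in [0, 1], the sketch below converts that value to a standard normal value with the inverse normal CDF from Python's `statistics.NormalDist`, and converts it back with the normal CDF. The clamping constant `EPSILON` is a hypothetical guard: the inverse normal CDF is infinite at exactly 0 and 1, and `inv_cdf` rejects those inputs.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)
EPSILON = 1e-9  # hypothetical guard: inv_cdf is undefined at exactly 0 and 1


def uniform_to_normal(u):
    """Map a uniform value in [0, 1] to a standard normal value."""
    u = min(max(u, EPSILON), 1 - EPSILON)
    return STANDARD_NORMAL.inv_cdf(u)


def normal_to_uniform(z):
    """Inverse of uniform_to_normal: map a normal value back to [0, 1]."""
    return STANDARD_NORMAL.cdf(z)


print(uniform_to_normal(0.5))              # 0.0
print(round(uniform_to_normal(0.975), 2))  # 1.96
print(round(normal_to_uniform(1.96), 3))   # 0.975
```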

**`learn_rounding_scheme`**: Add this argument to allow the transformer to learn about rounded values in your dataset.

<table data-header-hidden><thead><tr><th width="204.57421875"></th><th></th></tr></thead><tbody><tr><td>(default) <code>False</code></td><td>Do not learn or enforce any rounding scheme. When reverse transforming the data, there may be many decimal places present.</td></tr><tr><td><code>True</code></td><td>Learn the rounding rules from the input data. When reverse transforming the data, round values to the same number of decimal digits as the original data.</td></tr></tbody></table>
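Here is a minimal sketch of learning and applying a rounding scheme. It assumes the scheme is simply the largest number of decimal places seen in the fit data. It falls back to no rounding when values have more than 15 decimal places, which is what floating-point noise such as `0.1 + 0.2` produces. The helper names and the 15-digit cutoff are hypothetical, and the transformer's own rule may differ.

```python
import math
from decimal import Decimal


def learn_rounding_digits(values, max_digits=15):
    """Most decimal places seen in the data, or None if there are too many."""
    digits = 0
    for v in values:
        if isinstance(v, float) and not math.isfinite(v):
            continue  # NaN and inf carry no rounding information
        exponent = Decimal(repr(v)).normalize().as_tuple().exponent
        digits = max(digits, -exponent)
    return digits if digits <= max_digits else None


def apply_rounding(value, digits):
    """Round a reverse-transformed value to the learned number of places."""
    return value if digits is None else round(value, digits)


digits = learn_rounding_digits([12.5, 40.0, 87.25])
print(digits)                              # 2
print(apply_rounding(63.718264, digits))   # 63.72
print(learn_rounding_digits([0.1 + 0.2]))  # None
```

Python's `round` operates on binary floats. A value such as `2.675` therefore rounds to `2.67`, a detail a production implementation may need to handle.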

## FAQ

<details>

<summary>Which algorithms does this transformer use?</summary>

This transformer creates a histogram of your data and uses it to compute an [empirical CDF](https://en.wikipedia.org/wiki/Empirical_distribution_function). The empirical CDF is then used to normalize your data into a different shape (uniform or normal) using the [probability integral transform](https://en.wikipedia.org/wiki/Probability_integral_transform).

</details>
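The end-to-end sketch below strings the steps from this FAQ answer together under simplifying assumptions. It uses equal-width bins, interpolates linearly inside each bin, and skips empty bins (similar to `missing_range_encoding='exclude'`). It does not handle known bounds, missing values, or rounding. The class name `HistogramECDF` is hypothetical, and this is not the `ECDFNormalizer` source. The `cdf` method is the probability integral transform that yields uniform values; passing those values through `NormalDist().inv_cdf` yields normal values.

```python
from statistics import NormalDist


class HistogramECDF:
    """Hypothetical sketch: a piecewise-linear CDF built from a histogram."""

    def __init__(self, n_bins=25):
        self.n_bins = n_bins

    def fit(self, values):
        # Assumes at least two distinct values, so the bin width is positive.
        low, high = min(values), max(values)
        width = (high - low) / self.n_bins
        counts = [0] * self.n_bins
        for v in values:
            counts[min(int((v - low) / width), self.n_bins - 1)] += 1
        self.edges = [low + i * width for i in range(self.n_bins + 1)]
        self.knots = [0.0]
        for c in counts:
            self.knots.append(self.knots[-1] + c / len(values))
        return self

    def cdf(self, x):
        """Forward transform: value -> uniform position in [0, 1]."""
        if x <= self.edges[0]:
            return 0.0
        for i in range(self.n_bins):
            left, right = self.edges[i], self.edges[i + 1]
            if x <= right:
                frac = (x - left) / (right - left)
                return self.knots[i] + frac * (self.knots[i + 1] - self.knots[i])
        return 1.0

    def inverse_cdf(self, u):
        """Reverse transform: uniform position -> value, skipping empty bins."""
        if u <= 0:
            return self.edges[0]
        for i in range(self.n_bins):
            lo, hi = self.knots[i], self.knots[i + 1]
            if lo < u <= hi:
                left, right = self.edges[i], self.edges[i + 1]
                return left + (u - lo) / (hi - lo) * (right - left)
        return self.edges[-1]


normal = NormalDist()
model = HistogramECDF(n_bins=5).fit([float(v) for v in range(11)])

x = 3.0
u = model.cdf(x)                                 # uniform output
z = normal.inv_cdf(min(max(u, 1e-9), 1 - 1e-9))  # normal output
restored = model.inverse_cdf(normal.cdf(z))      # back to the original scale
print(round(u, 4), round(restored, 6))  # 0.2727 3.0
```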
