# UniformEncoder

**Compatibility:** `categorical` or `boolean` data

The `UniformEncoder` transforms data that represents categorical values into a uniform distribution in the `[0,1]` interval. It is highly accurate at preserving the overall frequencies of each category.

<figure><img src="https://2225246359-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVGX92M819eIp0rMg5elc%2Fuploads%2FTBFs3heAY2fY0Q5Yo6v5%2Frdt_transformers-glossary-categorical-uniform-encoder_June%2002%202025.png?alt=media&#x26;token=1fc2c7f3-3445-438e-a29c-9bc7e3af0ab6" alt=""><figcaption></figcaption></figure>

```python
from rdt.transformers.categorical import UniformEncoder

transformer = UniformEncoder()
```

## Parameters

**`order_by`**: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.

<table data-header-hidden><thead><tr><th width="232.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Do not apply a particular order</td></tr><tr><td><code>'numerical_value'</code></td><td>If the data is represented by integers or floats, order by those values</td></tr><tr><td><code>'alphabetical'</code></td><td>If the data is represented by strings, order them alphabetically</td></tr></tbody></table>

### Examples

```python
from rdt.transformers.categorical import UniformEncoder

transformer = UniformEncoder(
    order_by='alphabetical'
)
```

The transformer assigns each category to a unique, non-overlapping subset of the `[0,1]` interval. The length of each subset equals the category's frequency. For example, if category `'CASH'` occurs with 60% frequency, its subset has length `0.6`, such as `[0.2, 0.8]`.
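The interval assignment described above can be sketched in plain Python. This is a minimal illustration, not RDT's implementation: `build_intervals` is a hypothetical helper, and it assumes that with no `order_by` the categories keep the order in which they first appear in the data.

```python
from collections import Counter


def build_intervals(values, order_by=None):
    """Hypothetical helper: map each category to a [start, end] slice of [0, 1]."""
    counts = Counter(values)       # Counter keeps first-appearance order
    categories = list(counts)
    if order_by in ('alphabetical', 'numerical_value'):
        categories = sorted(categories)  # assumption: a plain sort models both schemes

    total = len(values)
    intervals = {}
    seen = 0                       # number of rows covered by earlier categories
    for category in categories:
        start = seen / total
        seen += counts[category]
        intervals[category] = [start, seen / total]  # length == frequency
    return intervals


payments = ['CREDIT'] * 2 + ['CASH'] * 6 + ['DEBIT'] * 2

print(build_intervals(payments))
# {'CREDIT': [0.0, 0.2], 'CASH': [0.2, 0.8], 'DEBIT': [0.8, 1.0]}

print(build_intervals(payments, order_by='alphabetical'))
# {'CASH': [0.0, 0.6], 'CREDIT': [0.6, 0.8], 'DEBIT': [0.8, 1.0]}
```

Changing the order only moves where each subset sits in `[0,1]`. The length of each subset, and therefore the frequency it represents, stays the same.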

## Attributes

After fitting the transformer, you can access the learned values through the attributes.

**`frequencies`**: A dictionary that maps each category value to its observed frequency, as a float between 0 and 1.

```python
>>> transformer.frequencies
{
  'CREDIT': 0.2, 
  'CASH': 0.6,
  'DEBIT': 0.2
}
```

**`intervals`**: A dictionary that maps each category value to its sub-interval of `[0,1]`. This allows you to determine the exact rules used for transforming and reverse transforming.

```python
>>> transformer.intervals
{
  'CREDIT': [0, 0.2],
  'CASH': [0.2, 0.8],
  'DEBIT': [0.8, 1.0]
}
```

## FAQs

<details>

<summary>When should I use this transformer?</summary>

The `UniformEncoder` is shown to preserve the frequency of each category value with high accuracy. This is especially useful if your data is imbalanced, for example, if `True` occurs only 1% of the time while `False` occurs 99% of the time.

</details>

<details>

<summary>When should I use the <code>order_by</code> parameter?</summary>

Use this parameter when the categorical data is ordinal (has a specific order) and the order can easily be discovered through sorting. For example, you might store survey responses as `'response_00'`, `'response_01'`, `'response_02'`, etc.

Don't add this parameter if it isn't necessary. Ordering increases the time it takes for transformation.

</details>

<details>

<summary>What if I'd like to sort the values by a custom order?</summary>

In some cases, your categories may not have an alphanumeric ordering scheme. Use the [**OrderedUniformEncoder**](https://docs.sdv.dev/rdt/transformers-glossary/categorical/ordereduniformencoder) to specify your own custom sorting order.

</details>

<details>

<summary>What happens to missing values?</summary>

This transformer treats missing values as if they are a new category of data.

</details>
