# UniformEncoder

**Compatibility:** `categorical` or `boolean` data

The `UniformEncoder` transforms categorical values into numerical values that are uniformly distributed over the `[0,1]` interval. It is highly accurate at preserving the overall frequency of each category.

<figure><img src="/files/Ral8RcNl7iaUpjEkWItA" alt=""><figcaption></figcaption></figure>

```python
from rdt.transformers.categorical import UniformEncoder

transformer = UniformEncoder()
```

## Parameters

**`order_by`**: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.

<table data-header-hidden><thead><tr><th width="232.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Do not apply a particular order</td></tr><tr><td><code>'numerical_value'</code></td><td>If the data is represented by integers or floats, order by those values</td></tr><tr><td><code>'alphabetical'</code></td><td>If the data is represented by strings, order them alphabetically.</td></tr></tbody></table>

### Examples

```python
from rdt.transformers.categorical import UniformEncoder

transformer = UniformEncoder(
    order_by='alphabetical'
)
```

The transformer assigns each category to a unique, non-overlapping subset of the `[0,1]` interval. The length of each subset is based on the category's frequency. For example, if category `'CASH'` occurs with 60% frequency, its subset will have a length of `0.6`, such as `[0.2, 0.8]`.
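The interval assignment described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not RDT's actual implementation: the helper names `build_intervals` and `transform_value` are hypothetical, the default category order (first appearance in the data) is an assumption, and `order_by` is modeled simply as sorting the category values.

```python
import random
from collections import Counter


def build_intervals(values, order_by=None):
    """Assign each category a sub-interval of [0, 1] sized by its frequency."""
    counts = Counter(values)
    categories = list(counts)  # order of first appearance (an assumption)
    if order_by in ('alphabetical', 'numerical_value'):
        categories = sorted(categories)  # hypothetical stand-in for order_by

    intervals = {}
    start = 0.0
    for category in categories:
        end = start + counts[category] / len(values)
        intervals[category] = (start, end)
        start = end
    return intervals


def transform_value(category, intervals, rng=random):
    """Map a category to a uniformly sampled point inside its sub-interval."""
    start, end = intervals[category]
    return rng.uniform(start, end)


data = ['CREDIT'] * 2 + ['CASH'] * 6 + ['DEBIT'] * 2
intervals = build_intervals(data)
ordered_intervals = build_intervals(data, order_by='alphabetical')
transformed = [transform_value(value, intervals) for value in data]
```

In this sketch, `'CASH'` always receives a sub-interval of length `0.6`. Only its position inside `[0,1]` changes when the categories are sorted.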

## Attributes

After fitting the transformer, you can access the learned values through the attributes.

**`frequencies`**: A dictionary that maps each category value to the observed frequency, as a float between 0 and 1

```python
>>> transformer.frequencies
{
  'CREDIT': 0.2, 
  'CASH': 0.6,
  'DEBIT': 0.2
}
```

**`intervals`**: A dictionary that maps each category value to a sub-interval of `[0,1]`. This allows you to determine the exact rules used for transforming and reverse transforming.

```python
>>> transformer.intervals
{
  'CREDIT': [0, 0.2],
  'CASH': [0.2, 0.8],
  'DEBIT': [0.8, 1.0]
}
```
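To illustrate how such an interval mapping determines the reverse transformation, here is a minimal sketch in plain Python. The function name `reverse_transform_value` is hypothetical. Treating each interval as half-open (`start <= x < end`) and clamping out-of-range numbers to `[0,1]` are both assumptions, not documented RDT behavior.

```python
def reverse_transform_value(number, intervals):
    """Return the category whose sub-interval contains `number`."""
    # Clamp to [0, 1] so out-of-range inputs map to the nearest edge (an assumption).
    number = min(max(number, 0.0), 1.0)
    for category, (start, end) in intervals.items():
        if start <= number < end:
            return category
    # Only reached when number equals the right edge of the last interval (1.0).
    return list(intervals)[-1]


intervals = {
    'CREDIT': [0, 0.2],
    'CASH': [0.2, 0.8],
    'DEBIT': [0.8, 1.0],
}

recovered = [reverse_transform_value(x, intervals) for x in (0.05, 0.5, 0.93)]
```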

## FAQs

<details>

<summary>When should I use this transformer?</summary>

The UniformEncoder is shown to preserve the frequency of each category value with high accuracy. This is especially useful if you have a data imbalance, for example if `True` occurs only 1% of the time while `False` occurs 99% of the time.

</details>

<details>

<summary>When should I use the <code>order_by</code> parameter?</summary>

Use this parameter when the categorical data is ordinal (has a specific order) and the order can easily be discovered through sorting. For example, you might be storing survey responses as `'response_00'`, `'response_01'`, `'response_02'`, etc.

Don't add this parameter if it isn't necessary. Ordering increases the time it takes for transformation.

</details>

<details>

<summary>What if I'd like to sort the values by a custom order?</summary>

In some cases, your categories may not have an alphanumeric ordering scheme. Use the [**OrderedUniformEncoder**](/rdt/transformers-glossary/categorical/ordereduniformencoder.md) to add your own custom sorting order.

</details>

<details>

<summary>What happens to missing values?</summary>

This transformer treats missing values as if they are a new category of data.
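As a rough illustration of what treating missing values as a category means for the frequencies, the sketch below counts `None` and `NaN` together as one extra category. The helper `normalize_missing` and the use of `None` as the category key are assumptions made for this example, not RDT's internal representation.

```python
import math
from collections import Counter


def normalize_missing(value):
    """Collapse None and NaN into a single key so they count as one category."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return value


data = ['CASH', None, 'CREDIT', float('nan'), 'CASH']
counts = Counter(normalize_missing(value) for value in data)
frequencies = {category: count / len(data) for category, count in counts.items()}
```

Here the missing-value category gets its own frequency and would therefore receive its own sub-interval of `[0,1]`, like any other category.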

</details>

