# LabelEncoder

**Compatibility:** `categorical` data (nominal and ordinal)

The `LabelEncoder` transforms data that represents categorical values into integers `0`, `1`, `2`, etc. corresponding to each category.

![](https://2225246359-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVGX92M819eIp0rMg5elc%2Fuploads%2FfY3R09jzrnVVgZJoaxX5%2Frdt_transformers-glossary-categorical-labelencoder_June%2002%202025.png?alt=media\&token=0bc72c0b-d8e7-468f-b6ce-58576280b0c2)

```python
from rdt.transformers.categorical import LabelEncoder
le = LabelEncoder()
```

## Parameters

**`order_by`**: Apply a prescribed ordering scheme to the values before assigning the labels

<table data-header-hidden><thead><tr><th width="232.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Do not apply a particular order. The first unique value will be assigned label <code>0</code>, the second unique value will be assigned label <code>1</code>, etc.</td></tr><tr><td><code>'numerical_value'</code></td><td>If the data is represented by integers or floats, order by those values before assigning the labels. That is: label <code>0</code> will be assigned to the smallest value, label <code>1</code> will be assigned to the second smallest, etc.</td></tr><tr><td><code>'alphabetical'</code></td><td>If the data is represented by strings, order them alphabetically before assigning the labels. That is: label <code>0</code> will be assigned to the first alphabetical string, label <code>1</code> to the second, etc. Note: Digits will also be alphabetized in order from <code>'0'</code> to <code>'9'</code>.</td></tr></tbody></table>
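
To make the three ordering schemes concrete, here is a minimal pure-Python sketch of how labels could be assigned under each option. It illustrates the behavior described in the table and is not RDT's actual implementation; the helper name `assign_labels` is hypothetical.

```python
def assign_labels(values, order_by=None):
    """Map each unique value to an integer label (illustrative sketch only)."""
    # dict.fromkeys keeps the first occurrence of each value, in order seen
    unique = list(dict.fromkeys(values))
    if order_by in ('numerical_value', 'alphabetical'):
        # sorted() orders numbers by magnitude and strings lexicographically
        unique = sorted(unique)
    elif order_by is not None:
        raise ValueError(f'Unknown order_by: {order_by!r}')
    return {value: label for label, value in enumerate(unique)}


cards = ['VISA', 'AMEX', 'VISA', 'DISCOVER']

print(assign_labels(cards))
# {'VISA': 0, 'AMEX': 1, 'DISCOVER': 2}

print(assign_labels(cards, order_by='alphabetical'))
# {'AMEX': 0, 'DISCOVER': 1, 'VISA': 2}

print(assign_labels([30, 10, 20], order_by='numerical_value'))
# {10: 0, 20: 1, 30: 2}

# Digits are compared character by character, so '10' sorts before '2'
print(assign_labels(['item_2', 'item_10'], order_by='alphabetical'))
# {'item_10': 0, 'item_2': 1}
```

The last call shows why zero-padded strings (for example `'item_02'`, `'item_10'`) are a safer choice when you rely on alphabetical ordering.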

**`add_noise`**: Add noise to the label values

<table data-header-hidden><thead><tr><th width="228.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>False</code></td><td>Do not add noise. Each time a category appears, it will always be transformed to the same label value.</td></tr><tr><td><code>True</code></td><td>Add noise. A category will be transformed to the same label with some noise added. For example, instead of the label <code>1</code>, values might be noised to <code>1.001</code>, <code>1.456</code>, <code>1.999</code>, etc.</td></tr></tbody></table>

### Examples

```python
from rdt.transformers.categorical import LabelEncoder

# order the values alphabetically before assigning the labels
# and then add noise to the labels
le = LabelEncoder(order_by='alphabetical', add_noise=True)
```

## FAQs

<details>

<summary>When should I use the <code>order_by</code> parameter?</summary>

Use this parameter when the categorical data is ordinal (has a specific order) and the order can easily be discovered through sorting. For example, you might be storing survey responses as `'response_00'`, `'response_01'`, `'response_02'`, etc.

Don't add this parameter if it isn't necessary. Ordering increases the time it takes for transformation.

</details>

<details>

<summary>What if I'd like to sort the values by a custom order?</summary>

In some cases, your categories may not have an alphanumeric ordering scheme. Use the [OrderedLabelEncoder](https://docs.sdv.dev/rdt/transformers-glossary/categorical/orderedlabelencoder) to add your own, custom sorting order.

</details>

<details>

<summary>What happens to missing values?</summary>

This transformer treats missing values as if they are a new category of data. If you are using the `order_by` parameter, the missing values will always be assigned the highest label value.
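
As a rough illustration of the ordered case, the sketch below sorts the observed categories and then gives the missing value the next (highest) label. This is a pure-Python approximation of the behavior described above, not RDT's code; the helper names and the use of `None` as the key for the missing category are assumptions made for the example.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def ordered_labels_with_missing(values):
    """Sort the present categories, then label missing values last."""
    present = sorted({v for v in values if not is_missing(v)})
    mapping = {value: label for label, value in enumerate(present)}
    if any(is_missing(v) for v in values):
        # The missing category always receives the highest label
        mapping[None] = len(present)
    return mapping


print(ordered_labels_with_missing(['b', None, 'a', float('nan'), 'b']))
# {'a': 0, 'b': 1, None: 2}
```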

</details>

<details>

<summary>When should I add noise?</summary>

If you do not add noise, the transformer will convert each category to a distinct label. For example, `AMEX` is always converted to the label `1`. If you add noise, the transformer adds random variation, so the same category no longer maps to a single number. For example, `AMEX` may sometimes be `1.001` and other times be `1.999`, but always in the interval `[1, 2)`.

Adding noise creates a continuous distribution. Whether to add noise depends on how you plan to use the data. If you are using the data for machine learning (ML), consider whether the techniques you plan to use work better on continuous distributions.
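
The interval idea can be sketched in a few lines of plain Python. This is only an illustration of the concept, assuming uniform noise and a floor operation to recover the label; the helper names `add_noise` and `remove_noise` are hypothetical and do not come from RDT.

```python
import math
import random


def add_noise(label, rng):
    """Spread an integer label over the interval [label, label + 1)."""
    # rng.random() returns a float in [0.0, 1.0), so the result stays in
    # [label, label + 1), up to floating-point rounding at the extreme edge
    return label + rng.random()


def remove_noise(noised_value):
    """Recover the integer label by rounding down."""
    return math.floor(noised_value)


rng = random.Random(0)  # seeded so the sketch is reproducible
amex_label = 1
noised = [add_noise(amex_label, rng) for _ in range(5)]

# Every noised value still identifies the same category
assert all(remove_noise(value) == amex_label for value in noised)
```

Because each category owns its own unit-width interval, the noised values form a continuous distribution while remaining traceable to a single label.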

</details>
