# RegexGenerator

**Compatibility:** `id` data

The `RegexGenerator` creates structured text. During the transform, it simply removes the column. During the reverse transform, it recreates the column by generating text that follows a regex string.

![](https://2225246359-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVGX92M819eIp0rMg5elc%2Fuploads%2FfAgJbsHvoTGJK8qCB5hQ%2Frdt_transformers-glossary-ID-regex-generator_June%2002%202025.png?alt=media\&token=c288b268-1017-402f-b256-bee3d503cc38)

```python
from rdt.transformers.text import RegexGenerator

transformer = RegexGenerator()
```

You can specify the exact regex string to use for more realistic data.

## Parameters

**`regex_format`**: A string that represents a [Regular Expression↗](https://en.wikipedia.org/wiki/Regular_expression). This expression will be used to generate new data.

<table data-header-hidden><thead><tr><th width="270.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'[A-Za-z]{5}'</code></td><td>Generate 5-character strings such as <code>'ABCDE'</code>.</td></tr><tr><td><code>&#x3C;string></code></td><td>Use the specified regex string to generate new values.</td></tr></tbody></table>

**`cardinality_rule`**: How many unique values to create in the fake data.

<table data-header-hidden><thead><tr><th width="270.59375"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code> </td><td>Do not impose any rules. Any number of Regex values can be generated.</td></tr><tr><td><code>'unique'</code></td><td>The generated data should not contain any repeating values.<br><em>Note: This option may limit the amount of data that you can create using the Regex</em></td></tr><tr><td><code>'match'</code></td><td>Learn the number of unique values from the fit data and ensure that the generated data contains the same number. These may be repeated.</td></tr><tr><td><code>'scale'</code></td><td>Learn the number of unique values from the fit data and scale it proportionally when generating data. For example, if there are 25 unique values for every 100 rows of data, the transformer will create 50 unique values when generating 200 rows.</td></tr></tbody></table>

*(deprecated) `enforce_uniqueness`: Use the `cardinality_rule` parameter instead.*
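
The proportional arithmetic behind the `'scale'` option can be sketched in plain Python. This is only an illustration of the example in the table above, not RDT's implementation: the helper name `scaled_unique_count` is hypothetical, and the rounding behavior is an assumption.

```python
def scaled_unique_count(fit_unique: int, fit_rows: int, new_rows: int) -> int:
    """Scale the number of unique values in proportion to the row count.

    Hypothetical helper for illustration only; rounding to the nearest
    integer is an assumption, not documented RDT behavior.
    """
    return round(fit_unique * new_rows / fit_rows)


# 25 unique values per 100 fitted rows, generating 200 rows
print(scaled_unique_count(fit_unique=25, fit_rows=100, new_rows=200))  # 50
```

With `'match'`, by contrast, the count would stay at 25 regardless of how many rows are generated.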

**`generation_order`**: Which order to use when generating the regexes (during the reverse transform)

<table data-header-hidden><thead><tr><th width="260"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'alphanumeric'</code></td><td>Generate the data sequentially, in alphanumeric order. For example, <code>'aaa'</code>, <code>'aab'</code>, <code>'aac'</code>, etc.</td></tr><tr><td><code>'scrambled'</code></td><td>Generate the data sequentially but then scramble it before returning the results. For large batches of data, this is an effective way to make the data appear random.</td></tr><tr><td>＊ <code>'random'</code></td><td>Generate data completely randomly. This method works even for small batches of data.</td></tr></tbody></table>

{% hint style="info" %}
**＊SDV Enterprise Feature.** This feature is available to our licensed users and is not currently in our public library. For more information, visit our page to [**Explore SDV**](https://docs.sdv.dev/sdv/reference/explore-sdv).
{% endhint %}
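
To make the difference between `'alphanumeric'` and `'scrambled'` concrete, here is a minimal standard-library sketch. It does not use RDT and is not how the transformer is implemented; the helper `alphanumeric_values` and the three-letter alphabet are assumptions chosen for illustration.

```python
import itertools
import random


def alphanumeric_values(alphabet: str, length: int, count: int) -> list:
    """Return the first `count` fixed-length strings in sequential order.

    Hypothetical helper: it enumerates combinations with the rightmost
    character changing fastest, as in 'aaa', 'aab', 'aac'.
    """
    combos = itertools.product(alphabet, repeat=length)
    return [''.join(chars) for chars in itertools.islice(combos, count)]


# Sequential ("alphanumeric") order
ordered = alphanumeric_values('abc', length=3, count=5)
print(ordered)  # ['aaa', 'aab', 'aac', 'aba', 'abb']

# "Scrambled" order: the same sequential values, shuffled before returning
scrambled = ordered.copy()
random.Random(0).shuffle(scrambled)
print(scrambled)
```

Because the scrambled batch contains the same values as the sequential batch, a small batch still covers only the first few values of the sequence. This is why scrambling looks more convincing for large batches.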

### Examples

```python
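from rdt.transformers.text import RegexGenerator

# generate unique values that follow the format 'ID_' followed by a 3-digit number
# (a raw string keeps the backslash in '\d' from being read as a Python escape)
rg = RegexGenerator(
    regex_format=r'ID_\d{3}',
    cardinality_rule='unique'  # replaces the deprecated enforce_uniqueness=True
)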
```

![](https://2225246359-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVGX92M819eIp0rMg5elc%2Fuploads%2FcheeHRxAHMDwlMB6MTVk%2Frdt_transformers-glossary-ID-regex-generator-examples_June%2002%202025.png?alt=media\&token=91096974-4d21-4b41-9453-a6ceac8be731)
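
The example above asks for non-repeating values, which caps how much data the regex can produce. As a rough, standard-library-only sketch (it does not call RDT, and the variable names are illustrative), the full set of values for `ID_\d{3}` can be enumerated to show the limit:

```python
import re

# Every value that 'ID_' followed by three ASCII digits can take
all_ids = [f'ID_{n:03d}' for n in range(10 ** 3)]

pattern = re.compile(r'ID_\d{3}')
all_valid = all(pattern.fullmatch(value) for value in all_ids)

print(len(all_ids))             # 1000
print(all_ids[0], all_ids[-1])  # ID_000 ID_999
print(all_valid)                # True
```

With only 1,000 distinct values available, a request for more unique rows than that cannot be met by this format. A wider pattern such as `ID_\d{6}` would raise the ceiling to 1,000,000.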

## FAQs

<details>

<summary>Are all regexes supported?</summary>

The `RegexGenerator` does not currently support regexes with sub-patterns, which are frequently used to express "or" logic. For example, the regex string `'([A-Z]{2}|\d{4})'` is intended to match a 2-character string such as `'DB'` *or* a 4-digit string such as `'0391'`. This regex is not suitable for the `RegexGenerator`.
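
To see what this "or" logic means in practice, the following sketch uses Python's built-in `re` module (not RDT) to check the unsupported pattern against a few strings. The variable names are illustrative only.

```python
import re

# Sub-pattern with alternation: two uppercase letters OR four digits
pattern = re.compile(r'([A-Z]{2}|\d{4})')

matches_letters = pattern.fullmatch('DB') is not None
matches_digits = pattern.fullmatch('0391') is not None
matches_mixed = pattern.fullmatch('D391') is not None

print(matches_letters)  # True
print(matches_digits)   # True
print(matches_mixed)    # False
```

The pattern accepts two differently shaped values, which is the kind of branching the transformer does not currently handle.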

**Tip:** If you are trying to express a basic index column with countable integers (`0`, `1`, `2`, ...), we recommend using the [IndexGenerator](https://docs.sdv.dev/rdt/transformers-glossary/id/indexgenerator) instead of this transformer. The IndexGenerator also allows you to input a prefix and suffix to the index.

</details>

<details>

<summary>When should I use this transformer?</summary>

The `RegexGenerator` is useful for ID columns that do not have any mathematical meaning. This transformer follows the regex format to generate values, which may be exactly the same as the real data depending on the exact format string.

This transformer is useful for columns that represent structured IDs, such as a primary key column.

</details>


