
RegexGenerator

Compatibility: text data
The RegexGenerator creates structured text. During the transform, it simply removes the column. During the reverse transform, it recreates the column by generating text that matches a regex string.
from rdt.transformers.text import RegexGenerator
rg = RegexGenerator()
You can specify the exact regex string to use for more realistic data.
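The drop-then-recreate behavior described above can be illustrated with a minimal, standard-library sketch. This is not RDT's implementation: the `transform` and `reverse_transform` functions and the `user_id` column are hypothetical stand-ins, and the random 5-letter strings only imitate the default `'[A-Za-z]{5}'` format.

```python
import random
import string


def transform(table, column):
    """Return a copy of the table without the text column (hypothetical helper)."""
    return {name: values for name, values in table.items() if name != column}


def reverse_transform(table, column, seed=0):
    """Return a copy of the table with the column refilled by random 5-letter strings."""
    rng = random.Random(seed)
    num_rows = len(next(iter(table.values())))
    generated = [
        ''.join(rng.choice(string.ascii_letters) for _ in range(5))
        for _ in range(num_rows)
    ]
    return {**table, column: generated}


# Illustrative data only; 'user_id' is an assumed column name.
real = {'user_id': ['abcde', 'fghij', 'klmno'], 'age': [31, 45, 27]}
transformed = transform(real, 'user_id')                 # 'user_id' is gone
recreated = reverse_transform(transformed, 'user_id')    # 'user_id' is back, with new values
```

Note that the recreated values are not derived from the original ones. Only the format carries over.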

Parameters

regex_format: A string that represents a regular expression. This expression will be used to generate new data.
(default) '[A-Za-z]{5}'
Generate 5-character strings such as 'ABCDE'.
<string>
Use the specified regex string to generate new values.
enforce_uniqueness: Whether to guarantee that the generated fake data is unique.
(default) False
The generated data may contain repeating values.
True
The generated data will not contain any repeating values.
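To make the effect of enforce_uniqueness concrete, here is a rough sketch in plain Python. It does not use RDT. The `generate` function and the small `'[A-Z]{2}'` format are assumptions chosen for illustration, and raising an error when the format runs out of values is a choice of this sketch, not a statement about what RDT does in that case.

```python
import random
import re

FORMAT = r'[A-Z]{2}'  # hypothetical format with 26 * 26 = 676 possible values
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
MAX_VALUES = len(ALPHABET) ** 2


def generate(num_rows, enforce_uniqueness=False, seed=0):
    """Draw random strings matching FORMAT, optionally rejecting repeats."""
    if enforce_uniqueness and num_rows > MAX_VALUES:
        # Sketch-only behavior: the format cannot supply this many distinct values.
        raise ValueError('format has too few possible values')
    rng = random.Random(seed)
    values, seen = [], set()
    while len(values) < num_rows:
        candidate = rng.choice(ALPHABET) + rng.choice(ALPHABET)
        if enforce_uniqueness and candidate in seen:
            continue  # skip values that were already produced
        seen.add(candidate)
        values.append(candidate)
    return values


may_repeat = generate(200)                          # repeats are allowed
no_repeats = generate(200, enforce_uniqueness=True)  # every value is distinct
```

The sketch also shows why the format matters when uniqueness is required: a format can only supply as many distinct values as it can express.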

Examples

from rdt.transformers.text import RegexGenerator
# generate values that follow the format 'ID_' followed by a 3-digit number
rg = RegexGenerator(
    regex_format=r'ID_\d{3}',
    enforce_uniqueness=True
)
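As a sanity check on the format used above, the following standard-library sketch enumerates every string that matches `'ID_\d{3}'`, assuming `\d` is limited to the ASCII digits 0-9. The `iter_ids` helper is hypothetical and says nothing about the order in which RDT produces values.

```python
import re
from itertools import islice

ID_FORMAT = r'ID_\d{3}'


def iter_ids():
    """Yield every 'ID_' + 3-digit string, from ID_000 to ID_999."""
    for number in range(1000):
        yield f'ID_{number:03d}'


first_five = list(islice(iter_ids(), 5))
all_ids = list(iter_ids())

# Every enumerated value matches the format, and there are exactly 1000 of them.
assert all(re.fullmatch(ID_FORMAT, value) for value in all_ids)
print(first_five)    # ['ID_000', 'ID_001', 'ID_002', 'ID_003', 'ID_004']
print(len(all_ids))  # 1000
```

Under that assumption the format offers 1,000 distinct values, which is worth keeping in mind when combining it with enforce_uniqueness=True on a large table.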

FAQs

The RegexGenerator is useful for text columns that do not have any mathematical meaning or privacy implications. This transformer follows the regex format to generate values, so a generated value may be exactly the same as a real value, depending on the format string.
Generally, we expect you to use this transformer for columns that represent structured, surrogate IDs, such as a primary key column.