# AnonymizedFaker

**Compatibility:** `id` and `pii` data

The `AnonymizedFaker` creates anonymized text belonging to specific contexts or rulesets. When transforming the data, it simply removes the column. When reversing the transform, it anonymizes the column by creating completely new, fake data at random using the [Python Faker library](https://faker.readthedocs.io/en/master/providers.html).

![](https://2225246359-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVGX92M819eIp0rMg5elc%2Fuploads%2FFTPxcJ4BxCF4pGd1DRIl%2Frdt_transformers-glossary-ID-anonymized-faker_June%2002%202025.png?alt=media\&token=840ee47d-5319-400a-a91b-3e355fe93eba)

```python
from rdt.transformers.pii import AnonymizedFaker

transformer = AnonymizedFaker()
```

You can specify the exact Faker provider and function to use for more realistic data.

## Parameters

**`provider_name`**: The name of the provider to use from the Faker library.

<table data-header-hidden><thead><tr><th width="215.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Use the <a href="https://faker.readthedocs.io/en/master/providers/baseprovider.html">BaseProvider</a> from Faker, which is capable of creating random text.</td></tr><tr><td><code>&#x3C;string></code></td><td>Use the provider for a specific context, for example <a href="https://faker.readthedocs.io/en/master/providers/faker.providers.address.html"><code>"address"</code></a> or <a href="https://faker.readthedocs.io/en/master/providers/faker.providers.bank.html"><code>"bank"</code></a>.</td></tr></tbody></table>

**`function_name`**: The name of the function to use within the Faker provider.

<table data-header-hidden><thead><tr><th width="213.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'lexify'</code></td><td>Use the <a href="https://faker.readthedocs.io/en/master/providers/baseprovider.html#faker.providers.BaseProvider.lexify">lexify method</a> to create random 4-character text.</td></tr><tr><td><code>&#x3C;string></code></td><td>Use the function from the specified provider to generate fake data. For example, <a href="https://faker.readthedocs.io/en/master/providers/faker.providers.address.html#faker.providers.address.Provider.street_address"><code>"street_address"</code></a> from the address provider or <a href="https://faker.readthedocs.io/en/master/providers/faker.providers.bank.html#faker.providers.bank.Provider.swift"><code>"swift"</code></a> from the bank provider.</td></tr></tbody></table>

{% hint style="info" %}
Together, the `provider_name` and `function_name` parameters specify exactly how to create fake data. Some common values are:

* A [full address](https://faker.readthedocs.io/en/master/providers/faker.providers.address.html#faker.providers.address.Provider.address): `provider_name="address", function_name="address"`
* A [basic bank account number](https://faker.readthedocs.io/en/master/providers/faker.providers.bank.html#faker.providers.bank.Provider.bban): `provider_name="bank", function_name="bban"`
* A [full credit card number](https://faker.readthedocs.io/en/master/providers/faker.providers.credit_card.html#faker.providers.credit_card.Provider.credit_card_number): `provider_name="credit_card", function_name="credit_card_number"`
* [Latitude/longitude coordinates](https://faker.readthedocs.io/en/master/providers/faker.providers.geo.html#faker.providers.geo.Provider.local_latlng): `provider_name="geo", function_name="local_latlng"`
* A [phone number](https://faker.readthedocs.io/en/master/providers/faker.providers.phone_number.html#faker.providers.phone_number.Provider.phone_number): `provider_name="phone_number", function_name="phone_number"`

To browse for more options, visit the [Faker library's docs](https://faker.readthedocs.io/en/master/providers.html).
{% endhint %}

**`function_kwargs`**: Optional keyword arguments to pass to the function that generates the fake data.

<table data-header-hidden><thead><tr><th width="206.86176817149163"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Do not pass any additional parameters.</td></tr><tr><td><code>&#x3C;dictionary></code></td><td>Additional keyword arguments to pass to the function. These are specific to the chosen function and should be represented as a dictionary.<br><br>For example, for the banking <a href="https://faker.readthedocs.io/en/master/providers/faker.providers.bank.html#faker.providers.bank.Provider.swift"><code>"swift"</code></a> function, you can specify: <code>{"length": 11, "primary": True}</code>.</td></tr></tbody></table>

**`locales`**: An optional list of locales to use when generating the fake data.

<table data-header-hidden><thead><tr><th width="211.5"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Use Faker's default locale, <code>"en_US"</code>.</td></tr><tr><td><code>&#x3C;list></code></td><td>Create data from the list of locales. These are specified as Faker locale strings that combine a language and a country. <br><br>For example, <code>[</code><a href="https://faker.readthedocs.io/en/master/locales/en_US.html"><code>"en_US"</code></a><code>,</code> <a href="https://faker.readthedocs.io/en/master/locales/fr_CA.html"><code>"fr_CA"</code></a><code>]</code>.</td></tr></tbody></table>

{% hint style="warning" %}
Setting a locale might leak information about the original data. Anyone with access to the anonymized data will be able to tell which countries and locales are included in the original data.
{% endhint %}

**`cardinality_rule`**: Controls the number of unique PII values that will be created.

<table data-header-hidden><thead><tr><th width="199"></th><th></th></tr></thead><tbody><tr><td>(default) <code>None</code></td><td>Do not impose any rules. Any number of unique PII values can be generated.</td></tr><tr><td><code>'unique'</code></td><td>The generated data should not contain any repeating values. <em>Note: This option may limit the amount of data that you can create.</em></td></tr><tr><td><code>'match'</code></td><td>Learn the number of unique values from the fit data and ensure that the generated data contains the same number. These may be repeated.</td></tr><tr><td><code>'scale'</code></td><td>Learn the number of unique values from the fit data and scale it proportionally when generating data. For example, if there are 25 unique values for every 100 rows of data, the transformer will create 50 unique values when generating 200 rows.</td></tr></tbody></table>

*(deprecated) `enforce_uniqueness`: Use `cardinality_rule` instead.*

**`missing_value_generation`**: Determines how to recreate missing values during the reverse transform.

<table data-header-hidden><thead><tr><th width="203"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'random'</code></td><td>Randomly assign missing values in roughly the same proportion as the original data.</td></tr><tr><td><code>None</code></td><td>Do not recreate missing values.</td></tr></tbody></table>

### Examples

```python
from rdt.transformers.pii import AnonymizedFaker

# create more realistic-looking data by specifying a provider and function
transformer = AnonymizedFaker(
    provider_name="person",
    function_name="name",
    cardinality_rule='match'
)
```

![](https://2225246359-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FVGX92M819eIp0rMg5elc%2Fuploads%2FsYDxH8WF6u7g4NHGjVei%2Frdt_transformers-glossary-ID-anonymized-faker-examples_June%2002%202025.png?alt=media\&token=07481ad0-5c5d-48e3-b3dd-f3e286fb8b43)

## FAQs

<details>

<summary>When should I use this transformer?</summary>

Use the `AnonymizedFaker` whenever you have sensitive data that should not be part of your data science project. By default, the transformer reverses the transform into fake, 4-character strings such as `"UaNJ"` in place of the original, sensitive data.

You can also use this transformer for ID columns whose format cannot easily be described with a regex, for example IDs made of 4 random characters such as `"UaNJ"`. Tip: For primary keys, set `cardinality_rule='unique'` so that no value repeats.

</details>

<details>

<summary>Will any of the real values show up in the fake data?</summary>

The `AnonymizedFaker` generates fake data randomly without looking at the real values. So there is a small chance that a real value may show up in the fake data by complete coincidence. For example, if your real data had a phone number `"(617)123-4567"`, there's a small probability that the exact same phone number will be created by random chance.

This behavior actually protects your sensitive data! If the transformer deliberately avoided the real values, anyone with access to the fake data could learn about the real values by noting which ones never appear.

</details>

<details>

<summary>What is the difference between the <code>AnonymizedFaker</code> and the <code>PseudoAnonymizedFaker</code>?</summary>

Pseudo-anonymization indicates that the anonymization scheme can be reversed, while anonymization indicates that it's permanent.

This transformer anonymizes data in an irreversible way by creating fake data in a completely random fashion. It will not be possible to guess the real values based on the fake values. This behavior allows you to protect the sensitive values in your data.

If you want to anonymize your data in a reversible way, use the [PseudoAnonymizedFaker](https://docs.sdv.dev/rdt/transformers-glossary/generic-pii-anonymization/pseudoanonymizedfaker) instead.

</details>

<details>

<summary>Can I create and use my own custom Faker providers with <code>AnonymizedFaker</code>?</summary>

At this time, `AnonymizedFaker` doesn't explicitly support custom Faker functions that you've created yourself. You can use any of the [standard providers](https://faker.readthedocs.io/en/latest/providers.html) in Faker.

</details>
