# ＊ DomainBasedAnonymizer

{% hint style="info" %}
**＊SDV Enterprise Feature.** This feature is available to our licensed users and is not currently in our public library. For more information, visit our page to [Explore SDV](https://docs.sdv.dev/sdv/explore/sdv-enterprise/compare-features).
{% endhint %}

The `DomainBasedAnonymizer` performs Contextual Anonymization on email data. It transforms emails by extracting their domains. When reversing the transform, it generates new, fake emails with the correct domains.

<figure><img src="/files/ROvwJqERdBwaNnD9jbLr" alt=""><figcaption></figcaption></figure>

```python
from rdt.transformers.email import DomainBasedAnonymizer

transformer = DomainBasedAnonymizer(obfuscate_emails=True)
```

## Parameters

**`extracted_domain`**: Which part of the overall email domain to extract during the transformation phase.

<table data-header-hidden><thead><tr><th width="186"></th><th></th></tr></thead><tbody><tr><td>(default) <code>'full'</code></td><td>Extract the full domain, which is everything after the <strong>@</strong> sign. For example, if the email is <code>'info@datacebo.com'</code>, the full domain is <code>'datacebo.com'</code>.</td></tr><tr><td><code>'top'</code></td><td>Extract only the top domain, which is everything after the <strong>.</strong> character. For example, if the email is <code>'info@datacebo.com'</code>, the top domain is <code>'com'</code>.</td></tr></tbody></table>
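
To make the two options concrete, here is a minimal plain-Python sketch of the extraction step. The helper `extract_domain` is hypothetical and is not part of RDT; it assumes well-formed emails and assumes that `'top'` means the text after the final `.` in the domain.

```python
def extract_domain(email, extracted_domain='full'):
    """Illustrative helper (not part of RDT): pull the domain out of one email."""
    # Everything after the last '@' is treated as the full domain.
    full_domain = email.rsplit('@', 1)[1]
    if extracted_domain == 'full':
        return full_domain
    if extracted_domain == 'top':
        # Assumption: the top domain is the text after the final '.'.
        return full_domain.rsplit('.', 1)[1]
    raise ValueError("extracted_domain must be 'full' or 'top'")


print(extract_domain('info@datacebo.com'))         # datacebo.com
print(extract_domain('info@datacebo.com', 'top'))  # com
```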

**`enforce_unique_count`**: Limit the number of new emails created to the number originally found in the dataset.

<table data-header-hidden><thead><tr><th width="185"></th><th></th></tr></thead><tbody><tr><td>(default) <code>False</code></td><td>Create a variety of new emails based on the domain.</td></tr><tr><td><code>True</code></td><td>Put a limit on the number of new emails created. Emails will be recycled after the limit is reached.</td></tr></tbody></table>

{% hint style="warning" %}
Setting this to `True` will leak information about the number of unique emails within each domain. However, these emails will be newly created ones that may not appear in the original data. Always evaluate the risk of a data leak before sharing your transformed data.
{% endhint %}
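
One way to picture the recycling behavior is a fixed pool of new emails that is reused once it runs out. The sketch below is plain Python, not the transformer's implementation: the helper `recycle_emails` is hypothetical, and the round-robin reuse order is an assumption made for illustration.

```python
import itertools


def recycle_emails(new_emails, n_rows):
    """Illustrative helper (not part of RDT): reuse a capped pool of emails."""
    # Assumption: once the pool is exhausted, start again from its beginning.
    return list(itertools.islice(itertools.cycle(new_emails), n_rows))


# Suppose the original data had 3 unique emails for this domain,
# but 5 rows need to be filled when reversing the transform.
pool = ['a1@datacebo.com', 'b2@datacebo.com', 'c3@datacebo.com']
print(recycle_emails(pool, 5))
# ['a1@datacebo.com', 'b2@datacebo.com', 'c3@datacebo.com', 'a1@datacebo.com', 'b2@datacebo.com']
```

The output never contains more than 3 distinct values, which is exactly the count information that the warning above describes.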

**`obfuscate_emails`**: Control whether the overall email looks realistic or follows random patterns.

<table data-header-hidden><thead><tr><th width="185"></th><th></th></tr></thead><tbody><tr><td>(default) <code>False</code></td><td>Create realistic-looking usernames and emails such as <code>'johndoe@gmail.com'</code>.</td></tr><tr><td><code>True</code></td><td>Obfuscate the usernames and emails to create random values such as <code>'dkep22ocp2@sdv-example.com'</code>.</td></tr></tbody></table>

{% hint style="warning" %}
Setting this to `False` may result in emails that correspond to real user emails by complete coincidence. If you are worried about creating emails that accidentally correspond to real users, please set this to `True`.
{% endhint %}
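
As a rough illustration of what an obfuscated email might look like, the following plain-Python sketch builds a random username for a given domain. The helper `random_email`, its `length` and `seed` parameters, and the choice of lowercase letters and digits are all assumptions; the transformer's actual generation logic may differ.

```python
import random
import string


def random_email(domain, length=10, seed=None):
    """Illustrative helper (not part of RDT): build a random-looking email."""
    rng = random.Random(seed)
    # Assumption: usernames are drawn from lowercase letters and digits.
    alphabet = string.ascii_lowercase + string.digits
    username = ''.join(rng.choice(alphabet) for _ in range(length))
    return f'{username}@{domain}'


# The username is random, but the domain is preserved.
print(random_email('sdv-example.com'))
```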

### Examples

```python
from rdt.transformers.email import DomainBasedAnonymizer

transformer = DomainBasedAnonymizer(
    extracted_domain='top',
    enforce_unique_count=False,
    obfuscate_emails=True
)
```

## Attributes

After fitting the transformer, you can access the learned values through the attributes.

**`domain_to_unique_count`**: The number of unique email addresses that belong to each domain in the original data.

```python
>>> transformer.domain_to_unique_count
{
    'datacebo.com': 15,
    'gmail.com': 103,
    'yahoo.com': 10,
    'sdv.dev': 14
}
```

*Note: The transformer computes these values only if you have chosen to enforce the unique email count per domain. In that case, the counts are reported per top or full domain, depending on what you specified.*
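
For intuition, a mapping with this shape could be computed from a list of emails as in the sketch below. This is plain Python with a hypothetical helper, `count_unique_emails_per_domain`; it assumes the full domain is used and is not the transformer's own code.

```python
from collections import defaultdict


def count_unique_emails_per_domain(emails):
    """Illustrative helper (not part of RDT): unique emails per full domain."""
    unique_by_domain = defaultdict(set)
    for email in emails:
        # Everything after the last '@' is treated as the full domain.
        domain = email.rsplit('@', 1)[1]
        unique_by_domain[domain].add(email)
    return {domain: len(values) for domain, values in unique_by_domain.items()}


emails = [
    'info@datacebo.com',
    'info@datacebo.com',   # duplicate, counted once
    'hello@datacebo.com',
    'user@sdv.dev',
]
print(count_unique_emails_per_domain(emails))
# {'datacebo.com': 2, 'sdv.dev': 1}
```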

