* DomainBasedAnonymizer
*SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, visit our website.
The DomainBasedAnonymizer performs Contextual Anonymization on email data. It transforms emails by extracting their domains. When reversing the transform, it generates new, fake emails with the correct domains.
Parameters
extracted_domain: Which part of the overall email domain to extract during the transformation phase.
(default) | Extract the full domain, which is everything after the @ sign. For example if the email is |
| Extract only the top domain, which is everything after the . character. For example if the email is |
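The two extraction modes can be illustrated with a small sketch in plain Python. This is not the transformer's implementation: the helper `extract_domain` and its `mode` values `"full"` and `"top"` are hypothetical names chosen for this illustration, and the sketch assumes that "everything after the . character" means everything after the last dot in the domain.

```python
def extract_domain(email, mode="full"):
    """Return the part of an email's domain selected by `mode` (hypothetical helper)."""
    # Everything after the last @ sign is treated as the full domain.
    full_domain = email.rpartition("@")[2]
    if mode == "full":
        return full_domain
    if mode == "top":
        # Assumption: the top domain is the text after the last dot.
        return full_domain.rpartition(".")[2]
    raise ValueError(f"unknown mode: {mode!r}")


full = extract_domain("jane.doe@mail.example.com", mode="full")
top = extract_domain("jane.doe@mail.example.com", mode="top")
print(full, top)
```

Extracting only the top domain keeps less information about the original email, so it reveals less but also preserves less of the original pattern.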
enforce_unique_count: Limit the number of new emails created to the number originally found in the dataset.
False (default) | Create a variety of new emails based on the domain |
True | Limit the number of new emails created. Emails will be recycled after the limit is reached. |
Setting this to True will leak information about the number of unique emails within each domain. However, these emails will be newly created ones that may not appear in the original data. Always evaluate the risk of a data leak before sharing your transformed data.
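To make the recycling behavior concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not the transformer's actual logic: the function `recycled_emails`, its arguments, and the `user0`, `user1` naming scheme are all hypothetical.

```python
from itertools import cycle, islice


def recycled_emails(domain, unique_count, n_rows):
    """Produce n_rows emails for a domain using at most unique_count distinct values."""
    # Build a fixed pool of new addresses, one per allowed unique email.
    pool = [f"user{i}@{domain}" for i in range(unique_count)]
    # Once the pool is exhausted, start again from its first entry.
    return list(islice(cycle(pool), n_rows))


emails = recycled_emails("example.com", unique_count=2, n_rows=5)
print(emails)
```

Because the output never contains more distinct addresses than the pool size, anyone who sees the output can infer an upper bound on the original unique count for that domain, which is the leak described above.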
obfuscate_emails: Control whether the overall email looks realistic or follows random patterns.
False (default) | Create realistic-looking usernames and emails such as |
True | Obfuscate the usernames and emails to create random values such as |
Setting this to False may result in emails that correspond to real user emails by complete coincidence. If you are worried about creating emails that accidentally correspond to real users, please set this to True.
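The difference between realistic and obfuscated usernames can be sketched as follows. This is an assumption-laden illustration in plain Python, not the transformer's generator: the helper `make_email`, the tiny name lists, and the 12-character random username are all hypothetical choices.

```python
import random
import string


def make_email(domain, obfuscate, rng):
    """Build one fake email for `domain` (hypothetical helper)."""
    if obfuscate:
        # Random characters: very unlikely to match a real person's address.
        alphabet = string.ascii_lowercase + string.digits
        username = "".join(rng.choices(alphabet, k=12))
    else:
        # Name-like username: readable, but may coincide with a real address.
        first = rng.choice(["alex", "maria", "sam"])
        last = rng.choice(["smith", "lee", "garcia"])
        username = f"{first}.{last}"
    return f"{username}@{domain}"


rng = random.Random(0)  # seeded only to make the sketch repeatable
realistic = make_email("example.com", obfuscate=False, rng=rng)
obfuscated = make_email("example.com", obfuscate=True, rng=rng)
print(realistic, obfuscated)
```

In both cases the domain is preserved; only the username part changes in style.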
Examples
Attributes
After fitting the transformer, you can access the learned values through the attributes.
domain_to_unique_count: The number of unique email addresses that belong to each domain in the original data.
Note: If you have not set enforce_unique_count to True, the transformer will not compute these values. If you have, you'll see the count per domain, using the top or full domain as specified by extracted_domain.
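As a rough picture of what a per-domain unique count contains, the sketch below computes one in plain Python. It assumes the result is a mapping from domain to count and that the full domain is used; the function `count_unique_per_domain` is hypothetical and is not part of the library.

```python
from collections import defaultdict


def count_unique_per_domain(emails):
    """Map each full domain to its number of distinct email addresses."""
    seen = defaultdict(set)
    for email in emails:
        # Assumption: group by the full domain (everything after the @ sign).
        domain = email.rpartition("@")[2]
        seen[domain].add(email)
    return {domain: len(addresses) for domain, addresses in seen.items()}


counts = count_unique_per_domain([
    "ann@example.com",
    "bob@example.com",
    "ann@example.com",  # duplicate, counted once
    "cy@test.org",
])
print(counts)
```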