* DomainBasedAnonymizer

*SDV Enterprise Feature. This feature is available to licensed users and is not currently part of the public library. To learn more about SDV Enterprise and its extra features, visit our website.

The DomainBasedAnonymizer performs Contextual Anonymization on email data. It transforms emails by extracting their domains. When reversing the transform, it generates new, fake emails with the correct domains.

from rdt.transformers.email import DomainBasedAnonymizer

transformer = DomainBasedAnonymizer(obfuscate_emails=True)
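To make the idea concrete, here is a minimal pure-Python sketch of the two phases described above. It does not use or reproduce the library's implementation: the helpers `extract_domain` and `fake_email` are hypothetical names, and the 10-letter random local part is an arbitrary choice for illustration.

```python
import random
import string


def extract_domain(email):
    """Hypothetical helper: return everything after the last '@'."""
    return email.rsplit('@', 1)[1]


def fake_email(domain, rng):
    """Hypothetical helper: attach a random local part to a known domain."""
    local = ''.join(rng.choice(string.ascii_lowercase) for _ in range(10))
    return f'{local}@{domain}'


rng = random.Random(0)  # seeded only so the sketch is repeatable
emails = ['alice@datacebo.com', 'bob@gmail.com', 'carol@gmail.com']

# "Transform": keep only the domain of each email.
domains = [extract_domain(email) for email in emails]

# "Reverse transform": create brand-new emails that keep those domains.
fakes = [fake_email(domain, rng) for domain in domains]
```

The new addresses share domains with the originals, but their local parts are unrelated to the real ones.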

Parameters

extracted_domain: Which part of the email domain to extract during the transform phase: the top domain or the full domain.

enforce_unique_count: Limit the number of new emails created to the number originally found in the dataset.

Setting this to True will leak information about the number of unique emails within each domain. However, the emails themselves will be newly created ones that may not appear in the original data. Always evaluate the risk of a data leak before sharing your transformed data.
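One way to picture this setting is as a fixed pool of fake emails per domain, sized to the original unique count. The sketch below is an assumption about how such a limit could work, not the library's actual mechanism; `build_pool` is a hypothetical helper and the counts are made up.

```python
import random
import string


def build_pool(domain, size, rng):
    """Hypothetical helper: create exactly `size` distinct fake emails for a domain."""
    pool = set()
    while len(pool) < size:
        local = ''.join(rng.choice(string.ascii_lowercase) for _ in range(10))
        pool.add(f'{local}@{domain}')
    return sorted(pool)  # sorted so that sampling is repeatable


rng = random.Random(42)

# Made-up counts standing in for what was learned from the original data.
domain_to_unique_count = {'datacebo.com': 2, 'gmail.com': 3}
pools = {
    domain: build_pool(domain, count, rng)
    for domain, count in domain_to_unique_count.items()
}

# Generating 100 'gmail.com' rows reuses the same 3 addresses over and over.
generated = [rng.choice(pools['gmail.com']) for _ in range(100)]
```

Because the pool size equals the original count, anyone who sees the output can estimate how many unique emails each domain had. This is the leak described above.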

obfuscate_emails: Control whether the overall email looks realistic or follows random patterns.

Setting this to False may result in emails that correspond to real user emails by complete coincidence. If you are worried about creating emails that accidentally correspond to real users, please set this to True.

Examples

from rdt.transformers.email import DomainBasedAnonymizer

transformer = DomainBasedAnonymizer(
    extracted_domain='top',
    enforce_unique_count=False,
    obfuscate_emails=True
)

Attributes

After fitting the transformer, you can access the learned values through its attributes.

domain_to_unique_count: The number of unique email addresses that belong to each domain in the original data.

>>> transformer.domain_to_unique_count
{
    'datacebo.com': 15,
    'gmail.com': 103,
    'yahoo.com': 10,
    'sdv.dev': 14
}

Note: If enforce_unique_count is not enabled, the transformer will not compute these values. If it is enabled, you'll see one count per domain, using the top or full domain as you specified in extracted_domain.
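As a rough illustration of what these counts represent, the sketch below tallies unique emails per domain in plain Python. It is not the library's code. `count_unique_per_domain` is a hypothetical helper, and it assumes that 'full' means everything after the @ sign and 'top' means the final label of the domain (such as 'com'); the value name 'full' is itself an assumption.

```python
from collections import defaultdict


def count_unique_per_domain(emails, extracted_domain='full'):
    """Hypothetical helper: count distinct emails under each extracted domain."""
    seen = defaultdict(set)
    for email in emails:
        domain = email.rsplit('@', 1)[1]
        if extracted_domain == 'top':
            # Assumption: 'top' keeps only the last dot-separated label.
            domain = domain.rsplit('.', 1)[-1]
        seen[domain].add(email)  # a set ignores repeated emails
    return {domain: len(addresses) for domain, addresses in seen.items()}


emails = [
    'a@datacebo.com',
    'b@datacebo.com',
    'a@datacebo.com',  # repeat, counted once
    'c@gmail.com',
    'd@sdv.dev',
]

full_counts = count_unique_per_domain(emails, extracted_domain='full')
top_counts = count_unique_per_domain(emails, extracted_domain='top')
```

Under these assumptions, the keys of the resulting dictionary change with the extraction level, while repeated emails are only ever counted once.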
