AnonymizedGeoExtractor performs Contextual Anonymization on phone number data. It transforms phone numbers by extracting geographical context. When reversing the transform, it generates new, fake phone numbers in the correct context.
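The idea of contextual anonymization can be illustrated with a minimal, standard-library sketch. This is not the transformer's actual implementation: the lookup table, the helper names (extract_country, fake_number), and the fixed subscriber length are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical, tiny lookup table for illustration only; a real
# implementation would rely on full numbering-plan metadata.
CALLING_CODE_TO_COUNTRY = {"1": "US", "44": "GB", "33": "FR"}
COUNTRY_TO_CALLING_CODE = {v: k for k, v in CALLING_CODE_TO_COUNTRY.items()}


def extract_country(phone_number):
    """Return the country for a number written as '+<code><digits>'."""
    digits = phone_number.lstrip("+")
    # Try longer calling codes first, since codes have one to three digits.
    for length in (3, 2, 1):
        country = CALLING_CODE_TO_COUNTRY.get(digits[:length])
        if country is not None:
            return country
    return None


def fake_number(country, rng, n_digits=10):
    """Build a made-up number that keeps only the country context."""
    code = COUNTRY_TO_CALLING_CODE[country]
    subscriber = "".join(str(rng.randrange(10)) for _ in range(n_digits))
    return f"+{code}{subscriber}"


original = ["+14155550123", "+442079460000"]

# Forward direction: keep only the geographical context.
contexts = [extract_country(p) for p in original]

# Reverse direction: create new numbers that fit each context.
rng = random.Random(0)
regenerated = [fake_number(c, rng) for c in contexts]
```

In this sketch the transformed data holds only the country labels, and the regenerated numbers share the country prefix of the originals but are otherwise random.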
from rdt_plus.transformers.phone_number import AnonymizedGeoExtractor
age = AnonymizedGeoExtractor()
default_country: If a phone number does not have an international country code, provide the country code to use.
When no default country is set, all phone numbers must include an international country code.
match_unique_numbers_per_region: Limit the number of new phone numbers created to the number originally found in the dataset.
False: Create a variety of new phone numbers based on the geography.
True: Put a limit on the number of new phone numbers created. Phone numbers will be recycled after the limit is reached.
Setting this to True will leak information about the number of phone numbers within each geographical region. However, these numbers will be newly created numbers that may not appear in the original data. Always evaluate the risk of a data leak before sharing your transformed data.
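To make the recycling behavior and the resulting leak concrete, here is a small sketch under stated assumptions. The regenerate function, the unique_counts mapping, and the "REGION-digits" number format are hypothetical and do not reflect how the transformer is implemented internally.

```python
import itertools
import random


def regenerate(regions, unique_counts, rng):
    """Create fake numbers per region, recycling after a per-region limit.

    regions: region label for each output row.
    unique_counts: hypothetical mapping of region -> number of distinct
        phone numbers seen in the original data.
    """
    pools = {}
    for region, limit in unique_counts.items():
        pool = set()
        # Keep drawing until the pool holds exactly `limit` distinct values.
        while len(pool) < limit:
            pool.add(f"{region}-{rng.randrange(10**7):07d}")
        # Cycle through the pool so numbers repeat once the limit is hit.
        pools[region] = itertools.cycle(sorted(pool))
    return [next(pools[region]) for region in regions]


rng = random.Random(0)
rows = ["US", "US", "US", "GB", "GB"]
fake = regenerate(rows, {"US": 2, "GB": 1}, rng)

us_unique = {n for n in fake if n.startswith("US")}
gb_unique = {n for n in fake if n.startswith("GB")}
```

Here the output contains exactly two distinct US numbers and one distinct GB number, matching the assumed original counts. That per-region count is the information that becomes visible to anyone who receives the data, even though the numbers themselves are newly created.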
from rdt_plus.transformers.phone_number import AnonymizedGeoExtractor
# the phone numbers are domestic US phone numbers
age = AnonymizedGeoExtractor(default_country="US")
# the phone numbers are international; place a limit
# on the new phone numbers created
age = AnonymizedGeoExtractor(match_unique_numbers_per_region=True)
Privacy. Extracting geographical information may leak some PII about the phone numbers, especially if you set match_unique_numbers_per_region to True. However, the privacy risk is lowered because the original phone numbers are not present in the transformed data.
Always evaluate the risk of a data leak before sharing your transformed or reverse transformed data.
Quality. Deleting the original phone numbers may reduce the quality, but extracting geographical information provides valuable insight to anyone using the transformed data. If you set match_unique_numbers_per_region to True, then there is additional information about unique and repeating phone numbers.