GeoExtractor (+Maps)
The GeoExtractor extracts geographical context from the phone numbers. It keeps the original phone numbers so that the same exact numbers can be recovered during the reverse transform.
from rdt_plus.transformers.phone_number import GeoExtractor
ge = GeoExtractor()
default_country: If a phone number does not have an international country code, provide the country code to use.
(default) None | No default country. All phone numbers must have international country codes. |
<string> |
from rdt_plus.transformers.phone_number import GeoExtractor
# the phone numbers are domestic US phone numbers
ge = GeoExtractor(default_country="US")
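The idea behind default_country can be illustrated in plain Python. This is only a sketch of the concept, not the add-on's code: the to_international helper and the CALLING_CODES table are hypothetical names, and raising an error when no default is given is an assumption based on the description above.

```python
# Hypothetical lookup table for this sketch; only the US entry is filled in.
CALLING_CODES = {'US': '1'}


def to_international(number, default_country=None):
    """Return the number with a leading '+<country calling code>'."""
    if number.startswith('+'):
        # The number already carries an international country code.
        return number
    if default_country is None:
        # Assumption: without a default, a domestic number cannot be resolved.
        raise ValueError(f'{number!r} has no country code and no default_country was given')
    return '+' + CALLING_CODES[default_country] + number


domestic = to_international('4086581972', default_country='US')
```

With a default of "US", a domestic number such as 4086581972 is read as a US number, while a number that already starts with "+" is left unchanged.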
To increase privacy, we recommend mapping the original phone number data to new numbers before using this transformer.
from rdt_plus.transformers.phone_number import NewNumberMap
# Create a consistent map between original phone numbers and new ones
map = NewNumberMap(default_country='US')
map.fit(data, column=['phone_number'])
mapped_data = map.transform(data)
Now, you can use the GeoExtractor on the mapped data, which won't leak the PII of the original phone numbers.
ge = GeoExtractor(default_country='US')
ge.fit(mapped_data, column=['phone_number.phone_number'])
Anyone who has access to the NewNumberMap object also has access to the original phone numbers.
map.get_mapping()
{ '4086581972': '4081123345',
'3106591150': '3105551234',
'4158200978': '4156789100' }
Separating the mapping from the extractor allows you to control who has access to the real phone number data.
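To see why the map object is sensitive, note that the mapping shown above is an ordinary original-to-new dictionary, so whoever holds it can invert it. A minimal sketch in plain Python, using the example dictionary rather than a live NewNumberMap object (recover_original is a hypothetical helper name):

```python
# Example mapping copied from the output above (original -> new number).
mapping = {
    '4086581972': '4081123345',
    '3106591150': '3105551234',
    '4158200978': '4156789100',
}

# Invert it: new number -> original number.
reverse_mapping = {new: original for original, new in mapping.items()}


def recover_original(new_number):
    """Return the original number for a mapped number, or None if unknown."""
    return reverse_mapping.get(new_number)


recovered = recover_original('4081123345')
```

Someone who only receives the mapped data and the GeoExtractor has no such lookup available.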
Privacy. By itself, the GeoExtractor keeps (and leaks) all the original phone numbers. Use it on its own only if the phone number data is not PII. If you use the GeoExtractor with the NewNumberMap, then the chances of leaking PII are lower.
Always evaluate the risk of a data leak before sharing your transformed or reverse transformed data.
Quality. The GeoExtractor with a mapping produces the highest quality transforms and reverse transforms in the Phone Number Add-On:
- All geographical information is preserved
- Information about individual, repeating phone numbers is also preserved
There are two mapping transformers available in the Phone Number Add-On. Anyone with access to the map object also has access to the original phone numbers.
The NewNumberMap creates new phone numbers and consistently maps them to the original ones.
from rdt_plus.transformers.phone_number import NewNumberMap
map = NewNumberMap(default_country='US')
map.fit(data, columns=['phone_number'])
mapped_data = map.transform(data)
# get the consistent mapping (original --> new number)
map.get_mapping('phone_number')
Example output: Mapping between original number and the new ones. The new ones were not in the original dataset, but they are from the same geographical region.
{ '4086581972': '4081123345',
'3106591150': '3105551234',
'4158200978': '4156789100' }
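The properties described above (consistent mapping, new numbers absent from the original data, same region) can be sketched in plain Python. This is not the add-on's implementation: build_new_number_map is a hypothetical helper, "same geographical region" is approximated by keeping the first three digits of US-style 10-digit strings, and the generated numbers are not checked for validity.

```python
import random


def build_new_number_map(numbers, seed=0):
    """Map each unique 10-digit number to a new number with the same area code."""
    rng = random.Random(seed)
    originals = set(numbers)
    used = set()
    mapping = {}
    for number in numbers:
        if number in mapping:
            continue  # repeated numbers reuse the same new number
        while True:
            suffix = ''.join(rng.choice('0123456789') for _ in range(7))
            candidate = number[:3] + suffix
            # Reject candidates that collide with real or already-issued numbers.
            if candidate not in originals and candidate not in used:
                break
        mapping[number] = candidate
        used.add(candidate)
    return mapping


numbers = ['4086581972', '3106591150', '4086581972', '4158200978']
mapping = build_new_number_map(numbers)
mapped = [mapping[n] for n in numbers]
```

Because the repeated number 4086581972 always receives the same replacement, information about repeating phone numbers survives the mapping.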
The ScrambledMap is another option that may expose more sensitive information. It consistently maps an original phone number to another existing phone number.
from rdt_plus.transformers.phone_number import ScrambledMap
map = ScrambledMap(default_country='US')
map.fit(data, columns=['phone_number'])
mapped_data = map.transform(data)
# get the consistent mapping (original --> another original number)
map.get_mapping('phone_number')
Example output: Notice that the new numbers are pulled from the original data.
{ '4081234567': '4082223344',
'4082223344': '4081001000',
'4081001000': '4081234567' }
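A scrambled mapping of this kind can be sketched in plain Python as a shuffle of the unique numbers followed by a one-position rotation. This is an illustration only: build_scrambled_map is a hypothetical helper, and avoiding self-mappings is an assumption taken from the example output, not a documented guarantee of the add-on.

```python
import random


def build_scrambled_map(numbers, seed=0):
    """Map every unique number to a different number from the same dataset."""
    rng = random.Random(seed)
    unique = sorted(set(numbers))
    rng.shuffle(unique)
    # Rotating by one position pairs each number with its neighbor, so no
    # number maps to itself when there are at least two unique numbers.
    rotated = unique[1:] + unique[:1]
    return dict(zip(unique, rotated))


numbers = ['4081234567', '4082223344', '4081001000', '4081234567']
scrambled = build_scrambled_map(numbers)
mapped = [scrambled[n] for n in numbers]
```

Every value in the result is a real number from the input, which is why this option may expose more sensitive information than generating new numbers.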