* RandomLocationGenerator

*SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, visit our website.

The RandomLocationGenerator creates realistic, worldwide addresses. It transforms the real data by dropping all the address-related columns. Then when reverse transforming, it generates random, realistic locations from around the world. Use this transformer when you want to completely anonymize your address data.

from rdt.transformers.address import RandomLocationGenerator

transformer = RandomLocationGenerator(locales=['en_US'])
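To make the drop-then-regenerate behavior concrete, here is a minimal pure-Python sketch. It is not the SDV Enterprise implementation: the class `ToyLocationGenerator`, its tiny hard-coded location pool, and the use of lists of dicts as rows are all assumptions made for illustration.

```python
import random


class ToyLocationGenerator:
    """Hypothetical stand-in for illustration; not the SDV Enterprise class."""

    def __init__(self, address_columns, seed=None):
        # Columns treated as address data. In this toy, each name must be a
        # key of the pool entries below ('city' or 'state').
        self.address_columns = list(address_columns)
        self._rng = random.Random(seed)
        # Tiny illustrative pool; the real transformer draws on worldwide data.
        self._pool = [
            {'city': 'Boston', 'state': 'MA'},
            {'city': 'Chicago', 'state': 'IL'},
            {'city': 'Seattle', 'state': 'WA'},
        ]

    def transform(self, rows):
        # Drop every address column, so no real address is carried forward.
        return [
            {key: value for key, value in row.items()
             if key not in self.address_columns}
            for row in rows
        ]

    def reverse_transform(self, rows):
        # Attach a randomly chosen location to each row.
        result = []
        for row in rows:
            location = self._rng.choice(self._pool)
            new_row = dict(row)
            for column in self.address_columns:
                new_row[column] = location[column]
            result.append(new_row)
        return result


real_rows = [
    {'id': 1, 'city': 'Austin', 'state': 'TX'},
    {'id': 2, 'city': 'Miami', 'state': 'FL'},
]
toy = ToyLocationGenerator(['city', 'state'], seed=0)
transformed = toy.transform(real_rows)            # only 'id' remains
restored = toy.reverse_transform(transformed)     # random locations attached
```

Because the address columns are dropped before anything else sees them, the regenerated locations cannot depend on the original values. That is the property that makes this approach a full anonymization.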

Parameters

locales: An optional list of locales to use when generating addresses. Each locale pairs a language with a country, and all addresses will be chosen from the countries in the list, written in the corresponding languages.

(default) ["en_US"]

Create locations from the US in English.

<list>

Create locations from the listed countries, each in its listed language. For example ["en_US", "fr_CA"] creates a mix of locations from the US in English and from Canada in French.

missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase.

(default) 'random'

Randomly assign missing values in roughly the same proportion as the original data.

None

Do not recreate missing values.
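As a rough illustration of what "roughly the same proportion" can mean for the 'random' option, the sketch below measures the fraction of missing values in some original data and then blanks out each generated value independently with that probability. The helper names `fraction_missing` and `reinsert_missing` are hypothetical, and this is an assumption about the general idea rather than the transformer's actual code.

```python
import random


def fraction_missing(values):
    """Return the share of entries that are None (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(value is None for value in values) / len(values)


def reinsert_missing(values, missing_fraction, seed=None):
    """Replace each value with None independently, with the given probability."""
    rng = random.Random(seed)
    return [
        None if rng.random() < missing_fraction else value
        for value in values
    ]


original = ['Boston', None, 'Chicago', None]       # half of the values missing
learned_fraction = fraction_missing(original)      # 0.5

generated = ['Seattle'] * 10_000
with_missing = reinsert_missing(generated, learned_fraction, seed=42)
unchanged = reinsert_missing(generated, 0.0, seed=42)  # like passing None
```

Since each value is blanked independently, the resulting proportion is close to the original but not exactly equal to it, which matches the "roughly" in the description above.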

Examples

This transformer takes multiple columns as input. Make sure that each column involved in your address has a supported sdtype, such as city, state, or postcode. For more information, see the supported sdtypes list.

from rdt.transformers.address import RandomLocationGenerator

transformer = RandomLocationGenerator(
    locales=['en_US', 'fr_CA'],
    missing_value_generation='random'
)

# ht is assumed to be an existing HyperTransformer whose config is already set
# in the hypertransformer, ensure that each column has a supported sdtype
ht.update_sdtypes(column_name_to_sdtype={
    'addr_1': 'street_address',
    'city_name': 'city',
    'state': 'state_abbr',
})

# in the hypertransformer, assign set of columns to your transformer
ht.update_transformers(column_name_to_transformer={
    ('addr_1', 'city_name', 'state'): transformer
})

FAQs

Will the generated locations be real places?

The general region is guaranteed to be a real location from anywhere in the world. That is, the combination of the city, administrative unit, country, and postcode can be found on a map. For example: Boston, MA USA 02116.

However, anything more precise than that will be fake, including street address and secondary address. For example: 123 Main St., Suite #204 is completely made up and not necessarily located in Boston. This is by design to help protect the privacy of real homes and businesses.
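The split between a real region and a made-up street can be sketched as follows. This is only an illustration under stated assumptions: the `REAL_REGIONS` table, the `STREET_NAMES` list, and the `make_location` helper are hypothetical and are not part of the transformer. The point is that the region is sampled as one unit, so city, state, and postcode always agree, while the street address is fabricated independently.

```python
import random

# Illustrative table of real (city, state, postcode) combinations.
REAL_REGIONS = [
    ('Boston', 'MA', '02116'),
    ('Chicago', 'IL', '60601'),
    ('Seattle', 'WA', '98101'),
]

# Made-up street names; they are not tied to any of the regions above.
STREET_NAMES = ['Main St.', 'Oak Ave.', 'Maple Dr.']


def make_location(rng):
    """Combine a real region with a fabricated street address."""
    # Sample the region as a single unit so its parts stay consistent.
    city, state, postcode = rng.choice(REAL_REGIONS)
    # Fabricate the street independently of the chosen region.
    street = f'{rng.randint(1, 999)} {rng.choice(STREET_NAMES)}'
    return {
        'street_address': street,
        'city': city,
        'state': state,
        'postcode': postcode,
    }


location = make_location(random.Random(7))
```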

Can I limit the regions to the ones in my original data?

This transformer is designed to create random locations from anywhere in the countries that you provide. For example, if you provide 'en_US', then the transformer will create addresses from anywhere in the US such as California, New York, Massachusetts, etc. -- even if your original data did not have all these locations.

If you'd like to limit the regions based on the original data, use the RegionalAnonymizer.

Worldwide, regional data is provided by www.geonames.org.
