* RandomLocationGenerator

Last updated 6 months ago

The RandomLocationGenerator creates realistic, worldwide addresses. It transforms the real data by dropping all the address-related columns. Then when reverse transforming, it generates random, realistic locations from around the world. Use this transformer when you want to completely anonymize your address data.

from rdt.transformers.address import RandomLocationGenerator

transformer = RandomLocationGenerator(locales=['en_US'])
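The drop-then-regenerate behavior described above can be pictured with a small toy class. This is only a sketch under stated assumptions: ToyLocationDropper and TOY_LOCATIONS are hypothetical names invented for illustration, the rows are plain dictionaries rather than a DataFrame, and none of this reflects how RDT implements the transformer internally.

```python
import random

# Hypothetical stand-in data; the real transformer draws on a worldwide dataset.
TOY_LOCATIONS = [
    {'addr_1': '12 Example St.', 'city_name': 'Boston', 'state': 'MA'},
    {'addr_1': '98 Sample Ave.', 'city_name': 'Austin', 'state': 'TX'},
]


class ToyLocationDropper:
    """Toy illustration only; not the RDT implementation."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._columns = []

    def fit(self, columns):
        # Remember which columns make up the address.
        self._columns = list(columns)

    def transform(self, rows):
        # Drop every address-related column; other columns pass through.
        return [
            {key: value for key, value in row.items() if key not in self._columns}
            for row in rows
        ]

    def reverse_transform(self, rows):
        # Re-create the dropped columns with randomly chosen locations.
        restored_rows = []
        for row in rows:
            location = self._rng.choice(TOY_LOCATIONS)
            new_row = dict(row)
            new_row.update({col: location[col] for col in self._columns})
            restored_rows.append(new_row)
        return restored_rows


rows = [{'id': 1, 'addr_1': '5 Real Rd.', 'city_name': 'Denver', 'state': 'CO'}]
toy = ToyLocationDropper()
toy.fit(columns=['addr_1', 'city_name', 'state'])
transformed = toy.transform(rows)
restored = toy.reverse_transform(transformed)
```

After transform, only the non-address column remains. After reverse_transform, the address columns are back, but their values have no connection to the original row.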

Parameters

locales: An optional list of locales to use when generating addresses. All addresses will be chosen from the list of available countries with the provided languages.

(default) ["en_US"]

Create locations from the US in English.

<list>

Create locations from the countries and in the languages given by the list of locales.

missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase.

(default) 'random'

Randomly assign missing values in roughly the same proportion as the original data.

None

Do not recreate missing values.
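One way to picture the two missing_value_generation options is the sketch below. It assumes a simple approach: record the share of missing values when fitting, then blank out each generated value with that probability. The helper names fit_missing_proportion and reinsert_missing are hypothetical, and RDT's actual logic may differ.

```python
import random


def fit_missing_proportion(values):
    """Return the fraction of values that are missing (None)."""
    if not values:
        return 0.0
    return sum(value is None for value in values) / len(values)


def reinsert_missing(generated, proportion, mode='random', seed=None):
    """Blank out generated values according to the chosen mode."""
    if mode is None:
        # Mirrors missing_value_generation=None: keep every generated value.
        return list(generated)
    rng = random.Random(seed)
    # Mirrors 'random': each value is blanked with probability `proportion`.
    return [None if rng.random() < proportion else value for value in generated]


# Two of every five original values are missing.
original = ['1 A St.', None, '2 B St.', None, '3 C St.'] * 200
proportion = fit_missing_proportion(original)

generated = ['synthetic address'] * 1000
with_missing = reinsert_missing(generated, proportion, mode='random', seed=42)
without_missing = reinsert_missing(generated, proportion, mode=None)
```

Because each value is blanked independently, the share of missing values in with_missing is close to the original proportion but not exactly equal to it. This matches the "roughly the same proportion" wording above.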

Examples

from rdt.transformers.address import RandomLocationGenerator

transformer = RandomLocationGenerator(
    locales=['en_US', 'fr_CA'],
    missing_value_generation='random'
)

# `ht` is assumed to be an existing HyperTransformer that already has a config for your data
# in the HyperTransformer, ensure that each column has a supported sdtype
ht.update_sdtypes(column_name_to_sdtype={
    'addr_1': 'street_address',
    'city_name': 'city',
    'state': 'state_abbr',
})

# in the HyperTransformer, assign the set of columns to your transformer
ht.update_transformers(column_name_to_transformer={
    ('addr_1', 'city_name', 'state'): transformer
})

FAQs

Will the generated locations be real places?

The general region is guaranteed to be a real location from anywhere in the world. That is, the combination of the city, administrative unit, country, and post code can be found on a map. For example: Boston, MA USA 02116.

However, anything more precise than that will be fake, including street address and secondary address. For example: 123 Main St., Suite #204 is completely made up and not necessarily located in Boston. This is by design to help protect the privacy of real homes and businesses.
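The split between a real region and a made-up street can be sketched as follows. This is an illustration only: REAL_REGIONS is a tiny hypothetical lookup table standing in for a worldwide regional dataset, and generate_location is an invented helper, not part of RDT.

```python
import random

# Hypothetical lookup of real city / state / country / postcode combinations.
REAL_REGIONS = {
    'en_US': [
        {'city': 'Boston', 'state_abbr': 'MA', 'country': 'USA', 'postcode': '02116'},
        {'city': 'Chicago', 'state_abbr': 'IL', 'country': 'USA', 'postcode': '60601'},
        {'city': 'Seattle', 'state_abbr': 'WA', 'country': 'USA', 'postcode': '98101'},
    ],
}

# Building blocks for street addresses that are invented on the fly.
STREET_NAMES = ['Main', 'Oak', 'Maple']
STREET_SUFFIXES = ['St.', 'Ave.', 'Blvd.']


def generate_location(locale, rng):
    """Combine a real region with a made-up street address."""
    region = rng.choice(REAL_REGIONS[locale])
    street = (
        f'{rng.randint(1, 999)} '
        f'{rng.choice(STREET_NAMES)} {rng.choice(STREET_SUFFIXES)}'
    )
    # The street is not checked against the region, so it may not exist there.
    return {'street_address': street, **region}


location = generate_location('en_US', random.Random(7))
```

The region fields always come from the lookup table as one consistent unit, while the street address is assembled independently of them.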

Can I limit the regions to the ones in my original data?

This transformer is designed to create random locations from anywhere in the countries that you provide. For example, if you provide 'en_US', then the transformer will create addresses from anywhere in the US such as California, New York, Massachusetts, etc. -- even if your original data did not have all these locations.

If you'd like to limit the regions based on the original data, use the RegionalAnonymizer.

If you provide multiple locales, the transformer creates data from the countries specified, in the languages specified. For example, ["en_US", "fr_CA"] creates a mix of locations from the US in English and from Canada in French.

This transformer takes multiple columns as input. Make sure that each column involved in your address is a supported sdtype such as city, state and postcode. For more information, see the supported sdtypes list.

Worldwide regional data is provided by www.geonames.org.

*SDV Enterprise Feature. This feature is available to our licensed users and is not currently in our public library. For more information, visit our page to Explore SDV.