AnonymizedFaker
Last updated
Last updated
Compatibility: id
and pii
data
The AnonymizedFaker
creates anonymized text belonging to specific contexts or rulesets. When transforming the data, it simply removes the column. When reversing the transform, it anonymizes the column by creating completely new, fake data at random using the Python Faker library.
You can specify the exact faker method to use for more realistic data.
provider_name
: The name of the provider to use from the Faker library.
(default) None
<string>
function_name
: The name of the function to use within the Faker provider.
(default) 'lexify'
<string>
Together, the provider_name
and function_name
parameters specify exactly how to create fake data. Some common values are:
A full address: provider_name="address", function_name="address"
A basic bank account number: provider_name="bank", function_name="bban"
A full credit card number: provider_name="credit_card", function_name="credit_card_number"
Latitude/longitude coordinates: provider_name="geo", function_name="local_latlng"
A phone number: provider_name="phone_number", function_name="phone_number"
To browse for more options, visit the Faker library's docs.
function_kwargs
: Optional parameters to pass into the function that you're specifying to create Fake data.
(default) None
Do not specify any additional parameters
<dictionary>
locales
: An optional list of locales to use when generating the Fake data.
(default) None
Use the default locale, which is usually set to the country you are in.
<list>
Setting a locale might leak information about the original data. Anyone with access to the anonymized data will be able to tell which countries and locales are included in the original data .
cardinality_rule
: Control the number of PII values that will be created
(default) None
Do not impose any rules. Any number of unique PII can be generated.
'unique'
The generated data should not contain any repeating values. Note: This option may limit the amount of data that you can create.
'match'
Learn the number of unique values from the fit data and ensure that the generated data contains the same number. These may be repeated.
Deprecated enforce_uniqueness
: Use cardinality_rule
instead
missing_value_generation
: Add this argument to determine how to recreate missing values during the reverse transform phase
(default) 'random'
Randomly assign missing values in roughly the same proportion as the original data.
None
Do not recreate missing values.
Use the from Faker, which capable of creating random text.
Use the provider for a specific context, for example or .
Use the to create random 4-character text.
Use the function from the specified provider to generate fake data. For example, from the address provider or from the bank provider.
Additional parameters to add. These are unique to the function name and should be represented as a dictionary.
For example for the banking function, you can specify: {"length": 11, "primary": True}
.
Create data from the list of locales. These are specified as strings representing the language and country from Faker.
For example [
,
]
.