AnonymizedFaker
Compatibility: id and pii data
The AnonymizedFaker creates anonymized text belonging to specific contexts or rulesets. When transforming the data, it simply removes the column. When reversing the transform, it anonymizes the column by creating completely new, fake data at random using the Python Faker library.

from rdt.transformers.pii import AnonymizedFaker
transformer = AnonymizedFaker()
You can specify the exact Faker method to use for more realistic data.
Parameters
provider_name: The name of the provider to use from the Faker library.
(default) None
Use the BaseProvider from Faker, which is capable of creating random text.
function_name: The name of the function to use within the Faker provider.
(default) 'lexify'
Use the lexify method to create random 4-character text.
<string>
Use the function from the specified provider to generate fake data. For example, "street_address" from the address provider or "swift" from the bank provider.
function_kwargs: Optional parameters to pass into the function that you're specifying to create fake data.
(default) None
Do not specify any additional parameters.
<dictionary>
Additional parameters to add. These are unique to the function name and should be represented as a dictionary.
For example, for the banking "swift" function, you can specify: {"length": 11, "primary": True}.
locales: An optional list of locales to use when generating the fake data.
Setting a locale might leak information about the original data. Anyone with access to the anonymized data will be able to tell which countries and locales are included in the original data.
cardinality_rule: Control the number of PII values that will be created.
(default) None
Do not impose any rules. Any number of unique PII can be generated.
'unique'
The generated data should not contain any repeating values. Note: This option may limit the amount of data that you can create.
'match'
Learn the number of unique values from the fit data and ensure that the generated data contains the same number. These may be repeated.
(deprecated) enforce_uniqueness: Use cardinality_rule instead.
missing_value_generation: Add this argument to determine how to recreate missing values during the reverse transform phase.
(default) 'random'
Randomly assign missing values in roughly the same proportion as the original data.
None
Do not recreate missing values.
Examples
from rdt.transformers.pii import AnonymizedFaker
# create more realistic-looking data by specifying a provider and function
transformer = AnonymizedFaker(
    provider_name='person',
    function_name='name',
    cardinality_rule='match'
)
