
Contextual Anonymization

Contextual Anonymization is a novel technique invented at DataCebo to anonymize data types that have particular meanings in your business. It is available as part of the RDT Add-Ons.
Contextual Anonymization anonymizes Personally Identifiable Information (PII) while preserving both the format and the broader context of the original data.
Technique | Preserves format? | Preserves context?
Contextual Anonymization | Yes | Yes
Existing Technique: Faking or Mapping | Yes | No
Existing Technique: Generalization or Masking | No | Yes

Existing Anonymization Techniques

Existing techniques preserve either the original format or the broader context of the data, but not both.

Fake data doesn't capture the context

A common anonymization technique is to completely fake new PII values that match the format of the original. We can see how this works for phone number PII data.
However, you might notice that the context of the phone numbers is not preserved. Phone numbers have geographical context: the country and region codes indicate where the caller resides. This geographical context is lost in the fake numbers, so you cannot use the anonymized data if the context matters.
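To make the limitation concrete, here is a minimal sketch of format-only faking. It is an illustration written for this page, not the RDT implementation: the function name `fake_phone_number` and the sample number are hypothetical, and the approach (replace every digit at random) is an assumption about how a simple faker might behave.

```python
import random


def fake_phone_number(original, rng):
    """Replace every digit with a random digit and keep all other characters.

    Hypothetical helper for illustration only. The punctuation and length
    (the format) survive, but the country and area codes are randomized too.
    """
    return "".join(
        str(rng.randrange(10)) if ch.isdigit() else ch
        for ch in original
    )


rng = random.Random(0)  # seeded so the sketch is repeatable
original = "+1 (617) 555-0142"  # made-up example number
fake = fake_phone_number(original, rng)
print(original, "->", fake)
```

The result has the same shape as the input, but nothing ties it to the original country or area code, which is the lost context described above.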

Generalized data doesn't capture the format

The generalization technique explicitly extracts the context. In our example with phone numbers, one approach would be to anonymize the phone numbers by extracting the geographical areas. Alternatively, we could mask the non-contextual digits with the letter 'X'.
Careful generalization can preserve the context but not the original format. The resulting values are no longer phone numbers you can actually call. You cannot use the anonymized data if the format matters, for example, if you want to put it through a QA testing suite that expects valid phone numbers.
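Both generalization variants can be sketched in a few lines. This is an illustrative example under stated assumptions, not RDT code: the helper names, the `area_names` lookup table, and the choice of treating the first four digits (a one-digit country code plus a three-digit area code) as the context are all hypothetical.

```python
def mask_phone_number(original, keep_digits=4):
    """Keep the first `keep_digits` digits and replace the rest with 'X'.

    Assumes the leading digits carry the geographical context.
    """
    seen = 0
    out = []
    for ch in original:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= keep_digits else "X")
        else:
            out.append(ch)
    return "".join(out)


def generalize_phone_number(original, area_names, keep_digits=4):
    """Replace the number with the name of the area its prefix belongs to."""
    digits = "".join(ch for ch in original if ch.isdigit())
    return area_names.get(digits[:keep_digits], "Unknown")


# Hypothetical lookup table: country code 1 + area code 617.
area_names = {"1617": "Boston, MA, USA"}

original = "+1 (617) 555-0142"  # made-up example number
masked = mask_phone_number(original)
region = generalize_phone_number(original, area_names)
print(masked)  # +1 (617) XXX-XXXX
print(region)  # Boston, MA, USA
```

Both outputs keep the geographical information, but neither is a value that a phone number validator would accept.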

Introducing: Contextual Anonymization

Contextual Anonymization is a novel anonymization technique that produces contextually fake data. This preserves both the format of the original data and its context.
In our phone number data, this means that the anonymized data has the same geographical context and the same format as your original phone numbers.
This allows you to use the anonymized data wherever you may use the real data. You don't have to worry about missing the geographical context or incorrect formatting in the anonymized dataset.
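The idea can be illustrated with a small sketch that keeps the contextual prefix and fakes only the remaining digits. This is not DataCebo's algorithm, which this page does not describe. The function name `contextually_fake` and the assumption that the first four digits hold the country and area code are hypothetical, and a production approach would also need to guarantee that the result is a valid number.

```python
import random


def contextually_fake(original, rng, context_digits=4):
    """Keep the first `context_digits` digits and randomize the rest.

    Illustrative only: assumes the leading digits carry the geographical
    context and that every other digit is safe to replace.
    """
    seen = 0
    out = []
    for ch in original:
        if ch.isdigit():
            seen += 1
            if seen <= context_digits:
                out.append(ch)  # contextual digit: keep as-is
            else:
                out.append(str(rng.randrange(10)))  # PII digit: fake it
        else:
            out.append(ch)  # punctuation and spacing: keep the format
    return "".join(out)


rng = random.Random(0)  # seeded so the sketch is repeatable
original = "+1 (617) 555-0142"  # made-up example number
anonymized = contextually_fake(original, rng)
print(original, "->", anonymized)
```

The output still begins with `+1 (617)` and still has the shape of a phone number, so it keeps both properties that the existing techniques trade off against each other.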

Layering other techniques: Mapping

Contextual Anonymization is not an isolated technique. You can layer it with others.
For example, the RDT phone number Add-On provides mapping functionality. Using it, you can contextually anonymize phone numbers in a consistent way: A repeating phone number is consistently mapped to the same contextually fake number.
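A consistent mapping can be sketched by remembering which fake value was issued for each original. The class below is a hypothetical illustration, not the Add-On's API: `ConsistentAnonymizer`, its parameters, and the prefix-preserving faking step are all assumptions made for this example.

```python
import random


class ConsistentAnonymizer:
    """Map each original number to one contextually fake number.

    Hypothetical sketch: the first `context_digits` digits are kept,
    the rest are randomized, and the result is cached per original.
    """

    def __init__(self, seed=0, context_digits=4):
        self._rng = random.Random(seed)
        self._context_digits = context_digits
        self._mapping = {}  # original -> anonymized

    def _fake(self, original):
        seen = 0
        out = []
        for ch in original:
            if ch.isdigit():
                seen += 1
                keep = seen <= self._context_digits
                out.append(ch if keep else str(self._rng.randrange(10)))
            else:
                out.append(ch)
        return "".join(out)

    def anonymize(self, original):
        # Reuse the earlier result so repeated values stay consistent.
        if original not in self._mapping:
            self._mapping[original] = self._fake(original)
        return self._mapping[original]


anonymizer = ConsistentAnonymizer(seed=0)
calls = ["+1 (617) 555-0142", "+1 (212) 555-0199", "+1 (617) 555-0142"]
results = [anonymizer.anonymize(number) for number in calls]
first, second, again = results
print(results)
```

The first and third entries are the same original, so they receive the same anonymized value. This sketch does not guard against two different originals colliding on the same fake number, which a real mapping would need to handle.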

Try out Contextual Anonymization!

Use the button below to get in touch with the DataCebo team and get a license to use RDT Add-Ons.
Get in touch with DataCebo
If you are using RDTs for research purposes, please inquire about an academic license. As a project born at MIT, we
research.