The DPResponseRandomizer uses differential privacy techniques to add noise to categorical data. It does so with a randomized response mechanism, which randomly replaces some values in the column with other category values.
As a result, the entire column of transformed data will have differential privacy guarantees. (This transformer does not do anything on the reverse transform, as it is not possible to undo the differential privacy noise.)
from rdt.transformers.categorical import DPResponseRandomizer
transformer = DPResponseRandomizer(epsilon=1.5)
Parameters
(required) epsilon: A float >0 that represents the privacy loss budget you are willing to accommodate.
How should I choose my privacy loss budget (epsilon)? The value of epsilon is a measure of how much risk you're willing to take on when it comes to privacy.
Values in the 0-1 range indicate that you are not willing to take on too much risk. As a result, the synthetic data will have strong privacy guarantees — potentially at the expense of data quality.
Values in the 2-10 range indicate that you're willing to accept some privacy risk in order to preserve more data quality.
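To make these ranges more concrete, here is a minimal sketch of how epsilon could translate into noise. It assumes the textbook k-ary randomized response formula, in which the true value is kept with probability e^epsilon / (e^epsilon + k - 1) for k category values. This page does not confirm that the transformer uses exactly this formula, and the helper name keep_probability is hypothetical.

```python
import math


def keep_probability(epsilon: float, num_categories: int) -> float:
    """Chance of reporting the true value under textbook k-ary randomized response.

    Assumption: this is the standard formula, not necessarily the one
    DPResponseRandomizer uses internally.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    if num_categories < 2:
        raise ValueError("need at least two category values")
    weight = math.exp(epsilon)
    return weight / (weight + num_categories - 1)


if __name__ == "__main__":
    # Compare a few budgets for a column with 4 category values.
    for eps in (0.5, 1.5, 5.0, 10.0):
        p = keep_probability(eps, num_categories=4)
        print(f"epsilon={eps}: true value kept with probability {p:.3f}")
```

Under this assumption, a small epsilon keeps the true value only slightly more often than a uniform random guess would, while a large epsilon leaves almost every value unchanged.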
FAQ
What is the difference between the DP ResponseRandomizer and the DP Weighted ResponseRandomizer?
The regular ResponseRandomizer applies the noise equally to all possible category values. This is an efficient way to use up the privacy budget, and it works best if the category values all appear with about equal frequency.
The weighted ResponseRandomizer applies noise unequally. It uses up some privacy budget to compute the frequency of each category value (using differential privacy). Then, it uses the frequencies for weighted randomization. This works best if the category values are highly imbalanced in your data.
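The frequency step described above can be sketched as follows. This is an illustration only: it assumes the Laplace mechanism on per-category counts, which this page does not confirm, and the function name noisy_frequencies is hypothetical. How the weighted transformer turns these frequencies into randomization weights is not specified here, so the sketch stops at the noisy frequencies.

```python
import random
from collections import Counter


def noisy_frequencies(values, categories, epsilon, rng=None):
    """Estimate category frequencies with Laplace noise added to the counts.

    Assumption: adding or removing one row changes one count by 1, so
    Laplace noise with scale 1/epsilon is used for each count.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    rng = rng or random.Random()
    counts = Counter(values)
    noisy = {}
    for category in categories:
        # The difference of two exponential draws with rate epsilon
        # is a Laplace draw with scale 1/epsilon.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        # Clipping at zero is post-processing and does not use extra budget.
        noisy[category] = max(counts[category] + noise, 0.0)
    total = sum(noisy.values())
    if total == 0:
        # Fall back to uniform frequencies if every noisy count was clipped.
        return {category: 1.0 / len(categories) for category in categories}
    return {category: count / total for category, count in noisy.items()}


if __name__ == "__main__":
    column = ["yes"] * 90 + ["no"] * 8 + ["unknown"] * 2
    print(noisy_frequencies(column, ["yes", "no", "unknown"], epsilon=0.5))
```

The budget spent here would not be available for the randomization step, which is why this approach pays off mainly when the categories are strongly imbalanced.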
Can I share the data after applying this? What are the differential privacy guarantees?
Differential privacy controls the amount of influence a single data point can have over the final, transformed column. After applying the transformer to this column, the entire column provides differential privacy guarantees, so you should be able to share it as well as any statistics about it (min, max, mean, etc.).
Please note that this transformer only applies differential privacy to the individual column. It does not provide differential privacy guarantees if you'd like to share multiple columns at a time. For that, we recommend using a differentially private synthesizer that can handle many columns at once.
Both transformers use a randomized response mechanism to add noise to categorical data. This means that some values in your column will randomly be turned into other values.
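As a rough illustration of how values get turned into other values, the sketch below implements textbook k-ary randomized response in plain Python. It is not the transformer's actual implementation: the function name randomized_response, its arguments, and the exact keep probability are assumptions made for this example.

```python
import math
import random


def randomized_response(values, categories, epsilon, rng=None):
    """Keep each value with probability p, otherwise swap in a different category.

    Assumption: p = e^epsilon / (e^epsilon + k - 1), the textbook choice
    for k category values.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    rng = rng or random.Random()
    weight = math.exp(epsilon)
    p_keep = weight / (weight + len(categories) - 1)
    noisy = []
    for value in values:
        if rng.random() < p_keep:
            # Report the true value.
            noisy.append(value)
        else:
            # Report one of the other categories, chosen uniformly.
            others = [c for c in categories if c != value]
            noisy.append(rng.choice(others))
    return noisy


if __name__ == "__main__":
    column = ["red", "green", "blue", "green", "red"]
    print(randomized_response(column, ["red", "green", "blue"], epsilon=1.5))
```

Because each row is randomized independently, there is no record of which values were changed, which is consistent with the reverse transform not undoing the noise.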
❖ SDV Enterprise Bundle. This feature is available as part of the Differential Privacy Bundle, an optional add-on to SDV Enterprise. For more information, please visit the Differential Privacy Bundle page. Coming soon!