❖ DPLaplaceNoiser
Compatibility: numerical data
The DPLaplaceNoiser uses differential privacy techniques to add noise to your data. It adds noise to numerical data using the Laplace mechanism and, if requested, it also uses a randomized response mechanism to noise missing values.
As a result, the entire column of transformed data will have differential privacy guarantees. (This transformer does not do anything on the reverse transform, as it is not possible to undo the differential privacy noise.)
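To illustrate the idea, here is a minimal sketch of a textbook Laplace mechanism applied to a bounded numerical column. It is not the transformer's actual implementation: the function names (`laplace_noise`, `noise_column`), the choice of sensitivity (`max_value - min_value`), and the clamping step are assumptions made for this example, and it assumes the bounds are already known.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution centered at 0.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_column(values, epsilon, min_value, max_value, seed=None):
    """Add Laplace noise to each value (hypothetical helper, not the library API)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    if min_value >= max_value:
        raise ValueError("min_value must be less than max_value")

    rng = random.Random(seed)
    # Assumption: one value can move by at most the width of the known range.
    sensitivity = max_value - min_value
    scale = sensitivity / epsilon

    noised = []
    for value in values:
        # Clip the input so the sensitivity bound actually holds.
        clipped = min(max(value, min_value), max_value)
        noisy = clipped + laplace_noise(scale, rng)
        # Clamping the output is post-processing and does not weaken the guarantee.
        noised.append(min(max(noisy, min_value), max_value))
    return noised


ages = [23.0, 41.0, 67.0]
print(noise_column(ages, epsilon=1.0, min_value=0.0, max_value=100.0, seed=42))
```

A smaller epsilon gives a larger noise scale, so the output is more private but less faithful to the original values.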
(required) epsilon: A float > 0 that represents the privacy loss budget you are willing to accommodate.
known_min_value: A previously known minimum value for the data. Providing this value helps conserve the privacy budget and ultimately yields higher-fidelity data for the same epsilon value.
The min value should represent prior knowledge of the data. To enforce differential privacy, it is critical that the min value is not based on any computation on the real data.
(default) None: There is no known minimum value for the data. The transformer will compute one from the fit data, using some of the privacy budget.
<float>: The transformer will make sure the data is never less than this value. This does not use up any privacy budget.
known_max_value: A previously known maximum value for the data. Providing this value helps conserve the privacy budget and ultimately yields higher-fidelity data for the same epsilon value.
The max value should represent prior knowledge of the data. To enforce differential privacy, it is critical that the max value is not based on any computation on the real data.
(default) None: There is no known maximum value for the data. The transformer will compute one from the fit data, using some of the privacy budget.
<float>: The transformer will make sure the data is never greater than this value. This does not use up any privacy budget.
noise_missing_values: Add this argument to add noise to the missing values in your data. Noise means that some of the missing values will flip to non-missing (and vice versa).
(default) False: Do not add any noise to the missing values. This assumes that the missing values are not statistically relevant and therefore do not require any privacy budget.
True: Use a randomized response mechanism to perturb missing values. This will use some of the privacy budget.
learn_rounding_scheme: Add this argument to allow the transformer to learn about rounded values in your dataset.
(default) False: Do not learn or enforce any rounding scheme. When reverse transforming the data, there may be many decimal places present.
True: Learn the rounding rules from the input data. When reverse transforming the data, round the number of digits to match the original.
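As a rough illustration of what learning a rounding scheme can mean, the sketch below finds the largest number of decimal places in a column and rounds other values to match. The helper names (`learn_decimal_places`, `apply_rounding`) are hypothetical and the transformer's own logic may differ.

```python
from decimal import Decimal


def learn_decimal_places(values):
    """Return the largest number of decimal places seen in the input values."""
    places = 0
    for value in values:
        # Skip missing entries (None or NaN).
        if value is None or value != value:
            continue
        # normalize() strips trailing zeros, so 3.0 counts as 0 decimal places.
        exponent = Decimal(str(value)).normalize().as_tuple().exponent
        places = max(places, -exponent)
    return places


def apply_rounding(values, places):
    """Round every value to the learned number of decimal places."""
    return [round(value, places) for value in values]


prices = [1.5, 2.25, 3.0, None]
places = learn_decimal_places(prices)
print(places)  # 2
print(apply_rounding([3.14159, 2.71828], places))  # [3.14, 2.72]
```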
After fitting the transformer, you can access the learned values through the attributes.
epsilon_breakdown: A dictionary that stores how the privacy loss budget (epsilon) is split across the different steps for adding differential privacy (noising the data, noising the min/max boundaries, and noising the missing values). Depending on the parameters, not all steps will be needed.
These values are fractions that add up to 1 (100%). For example, 0.5 means that 50% of the epsilon was used for a particular step.
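The arithmetic can be sketched as follows. The dictionary keys and fractions here are made up for illustration (the real attribute may use different key names), and `absolute_budget` is a hypothetical helper, not part of the library.

```python
def absolute_budget(epsilon, breakdown):
    """Convert a fractional breakdown into the epsilon spent on each step."""
    total = sum(breakdown.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("breakdown fractions must add up to 1")
    return {step: epsilon * fraction for step, fraction in breakdown.items()}


# Hypothetical breakdown: key names and values are illustrative only.
example_breakdown = {"data": 0.5, "bounds": 0.25, "missing_values": 0.25}

print(absolute_budget(2.0, example_breakdown))
# {'data': 1.0, 'bounds': 0.5, 'missing_values': 0.5}
```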
This transformer uses ε-differentially private mechanisms to add controlled noise to the column of data. It uses the Laplace mechanism to add noise to the numerical data, and randomized response to add noise to missing values.
Noising the numerical data using the Laplace mechanism. This step is always performed.
Noising the missing values (that is, flipping missing values to non-missing and vice versa) using randomized response. This step is performed if the noise_missing_values parameter is set to True.
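For the missing-value step, a minimal sketch of a standard binary randomized response is shown below. It assumes each value is reduced to a missing/non-missing flag, keeps the true flag with probability e^ε / (1 + e^ε), and flips it otherwise. The function names are hypothetical, and the transformer's actual mechanism and budget split may differ.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true flag: e^eps / (1 + e^eps).

    Written as 1 / (1 + e^-eps) to avoid overflow for large epsilon.
    """
    return 1.0 / (1.0 + math.exp(-epsilon))


def randomized_response(is_missing, epsilon, seed=None):
    """Flip each missing/non-missing flag with probability 1 - keep_probability."""
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    rng = random.Random(seed)
    p_keep = keep_probability(epsilon)
    return [flag if rng.random() < p_keep else not flag for flag in is_missing]


flags = [True, False, False, True, False]
print(keep_probability(math.log(3)))  # approximately 0.75
print(randomized_response(flags, epsilon=1.0, seed=7))
```

With a very large epsilon the flags are almost never flipped, and as epsilon approaches 0 each flag is flipped about half the time.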