❖ DPDiscreteECDFNormalizer
Compatibility: categorical data
The DPDiscreteECDFNormalizer uses differential privacy techniques to normalize your categorical values into a numerical column that is uniform or normal. To do this, it estimates the empirical CDF of the category values and adds noise to your data. (On the reverse transform, this transformer brings the data back into the original category values.)
(required) epsilon: A float >0 that represents the privacy loss budget you are willing to accommodate.
order_by: Apply a prescribed ordering scheme. Use this if the discrete categorical values have an order.
  (default) None: Do not apply a particular order.
  'numerical_value': If the data is represented by integers or floats, order by those values.
  'alphabetical': If the data is represented by strings, order them alphabetically.
normalized_distribution: Add this argument to control the shape of the transformed data. Choose whatever is easiest for your downstream use case.
  (default) 'uniform': Transform the data into a uniform distribution, between 0 and 1.
  'norm': Transform the data into a standard normal distribution, aka a bell curve with a mean of 0 and a standard deviation of 1.
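To make the order_by and normalized_distribution options more concrete, here is a minimal pure-Python sketch of what each option could mean. The helper names order_categories and to_shape are hypothetical and written only for illustration; they are not part of the transformer's API, and the transformer's internal implementation may differ.

```python
from statistics import NormalDist


def order_categories(categories, order_by=None):
    """Hypothetical helper: list the unique categories in the requested order."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(categories))
    if order_by is None:
        return unique  # no particular order is applied
    if order_by == 'numerical_value':
        return sorted(unique)  # assumes the categories are ints or floats
    if order_by == 'alphabetical':
        return sorted(unique, key=str)  # assumes the categories are strings
    raise ValueError(f'Unknown order_by: {order_by!r}')


def to_shape(u, normalized_distribution='uniform'):
    """Hypothetical helper: map a value u strictly between 0 and 1 to a shape."""
    if normalized_distribution == 'uniform':
        return u  # already uniform on (0, 1)
    if normalized_distribution == 'norm':
        # The inverse CDF of the standard normal turns uniform values
        # into values that follow a bell curve (mean 0, std 1).
        return NormalDist(mu=0.0, sigma=1.0).inv_cdf(u)
    raise ValueError(f'Unknown normalized_distribution: {normalized_distribution!r}')
```

Note that inv_cdf only accepts values strictly between 0 and 1, so a real implementation would need to handle the endpoints explicitly.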
This transformer creates a bar chart of your data and uses it to compute an empirical CDF. The empirical CDF can then be used to normalize your data into a different shape (uniform or normal).
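The idea of going from a bar chart to an empirical CDF, and then to a uniform value and back, can be sketched in a few lines of plain Python. This is an illustrative sketch without any privacy noise; the function names fit_ecdf, transform and reverse_transform are assumptions made for this example and do not describe the transformer's actual internals.

```python
import bisect
import random
from collections import Counter


def fit_ecdf(values, ordered_categories):
    """Return the cumulative frequency (upper edge) of each category."""
    counts = Counter(values)  # the "bar chart": one count per category
    total = sum(counts[c] for c in ordered_categories)
    edges, running = [], 0
    for category in ordered_categories:
        running += counts[category]
        edges.append(running / total)
    return edges


def transform(value, ordered_categories, edges, rng):
    """Map a category to a random point inside its slice of [0, 1]."""
    i = ordered_categories.index(value)
    low = edges[i - 1] if i > 0 else 0.0
    return rng.uniform(low, edges[i])


def reverse_transform(u, ordered_categories, edges):
    """Map a number in [0, 1] back to the category whose slice contains it."""
    i = bisect.bisect_right(edges, u)
    return ordered_categories[min(i, len(ordered_categories) - 1)]


rng = random.Random(42)
data = ['a', 'a', 'b', 'c']
categories = ['a', 'b', 'c']
edges = fit_ecdf(data, categories)  # cumulative shares of a, b, c
uniform_values = [transform(v, categories, edges, rng) for v in data]
recovered = [reverse_transform(u, categories, edges) for u in uniform_values]
```

Because each category owns a slice of [0, 1] whose width equals its frequency, the transformed column is approximately uniform, and looking up which slice a value falls into recovers the original category.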
Throughout the process, the transformer uses ε-differentially private mechanisms to add controlled noise to the frequencies of each category value.
The privacy loss budget is used when saving the frequencies of each category value.
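As a rough illustration of how a privacy loss budget can be spent on category frequencies, the sketch below adds Laplace noise to each category count. The choice of the Laplace mechanism, the sensitivity of 1 (one row added or removed), and the function names laplace_noise and noisy_frequencies are all assumptions made for this example; this page does not confirm which mechanism or library the transformer actually uses.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_frequencies(values, categories, epsilon, rng=None):
    """Return noisy category frequencies that sum to 1.

    Assumes the list of categories is known in advance and that adding or
    removing one row changes exactly one count by 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError('epsilon must be a float > 0')
    if rng is None:
        rng = random.Random()
    counts = Counter(values)
    scale = 1.0 / epsilon  # smaller epsilon means more noise
    noisy = {
        c: max(counts[c] + laplace_noise(scale, rng), 0.0)  # clamp negatives
        for c in categories
    }
    total = sum(noisy.values())
    if total == 0:
        # Every noisy count was clamped to zero: fall back to equal shares.
        return {c: 1.0 / len(categories) for c in categories}
    return {c: n / total for c, n in noisy.items()}
```

A smaller epsilon gives a larger noise scale, so the saved frequencies reveal less about any single row but are also less accurate. Clamping and renormalizing happen after the noise is added, so they do not consume additional budget.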