❖ DPTimestampLaplaceNoiser

Compatibility: datetime data

❖ SDV Enterprise Bundle. This feature is available as part of the Differential Privacy Bundle, an optional add-on to SDV Enterprise. For more information, please visit the Differential Privacy Bundle page. Coming soon!

The DPTimestampLaplaceNoiser uses differential privacy techniques to add noise to your data. It adds noise to datetime data using the Laplace mechanism and, if requested, also uses a randomized response mechanism to add noise to missing values.

As a result, the entire column of transformed data will have differential privacy guarantees. (This transformer does not do anything on the reverse transform, as it is not possible to undo the differential privacy noise.)

from rdt.transformers.datetime import DPTimestampLaplaceNoiser

transformer = DPTimestampLaplaceNoiser(epsilon=0.5)

Parameters

(required) epsilon: A float >0 that represents the privacy loss budget you are willing to accommodate.

How should I choose my privacy loss budget (epsilon)? The value of epsilon is a measure of how much risk you're willing to take on when it comes to privacy.

Values in the 0-1 range indicate that you are not willing to take on too much risk. As a result, the synthetic data will have strong privacy guarantees — potentially at the expense of data quality.
Values in the 2-10 range indicate that you're willing to accept some privacy risk in order to preserve more data quality.
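
To get a feel for this trade-off, the sketch below computes the Laplace noise scale (sensitivity divided by epsilon) for a few epsilon values. It assumes the sensitivity equals the width of the column's date range, one year here. That is a textbook simplification for illustration, not necessarily how the transformer calibrates its noise. The helper name laplace_scale is hypothetical.

```python
from datetime import timedelta


def laplace_scale(range_width: timedelta, epsilon: float) -> timedelta:
    """Return the Laplace scale b = sensitivity / epsilon.

    Assumption: the sensitivity is the full width of the allowed date range.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    return range_width / epsilon


one_year = timedelta(days=365)
for eps in (0.5, 1.0, 2.0, 10.0):
    # The mean absolute value of Laplace noise equals its scale b.
    scale = laplace_scale(one_year, eps)
    print(f"epsilon={eps}: average noise magnitude is about {scale.days} days")
```

Under this assumption, smaller epsilon values produce noise that is large compared to the date range itself, which is why strong privacy guarantees can cost data quality.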

known_min_value: A previously known minimum value for the data. Providing this value helps conserve the privacy budget and ultimately yields higher-fidelity data for the same epsilon value.

The min value should represent prior knowledge of the data. In order to enforce differential privacy, it is critical that the min value is prior knowledge that is not based on any computations of the real data.

(default) None

There is no known minimum value for the data. The transformer will compute one based on the fit data, using some privacy budget.

<pd.Timestamp>

The transformer will make sure the data will never be earlier than the value. This will not use up any privacy budget.

known_max_value: A previously known maximum value for the data. Providing this value helps conserve the privacy budget and ultimately yields higher-fidelity data for the same epsilon value.

The max value should represent prior knowledge of the data. In order to enforce differential privacy, it is critical that the max value is prior knowledge that is not based on any computations of the real data.

(default) None

There is no known maximum value for the data. The transformer will compute one based on the fit data, using some privacy budget.

<pd.Timestamp>

The transformer will make sure the data will never be later than the value. This will not use up any privacy budget.

noise_missing_values: Add this argument to add noise to the missing values in your data. Noise means that some of the missing values will flip to non-missing (and vice versa).

(default) False

Do not add any noise to the missing values. In doing this, we assume that the missing values are not statistically relevant and thus do not require us to use any privacy budget.

True

Use a randomized response mechanism to perturb missing values. This will use some privacy budget.
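
As an illustration of the idea, here is a minimal sketch of binary randomized response applied to missing/non-missing flags. It uses the standard formulation in which each flag is flipped with probability 1 / (1 + e^epsilon). The function name randomized_response and the choice of flip probability are assumptions for this example; the transformer's internal implementation may differ.

```python
import math
import random
from typing import List


def randomized_response(is_missing: List[bool], epsilon: float,
                        rng: random.Random) -> List[bool]:
    """Flip each missing/non-missing flag with probability 1 / (1 + e^epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    flip_probability = 1.0 / (1.0 + math.exp(epsilon))
    return [
        (not flag) if rng.random() < flip_probability else flag
        for flag in is_missing
    ]


rng = random.Random(0)  # seeded only to make the sketch repeatable
flags = [True, False, False, True, False]  # True marks a missing value
noised = randomized_response(flags, epsilon=0.1, rng=rng)
print(noised)
```

With a small epsilon, the flip probability approaches 50%, so the noised flags reveal little about which values were originally missing. With a large epsilon, almost no flags are flipped.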

datetime_format: Add this argument to tell the transformer how to read your datetime column if it's in a specific format that isn't easy to identify.

(default) None

Format detection isn't needed. This may be because your data is already represented by datetime objects (for example, pd.Timestamp). If your data is stored as strings, please provide a format.

<string>

Read the datetime values according to the format given in the <string>. For example, to represent a datetime like "Feb 15, 2022 10:23:45 AM", you can use the format string: "%b %d, %Y %I:%M:%S %p". For more info, see Python's strftime documentation↗.
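
Before passing a format string to the transformer, it can help to confirm that it parses a sample value. The sketch below does this with Python's standard library only and does not call the transformer. It assumes an English locale, since %b and %p are locale-dependent.

```python
from datetime import datetime

raw_value = "Feb 15, 2022 10:23:45 AM"
datetime_format = "%b %d, %Y %I:%M:%S %p"

# strptime raises ValueError if the string does not match the format.
parsed = datetime.strptime(raw_value, datetime_format)
print(parsed.isoformat())  # 2022-02-15T10:23:45

# Formatting with the same string should reproduce the original text.
round_trip = parsed.strftime(datetime_format)
print(round_trip == raw_value)
```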

Attributes

After fitting the transformer, you can access the learned values through the attributes.

epsilon_breakdown: A dictionary that stores how the privacy loss budget (epsilon) is broken into the different steps for adding differential privacy (noising the data, noising the min/max boundaries, and noising the missing values). Based on the parameters, not all steps will be needed.

These values represent fractions of the total budget that add up to 1 (100%). For example, 0.5 means that 50% of the epsilon was used for a particular step.

>>> transformer.epsilon_breakdown
{
    'epsilon_data': 0.5,
    'epsilon_boundaries': 0.3,
    'epsilon_missing_values': 0.2
}
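
To translate these fractions into absolute budget amounts, multiply each one by the epsilon you passed to the transformer. The sketch below does this for the example output above with epsilon=0.5. The helper absolute_budget is hypothetical and not part of the library.

```python
def absolute_budget(epsilon: float, breakdown: dict) -> dict:
    """Convert fractional shares of the budget into absolute epsilon values."""
    total = sum(breakdown.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares should add up to 1, got {total}")
    return {step: epsilon * share for step, share in breakdown.items()}


# Shares copied from the example output; epsilon matches the constructor call.
breakdown = {
    'epsilon_data': 0.5,
    'epsilon_boundaries': 0.3,
    'epsilon_missing_values': 0.2,
}
budget = absolute_budget(0.5, breakdown)
for step, value in budget.items():
    print(f"{step}: {value:.2f}")
```

The absolute amounts add back up to the total epsilon, since the steps share one overall budget.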

FAQ

Which algorithms does this transformer use?

This transformer converts your datetime column to Unix timestamps. Then, it uses ε-differentially private mechanisms to add controlled noise to the column of data. It uses the Laplace mechanism to add noise to the numerical timestamps, and randomized response to add noise to missing values.
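
The following is a minimal, standard-library sketch of the Laplace mechanism applied to a single Unix timestamp. It is illustrative only: the function names are hypothetical, and it assumes the sensitivity is the width of the [min, max] range and that values are clipped to that range before and after noising. The transformer's actual implementation may differ.

```python
import random
from datetime import datetime, timezone


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_timestamp(value: datetime, min_value: datetime, max_value: datetime,
                    epsilon: float, rng: random.Random) -> datetime:
    """Clip to [min, max], add Laplace noise in Unix seconds, then clip again."""
    low, high = min_value.timestamp(), max_value.timestamp()
    clipped = min(max(value.timestamp(), low), high)
    scale = (high - low) / epsilon  # assumed sensitivity: width of the range
    noised = clipped + laplace_noise(scale, rng)
    noised = min(max(noised, low), high)
    return datetime.fromtimestamp(noised, tz=timezone.utc)


rng = random.Random(42)  # seeded only to make the sketch repeatable
known_min = datetime(2020, 1, 1, tzinfo=timezone.utc)
known_max = datetime(2021, 1, 1, tzinfo=timezone.utc)
original = datetime(2020, 6, 15, 12, 0, tzinfo=timezone.utc)
print(noise_timestamp(original, known_min, known_max, epsilon=0.5, rng=rng))
```

Clipping the noised value back into the range is post-processing, so it does not consume additional privacy budget in this sketch.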

How is the privacy loss budget (ε) used?

The privacy loss budget is used during 3 possible phases of the transformation:

Computing differentially private min/max values from the data, so as not to reveal the actual min or max value that the data contains. This step is required if the known_min_value and known_max_value are not provided.
Noising the timestamps using the Laplace mechanism. This step is always performed.
Noising the missing values (that is, flipping missing values to non-missing and vice versa) using randomized response. This step is performed if the noise_missing_values parameter is set to True.

To see what the final breakdown is, use the epsilon_breakdown attribute.

