EqualizedOddsImprovement

Coming soon! This metric is still in development. It will be available in a future SDMetrics release.

This metric measures whether the synthetic data improves the fairness of making a prediction. Equalized odds specifically looks at the true positive rate (TPR) and false positive rate (FPR) of the predictions you're trying to make.

This metric is relevant when there is a particular value you'd like to predict. It computes the prediction fairness on the real data and then repeats the computation on the synthetic data. The final score indicates whether the synthetic data is improving the fairness over the real data, and the magnitude of that improvement.

Data Compatibility

  • Numerical: This metric is meant for numerical data

  • Datetime: This metric works on datetime data by considering the timestamps as continuous values

  • Categorical: This metric works on categorical data by encoding it as numerical data

  • Boolean: This metric works on boolean data by encoding it as numerical data

This metric ignores missing values.

Score

(best) 1.0: Synthetic data improves the prediction fairness when compared to real data by the most it possibly can

(baseline) 0.5: Synthetic data yields the same fairness as the real data

(worst) 0.0: Synthetic data decreases the prediction fairness when compared to the real data

Is the synthetic data improving fairness? Any score >0.5 indicates that the synthetic data is improving the fairness compared to the real data.

How does it work?

Equalized Odds: A measurement of fairness

This fairness metric assumes you have a prediction problem you are interested in. It measures whether a sensitive attribute affects the way the predictions get made. If the data is completely fair, then the sensitive attribute will not affect the predictions you make.

$$\Pr(\text{prediction} \mid \text{attribute}=x) \approx \Pr(\text{prediction} \mid \text{attribute}=y)$$

For example, consider a dataset of loan applications. You are interested in predicting whether the loan application will be approved.

In this case, you may want to make sure the prediction is fair based on the applicant's race. That is to say, the race of an applicant should not affect the approval of a loan. For example:

$$\Pr(\text{approval} \mid \text{race}=\text{Asian}) \approx \Pr(\text{approval} \mid \text{race} \neq \text{Asian})$$

Equalized odds [1] is a measure that is particularly interested in the true positive rate (TPR) and the false positive rate (FPR) of the predictions.

Algorithm

This metric trains an XGBoost ML classifier [2] on the real training data and then uses it to make predictions on the validation data. We then split these predictions by the sensitive attribute (a) and compute the TPR and FPR for each group. The fairness score considers whether the TPR and FPR values are similar, regardless of a.

$$\text{TPR fairness} = 1 - \left| \text{TPR}_{a=\text{True}} - \text{TPR}_{a=\text{False}} \right|$$

$$\text{FPR fairness} = 1 - \left| \text{FPR}_{a=\text{True}} - \text{FPR}_{a=\text{False}} \right|$$

$$\text{Overall fairness} = \min(\text{TPR fairness}, \text{FPR fairness})$$

Then we repeat this process, training the classifier on the synthetic data instead of the real training data. Finally, we combine the fairness scores for the real and synthetic data into a final score by taking the difference and scaling it to the [0, 1] range.

$$\text{score} = \frac{\text{fairness}_{\text{synthetic}} - \text{fairness}_{\text{real}}}{2} + 0.5$$
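
To make the arithmetic concrete, here is a minimal sketch of the computation in Python, reusing the confusion-matrix counts from the example output shown later on this page. The helper functions are illustrative only; they are not part of the SDMetrics API.

def rates(tp, fp, tn, fn):
    # True positive rate and false positive rate from confusion-matrix counts
    return tp / (tp + fn), fp / (fp + tn)

def equalized_odds_fairness(group_true, group_rest):
    # Overall fairness = min(TPR fairness, FPR fairness)
    tpr_t, fpr_t = rates(**group_true)
    tpr_r, fpr_r = rates(**group_rest)
    return min(1 - abs(tpr_t - tpr_r), 1 - abs(fpr_t - fpr_r))

# Counts taken from the example breakdown shown below
real_fairness = equalized_odds_fairness(
    {'tp': 10, 'fp': 5, 'tn': 80, 'fn': 15},  # sensitive value = True
    {'tp': 8, 'fp': 4, 'tn': 75, 'fn': 20},   # sensitive value = False
)                                             # ~0.8857
synthetic_fairness = 0.9                      # computed the same way on synthetic data

score = (synthetic_fairness - real_fairness) / 2 + 0.5  # ~0.5071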

Usage

Access this metric from the single_table module and use the compute_breakdown method.

from sdmetrics.single_table import EqualizedOddsImprovement

breakdown = EqualizedOddsImprovement.compute_breakdown(
  real_training_data=real_dataset,
  synthetic_data=synthetic_dataset,
  real_validation_data=test_dataset,
  metadata=my_metadata,
  prediction_column_name='loan_approved',
  positive_class_label='True',
  sensitive_column_name='requestor_race',
  sensitive_column_value='Asian',
  classifier='XGBoost',
)

Parameters

  • (required) real_training_data: A pandas.DataFrame object containing the real data that you used for training your synthesizer. This metric will use this data for training a Binary Classification model.

  • (required) synthetic_data: A pandas.DataFrame object containing the synthetic data you sampled from your synthesizer. This metric will use this data for training a Binary Classification model.

  • (required) real_validation_data: A pandas.DataFrame object containing a holdout set of real data. This data should not have been used to train your synthesizer. This metric will use this data for evaluating a Binary Classification model.

  • (required) metadata: A metadata dictionary that describes the table of data, following the single table metadata format. (An illustrative sketch appears after this list.)

  • (required) prediction_column_name: A string with the name of the column you are interested in predicting. This should be either a categorical or boolean column.

  • (required) positive_class_label: The value that you are considering to be a positive result, from the perspective of Binary Classification. All other values in this column will be considered negative results.

  • (required) sensitive_column_name: A string with the name of the column that contains the sensitive data, over which we want to compute equalized odds.

  • (required) sensitive_column_value: The value in the sensitive column over which we want to compute equalized odds. The metric will compare this value over all the other values ("one versus rest").

  • classifier: A string describing the ML algorithm to use when building the Binary Classification model. Supported options are:

    • (default) 'XGBoost': Use gradient boost from the XGBoost library [2]

    • Support for additional classifiers is coming in future releases
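
For reference, here is a minimal sketch of what the metadata dictionary might look like for the loan example above, assuming the single table metadata format described in the Metadata section. The extra column name is hypothetical.

my_metadata = {
    'columns': {
        'loan_approved': {'sdtype': 'categorical'},
        'requestor_race': {'sdtype': 'categorical'},
        'annual_income': {'sdtype': 'numerical'},  # hypothetical extra column
    }
}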

The compute_breakdown method returns a dictionary containing the overall score, as well as the prediction breakdowns for each of the datasets and each of the sensitive groups.

{
  'score': 0.5071,
  'real_training_data': {
    'equalized_odds': 0.8857,
    'prediction_counts_validation': {
      'Asian=True': {
        'true_positive': 10, # TPR = 0.4
        'false_positive': 5, # FPR = 0.0588
        'true_negative': 80,
        'false_negative': 15
      },
      'Asian=False': {
        'true_positive': 8, # TPR = 0.2857
        'false_positive': 4, # FPR = 0.0506
        'true_negative': 75,
        'false_negative': 20
      }
    }
  },
  'synthetic_data': {
    'equalized_odds': 0.9,
    # same format as the real_training_data
  }
}
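
You can recompute the TPR and FPR annotations above directly from the returned counts. A minimal sketch, assuming breakdown holds the dictionary returned by compute_breakdown:

counts = breakdown['real_training_data']['prediction_counts_validation']['Asian=True']
tpr = counts['true_positive'] / (counts['true_positive'] + counts['false_negative'])
fpr = counts['false_positive'] / (counts['false_positive'] + counts['true_negative'])
print(f'TPR={tpr:.4f}, FPR={fpr:.4f}')  # TPR=0.4000, FPR=0.0588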

FAQs

If I don't have a validation set, can I just use part of the real data?

The purpose of a validation set is to measure the ML classifier's predictions. It's very important that the validation data was never used to create the synthesizer or the synthetic data. Otherwise, the synthetic data can leak patterns from the validation set into your ML classifier.

So if all of your real data was used to create the synthetic data, it is not possible to use any part of it as your validation set.

References

[1] https://en.wikipedia.org/wiki/Equalized_odds

[2] https://xgboost.readthedocs.io/en/stable/
