
KeyUniqueness


Last updated 1 month ago

This metric measures whether the keys in a particular dataset are unique. We expect that certain types of keys, such as primary keys, are always unique in order to be valid.

Data Compatibility

  • ID: This metric is meant for ID data

  • Other: This metric can work with any other type of semantic data that is used in place of an ID, such as a natural key like email

Score

(best) 1.0: All of the key values in the synthetic data are unique

(worst) 0.0: None of the key values in the synthetic data are unique

How does it work?

This metric counts how many values in the synthetic data, $s$, are duplicates, meaning that at least one other value in $s$ is exactly the same. Call this set of duplicated values $D_s$. The score is the proportion of values that are not duplicates.

$$
score = 1 - \frac{|D_s|}{|s|}
$$
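To make the formula concrete, here is a minimal pure-Python sketch. It is not the SDMetrics implementation: the helper name `key_uniqueness_sketch` is hypothetical, and it assumes that every copy of a repeated value belongs to the duplicate set, which is one literal reading of the definition above. The library may count duplicates differently (for example, only the repeats after a first occurrence), so treat the numbers as illustrative.

```python
from collections import Counter


def key_uniqueness_sketch(synthetic_keys):
    """Return 1 - |D_s| / |s| for a sequence of key values.

    Hypothetical helper for illustration; not part of SDMetrics.
    """
    values = list(synthetic_keys)
    if not values:
        raise ValueError("synthetic_keys must contain at least one value")

    counts = Counter(values)
    # Assumption: a value is a duplicate if an identical value appears
    # anywhere else in the column, so every copy is counted.
    num_duplicates = sum(1 for value in values if counts[value] > 1)
    return 1 - num_duplicates / len(values)


all_unique = key_uniqueness_sketch(["id_1", "id_2", "id_3", "id_4"])
one_repeat = key_uniqueness_sketch(["id_1", "id_2", "id_2", "id_4"])
none_unique = key_uniqueness_sketch(["id_1", "id_1", "id_2", "id_2"])
```

Under this reading, the first column scores 1.0, the second scores 0.5 (two of the four values share a twin), and the third scores 0.0.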

Usage

To manually run this metric, access the single_column module and use the compute method.

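```python
from sdmetrics.single_column import KeyUniqueness

# real_table and synthetic_table are assumed to be pandas DataFrames
# that are already loaded; 'primary_key_name' is a placeholder column name.
KeyUniqueness.compute(
    real_data=real_table['primary_key_name'],
    synthetic_data=synthetic_table['primary_key_name']
)
```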

Parameters

  • (required) real_data: A pandas.Series object with the column of real data

  • (required) synthetic_data: A pandas.Series object with the column of synthetic data

FAQ

Should the score always be 1?

If you are running this score on a primary key, then the score should always be 1. Primary keys are expected to be unique.

Does this metric use the real data?

This metric checks to see if the real data also has unique values and alerts you if this is not the case. However, the final score is only based on the synthetic data.
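The split between the two inputs can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the alert is modeled as a Python warning (the actual alert mechanism is not described here), and every copy of a repeated value is counted as a duplicate.

```python
import warnings
from collections import Counter


def has_repeated_keys(keys):
    """True if any value appears more than once."""
    return len(set(keys)) < len(keys)


def score_with_real_check(real_keys, synthetic_keys):
    """Hypothetical sketch: alert on the real data, score the synthetic data."""
    real_keys = list(real_keys)
    synthetic_keys = list(synthetic_keys)
    if not synthetic_keys:
        raise ValueError("synthetic_keys must contain at least one value")

    # The real data is only inspected; it never changes the score.
    if has_repeated_keys(real_keys):
        warnings.warn("The real data contains repeated key values.")

    counts = Counter(synthetic_keys)
    num_duplicates = sum(1 for value in synthetic_keys if counts[value] > 1)
    return 1 - num_duplicates / len(synthetic_keys)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = score_with_real_check(["a", "a", "b"], ["x", "y", "z"])
```

Here the real keys repeat, so one warning is recorded, yet the score is still 1.0 because the synthetic keys are all distinct.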

Recommended Usage: The Diagnostic Report applies this metric to applicable keys (primary and alternate keys).

If you are running this score on a foreign key, then the score may not be 1, as foreign keys are allowed to repeat. For foreign keys, we recommend using the ReferentialIntegrity metric instead.