Defining your metric

Use this guide to define your metric. It's important to think through the abstractions and functionality of your metric before adding it.

When you're ready to add your metric, please file an issue with the relevant details. We recommend waiting for feedback before you begin implementing your metric.


All metrics in this library are model-agnostic. Anyone who wants to use your metric should already have:

  1. A real dataset

  2. A synthetic dataset, which could be created using any model

Base Metric

The base version of your metric operates on the smallest possible unit of real and synthetic data, such as a single column, a pair of columns, or a table. The base metric is a class with a compute method. The method takes the minimal unit of real data, the matching synthetic data, and any other keyword args you want to add. It returns a score, represented as a floating-point value.

from sdmetrics.column_pairs import YourMetricName

YourMetricName.compute(
  real_data[['column_1', 'column_2']],
  synthetic_data[['column_1', 'column_2']],
  kwarg1= ...
)
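
To make the shape of a base metric concrete, here is a minimal sketch of a class with a compute method. Everything specific in it is an assumption made for illustration: the score (one minus half the absolute difference between the real and synthetic Pearson correlations) and the use of plain dictionaries of lists in place of tables. The sketch only looks columns up by name, which a pandas DataFrame also supports. Extra keyword args are left out to keep it short. It is not how SDMetrics implements any existing metric.

```python
import math


def _pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float('nan')  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


class YourMetricName:
    """Hypothetical column-pair metric; the name is a placeholder."""

    @classmethod
    def compute(cls, real_data, synthetic_data):
        """Score one pair of numeric columns and return a float.

        real_data and synthetic_data each hold exactly two columns
        with the same names.
        """
        col_a, col_b = list(real_data)  # iterating yields the column names
        real_corr = _pearson(list(real_data[col_a]), list(real_data[col_b]))
        synthetic_corr = _pearson(
            list(synthetic_data[col_a]), list(synthetic_data[col_b])
        )
        # Correlations lie in [-1, 1], so the score lies in [0, 1].
        return float(1 - abs(real_corr - synthetic_corr) / 2)


# Toy data: the synthetic pair reverses the relationship seen in the real pair.
real = {'column_1': [1, 2, 3, 4], 'column_2': [2, 4, 6, 8]}
synthetic = {'column_1': [1, 2, 3, 4], 'column_2': [8, 6, 4, 2]}
score = YourMetricName.compute(real, synthetic)
```

Keeping compute limited to one pair of columns means the same method can be reused unchanged by any iteration helper written on top of it.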

(Optional) Iterative Application

In many cases, you may want to iterate through the entire dataset to apply the base metric to different columns, pairs of columns, tables, etc. You can write a convenience method called apply_to_table that performs this iteration.

This method takes the full real data, synthetic data, keyword args and metadata. Use the metadata to determine where to apply the base metric. The method returns a dictionary of results, keyed by the base unit.


{
  ('column_1', 'column_2'): 0.1234,
  ('column_1', 'column_3'): 0.5678,
  ('column_2', 'column_3'): 0.9012,
}
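
The iteration described above could be sketched as follows. The metadata layout (a 'columns' dictionary with an 'sdtype' per column), the rule of pairing only numerical columns, and the stand-in base score are all assumptions chosen for illustration. Plain dictionaries of lists again stand in for tables.

```python
from itertools import combinations


class YourMetricName:
    """Hypothetical column-pair metric with a deliberately trivial base score."""

    @classmethod
    def compute(cls, real_data, synthetic_data):
        """Stand-in base score comparing the mean row-wise product of two columns."""
        def mean_product(data):
            col_a, col_b = list(data)
            products = [a * b for a, b in zip(data[col_a], data[col_b])]
            return sum(products) / len(products)

        gap = abs(mean_product(real_data) - mean_product(synthetic_data))
        return 1.0 / (1.0 + gap)

    @classmethod
    def apply_to_table(cls, real_data, synthetic_data, metadata, **kwargs):
        """Apply compute to every pair of numerical columns named in the metadata."""
        numerical = [
            name for name, info in metadata['columns'].items()
            if info['sdtype'] == 'numerical'
        ]
        results = {}
        for pair in combinations(numerical, 2):
            real_pair = {name: real_data[name] for name in pair}
            synthetic_pair = {name: synthetic_data[name] for name in pair}
            # Results are keyed by the base unit, here a tuple of column names.
            results[pair] = cls.compute(real_pair, synthetic_pair, **kwargs)
        return results


metadata = {
    'columns': {
        'column_1': {'sdtype': 'numerical'},
        'column_2': {'sdtype': 'numerical'},
        'column_3': {'sdtype': 'numerical'},
        'label': {'sdtype': 'categorical'},  # skipped by the iteration
    }
}
real = {
    'column_1': [1, 2, 3], 'column_2': [4, 5, 6],
    'column_3': [7, 8, 9], 'label': ['a', 'b', 'c'],
}
synthetic = {
    'column_1': [1, 2, 4], 'column_2': [4, 5, 6],
    'column_3': [7, 8, 8], 'label': ['a', 'a', 'c'],
}
results = YourMetricName.apply_to_table(real, synthetic, metadata)
```

Because apply_to_table only decides where to apply the metric, all scoring logic stays in compute.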

You can write multiple levels of iteration. For example, it may also be possible to run your metric on multi-table datasets. In this case, the results are additionally keyed by table name.

{
    'users': {
        ('column_1', 'column_2'): 0.1234,
        ('column_1', 'column_3'): 0.5678
    },
    'sessions': ...
}
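
A second level of iteration can be a thin loop over table names. In the sketch below, the helper name apply_to_all_tables, the 'tables' key in the metadata, and the stub per-table function are hypothetical. The stub scores single columns rather than column pairs only to keep the example short.

```python
def apply_to_all_tables(real_tables, synthetic_tables, metadata, apply_to_table, **kwargs):
    """Run a per-table function on every table and key the results by table name."""
    results = {}
    for table_name, table_metadata in metadata['tables'].items():
        results[table_name] = apply_to_table(
            real_tables[table_name],
            synthetic_tables[table_name],
            table_metadata,
            **kwargs,
        )
    return results


def same_values(real_data, synthetic_data, table_metadata):
    """Stub per-table function: 1.0 where a column holds the same set of values."""
    return {
        name: 1.0 if set(real_data[name]) == set(synthetic_data[name]) else 0.0
        for name in table_metadata['columns']
    }


metadata = {
    'tables': {
        'users': {'columns': {'age': {}, 'country': {}}},
        'sessions': {'columns': {'device': {}}},
    }
}
real_tables = {
    'users': {'age': [20, 30], 'country': ['US', 'FR']},
    'sessions': {'device': ['phone', 'laptop']},
}
synthetic_tables = {
    'users': {'age': [20, 30], 'country': ['US', 'US']},
    'sessions': {'device': ['laptop', 'phone']},
}

results = apply_to_all_tables(real_tables, synthetic_tables, metadata, same_values)
```

Each outer key is a table name, and each inner dictionary is whatever the per-table function returns.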

Metric Description

Every metric includes a detailed description in these docs. It can be helpful to think through this before implementation.

Other Considerations

  • Metrics vs. parameters. If your metric is extremely similar to another, consider combining them and introducing a parameter instead.

  • External dependencies. If your metric introduces new dependencies, consider whether they are necessary. New dependencies make it harder to maintain the overall SDMetrics package and may leave the software vulnerable if the external library is not being regularly updated or used.

  • Determinism. If your metric is not deterministic, explore why this is the case. If the score varies widely between successive runs, it may be hard to interpret your metric.
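
One way to explore the determinism point is to run the metric several times and look at the spread of the scores. The sketch below uses only the standard library. The metric noisy_metric, its seed parameter, and the helper score_spread are hypothetical names invented for this example.

```python
import random
import statistics


def noisy_metric(real_values, synthetic_values, sample_size=50, seed=None):
    """Hypothetical non-deterministic metric that compares means of random subsamples."""
    # A local generator avoids touching global random state.
    # With seed=None, each call draws a different subsample.
    rng = random.Random(seed)
    real_sample = rng.sample(real_values, sample_size)
    synthetic_sample = rng.sample(synthetic_values, sample_size)
    gap = abs(statistics.mean(real_sample) - statistics.mean(synthetic_sample))
    return 1.0 / (1.0 + gap)


def score_spread(metric, real_values, synthetic_values, runs=20, **kwargs):
    """Run the metric repeatedly and return the mean and standard deviation of the scores."""
    scores = [metric(real_values, synthetic_values, **kwargs) for _ in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores)


real_values = list(range(1000))
synthetic_values = [value + 5 for value in range(1000)]

# Unseeded runs differ from call to call; a fixed seed makes every run identical.
unseeded_mean, unseeded_std = score_spread(noisy_metric, real_values, synthetic_values)
seeded_mean, seeded_std = score_spread(noisy_metric, real_values, synthetic_values, seed=0)
```

A standard deviation that is large relative to the differences you expect between datasets suggests the score will be hard to interpret without a fixed seed or averaging over runs.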
