Defining your metric
Use this guide to define your metric. It's important to think through the abstractions and functionality of your metric before adding it.
All metrics in this library are model-agnostic. Anyone who wants to use your metric should already have:
1. A real dataset
2. A synthetic dataset, which could be created using any model
The base version of your metric operates on the smallest possible unit of real and synthetic data. The base metric is a class with a compute method. The method takes in the minimal unit of real data, the minimal unit of synthetic data, and any other keyword arguments you want to add. It returns a score, represented as a floating point value.
from sdmetrics.column_pairs import YourMetricName
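As a rough illustration of the structure described above, the sketch below defines a column-pair metric in plain Python. The class name ToyCorrelationSimilarity, the helper _pearson, and the absolute keyword are all hypothetical names chosen for this example; they are not part of SDMetrics. Plain numeric sequences stand in for the real and synthetic columns.

```python
import math


def _pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


class ToyCorrelationSimilarity:
    """Hypothetical column-pair metric, for illustration only."""

    @classmethod
    def compute(cls, real_data, synthetic_data, absolute=False):
        """Score a single pair of columns (the minimal unit for this metric).

        real_data and synthetic_data are each a (column_a, column_b) tuple of
        numeric sequences. The optional keyword `absolute` compares only the
        magnitude of the correlations. Returns a float between 0.0 and 1.0,
        where 1.0 means the two correlations match.
        """
        real_corr = _pearson(*real_data)
        synthetic_corr = _pearson(*synthetic_data)
        if absolute:
            real_corr, synthetic_corr = abs(real_corr), abs(synthetic_corr)
        return 1.0 - abs(real_corr - synthetic_corr) / 2.0


# Usage: one pair of real columns and one pair of synthetic columns.
real_pair = ([1, 2, 3, 4], [2, 4, 6, 8])
synthetic_pair = ([1, 2, 3, 4], [8, 6, 4, 2])
score = ToyCorrelationSimilarity.compute(real_pair, synthetic_pair)
```

Whether compute should be a class method, a static method, or an instance method is a design choice that this sketch does not settle; a class method is used here only so the metric can be called without creating an instance.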
In many cases, you may want to iterate through the entire dataset to apply the base metric to different columns, pairs of columns, tables, etc. You can write a convenience method called apply_to_table that performs this iteration.
This method takes in the full real data, the full synthetic data, any keyword arguments, and the metadata. Based on the metadata, you can determine where to apply the base metric. The method returns a dictionary of results, keyed by the base unit.
{
    ('column_1', 'column_2'): 0.1234,
    ('column_1', 'column_3'): 0.5678,
    ('column_2', 'column_3'): 0.9012,
}
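The iteration described above might look something like the following sketch. It is only one possible shape: the class ToyCorrelationSimilarity is hypothetical, the metadata layout (a 'columns' dictionary whose entries carry an 'sdtype') is an assumption made for this example, and a plain dictionary of lists stands in for each table.

```python
import math
from itertools import combinations


class ToyCorrelationSimilarity:
    """Hypothetical column-pair metric, for illustration only."""

    @staticmethod
    def _pearson(xs, ys):
        """Pearson correlation of two equal-length, non-constant sequences."""
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)

    @classmethod
    def compute(cls, real_data, synthetic_data):
        """Base metric: score one (column_a, column_b) pair."""
        real_corr = cls._pearson(*real_data)
        synthetic_corr = cls._pearson(*synthetic_data)
        return 1.0 - abs(real_corr - synthetic_corr) / 2.0

    @classmethod
    def apply_to_table(cls, real_data, synthetic_data, metadata, **kwargs):
        """Apply the base metric to every pair of numerical columns.

        The metadata decides which columns qualify. The result is a
        dictionary keyed by the base unit, here a (column_a, column_b) tuple.
        """
        numerical = [
            name
            for name, info in metadata['columns'].items()
            if info.get('sdtype') == 'numerical'  # assumed metadata layout
        ]
        results = {}
        for col_a, col_b in combinations(numerical, 2):
            results[(col_a, col_b)] = cls.compute(
                (real_data[col_a], real_data[col_b]),
                (synthetic_data[col_a], synthetic_data[col_b]),
                **kwargs,
            )
        return results
```

Keeping all of the scoring logic in compute and using apply_to_table only for selection and iteration means the base metric can still be called on a single pair of columns.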
Every metric includes a detailed description in these docs. It can be helpful to think through this description before you start implementing, keeping the following points in mind:
- Metrics vs. parameters. If your metric is extremely similar to another, consider combining them and introducing a parameter instead.
- External dependencies. If your metric introduces new dependencies, consider whether they are necessary. New dependencies make it harder to maintain the overall SDMetrics package and may leave the software vulnerable if the external library is not being regularly updated or used.
- Determinism. If your metric is not deterministic, explore why this is the case. If the score varies widely between successive runs, it may be hard to interpret your metric.
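One way to explore the determinism point is to run a metric several times on the same inputs and look at how much the score moves. The sketch below is a minimal example under stated assumptions: sampled_mean_gap is a hypothetical metric that subsamples rows at random, and score_spread is a hypothetical helper; neither is part of SDMetrics.

```python
import random
import statistics


def sampled_mean_gap(real_data, synthetic_data, sample_size=50, seed=None):
    """Hypothetical metric that subsamples rows before comparing means.

    Because of the random subsampling, the score changes between runs
    unless a seed is supplied.
    """
    rng = random.Random(seed)
    real_sample = rng.sample(real_data, sample_size)
    synthetic_sample = rng.sample(synthetic_data, sample_size)
    gap = statistics.mean(real_sample) - statistics.mean(synthetic_sample)
    return float(abs(gap))


def score_spread(metric, real_data, synthetic_data, runs=20, **kwargs):
    """Run the metric repeatedly; return (mean, standard deviation) of scores."""
    scores = [metric(real_data, synthetic_data, **kwargs) for _ in range(runs)]
    return statistics.mean(scores), statistics.pstdev(scores)


real = [float(i) for i in range(1000)]
synthetic = [float(i) + 5.0 for i in range(1000)]

# Unseeded: the spread shows how much the score moves between runs.
unseeded_mean, unseeded_spread = score_spread(sampled_mean_gap, real, synthetic)

# Seeded: every run draws the same sample, so the spread collapses to zero.
seeded_mean, seeded_spread = score_spread(
    sampled_mean_gap, real, synthetic, seed=0
)
```

A large spread relative to the mean suggests that a single run of the metric would be hard to interpret. Exposing a seed is one option for making results reproducible; increasing the sample size is another.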