Metrics in Beta
Our goal is to provide high-quality, mathematically sound, and vetted metrics in the SDMetrics library. We also recognize that synthetic data is a new space undergoing active research. To encourage discussion and collaboration, we've introduced a Metrics in Beta section for anyone who wants to explore with us.
We envision that many new metrics will start out in Beta before being validated and adopted by the wider community.
A metric can be experimental for many reasons, including the ones below.
- The mathematical concepts are too new. Synthetic data is an area of active research. The research might be so new that it would benefit from more validation through the open source community before wider adoption.
- The metric isn't robust. Some metrics may not be reliable for every dataset. They may fluctuate widely based on built-in randomness or they may heavily depend on external algorithms that aren't optimized for every dataset.
- The interpretation isn't clear. Metric scores should have a clear interpretation. Even if a metric uses a well-known mathematical method, it may lack clarity in the context of synthetic data. It may be possible to "trick" the metric or there may be multiple, conflicting interpretations for it.
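One way to explore the robustness concern is to run a randomized metric several times with different seeds and look at how much the score moves. The sketch below is only an illustration under stated assumptions: `noisy_metric` and `score_spread` are hypothetical helpers written for this example, not part of the SDMetrics API, and the data is randomly generated.

```python
import random
import statistics


def noisy_metric(real, synthetic, seed):
    """Hypothetical randomized metric (NOT an SDMetrics metric).

    Compares the means of two random subsamples and maps the gap
    to a score in [0, 1], where 1 means the subsample means match.
    """
    rng = random.Random(seed)
    k = min(len(real), len(synthetic)) // 2
    real_sample = rng.sample(real, k)
    synth_sample = rng.sample(synthetic, k)
    diff = abs(statistics.mean(real_sample) - statistics.mean(synth_sample))
    spread = max(real) - min(real)  # normalizes the gap by the real data range
    return max(0.0, 1.0 - diff / spread)


def score_spread(metric, real, synthetic, seeds):
    """Run the metric once per seed and summarize the scores."""
    scores = [metric(real, synthetic, seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


# Made-up numeric columns, for illustration only.
data_rng = random.Random(0)
real = [data_rng.gauss(50, 10) for _ in range(200)]
synthetic = [data_rng.gauss(52, 10) for _ in range(200)]

mean_score, std_score = score_spread(noisy_metric, real, synthetic, range(20))
print(f"mean={mean_score:.3f} std={std_score:.3f}")
```

A standard deviation that is large relative to the differences you care about suggests the score depends more on the random seed than on the data, which is the kind of behavior that can keep a metric in Beta.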
We can upgrade some metrics from Beta after addressing the underlying concern. For example:
- If the metric has multiple interpretations, it may make sense to split it into two metrics (one for each interpretation).
- If the metric is highly variable because of an external algorithm, there may be other, more basic statistics that are not as variable.
You are welcome to start a discussion about a metric in Beta by raising an issue on the SDMetrics GitHub or by joining the SDV Slack.