SDMetrics

Synthetic Data Metrics (SDMetrics) is an open source Python library for evaluating tabular synthetic data. Compare synthetic data against real data using a variety of metrics, generate visual reports, and share them with your team.

Flexible, Intuitive Evaluation

The SDMetrics library is model-agnostic, meaning you can use it with synthetic data created by any model at any time.

⚖️ Choose from a variety of metrics

You'll find many different types of metrics for evaluating synthetic data. The SDMetrics docs explain the relevant mathematical concepts and help you decide which metrics to apply.

Synthetic data can be measured in two ways. Much of the focus has been on measuring statistical differences between the real and synthetic data, such as quality measures. But this is not enough. Synthetic data also needs to provide a return on investment (ROI) for the task it is ultimately meant to accomplish, whether that is software testing, machine learning development, or something else. When possible, include metrics that measure ROI in your evaluation.

SDMetrics includes metrics for statistical data differences as well as for the ultimate ROI for different tasks. The two may or may not correlate.
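To make the idea of a statistical difference metric concrete, here is a minimal sketch that scores how closely one synthetic column matches a real column. It is not the SDMetrics implementation: the function name `ks_complement` and the sample values are hypothetical, and the score is simply one minus the two-sample Kolmogorov-Smirnov statistic, computed with the standard library only. It needs only the two columns of data, not the model that produced the synthetic one.

```python
from bisect import bisect_right


def ks_complement(real_values, synthetic_values):
    """Illustrative score: 1 minus the two-sample Kolmogorov-Smirnov statistic.

    1.0 means the two empirical distributions are identical;
    0.0 means they do not overlap at all.
    (Hypothetical helper for illustration, not part of any library.)
    """
    real_sorted = sorted(real_values)
    synth_sorted = sorted(synthetic_values)
    if not real_sorted or not synth_sorted:
        raise ValueError("both columns need at least one value")

    # The largest gap between two empirical CDFs always occurs at an
    # observed value, so it is enough to check those points.
    max_gap = 0.0
    for x in set(real_sorted) | set(synth_sorted):
        real_cdf = bisect_right(real_sorted, x) / len(real_sorted)
        synth_cdf = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(real_cdf - synth_cdf))
    return 1.0 - max_gap


# Made-up example columns; no generative model is involved in scoring.
real_ages = [23, 35, 41, 52, 60]
synthetic_ages = [25, 33, 44, 50, 58]
score = ks_complement(real_ages, synthetic_ages)
print(f"similarity score: {score:.2f}")
```

A score like this says nothing about ROI on its own. A synthetic table can match column distributions closely and still perform poorly on a downstream task, which is why the two kinds of metrics are worth reporting side by side.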

📊 Visualize & share your results with reports

Easily generate reports for your project. Reports focus on a particular aspect of synthetic data, for example data quality. Use them to drill down visually until you get answers.

This is an example of a visualization from the SDMetrics Quality Report.

📚 Participate in cutting-edge research

The SDMetrics library welcomes contributions from active research areas! Browse our Metrics in Beta and experiment with cutting-edge methods to evaluate your data.

This is an example illustrating the DisclosureProtection metric that measures privacy.

Owned & Maintained by DataCebo

The SDMetrics library is a part of the Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project.

Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.
