Synthetic Data Metrics (SDMetrics) is an open-source Python library for evaluating tabular synthetic data. Compare synthetic data against real data using a variety of metrics, generate visual reports, and share them with your team.

Flexible, Intuitive Evaluation

The SDMetrics library is model-agnostic, meaning you can use it with synthetic data created by any model at any time.
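To make the model-agnostic idea concrete, here is a minimal sketch of a metric that only needs a real column and a synthetic column, with no knowledge of the model that produced the synthetic data. It is a hand-rolled illustration in plain Python, not the library's implementation: the function name `tv_complement` and the toy data are assumptions made for this example. It scores categorical similarity as one minus the total variation distance between the two category frequency distributions, so 1.0 means identical frequencies and 0.0 means no overlap.

```python
from collections import Counter


def tv_complement(real_values, synthetic_values):
    """Return 1 - total variation distance between two categorical columns.

    Hypothetical helper for illustration only; it is not an SDMetrics API.
    """
    real_freq = Counter(real_values)
    synth_freq = Counter(synthetic_values)
    n_real = len(real_values)
    n_synth = len(synthetic_values)

    # Compare relative frequencies over every category seen in either column.
    categories = set(real_freq) | set(synth_freq)
    tvd = 0.5 * sum(
        abs(real_freq[c] / n_real - synth_freq[c] / n_synth)
        for c in categories
    )
    return 1.0 - tvd


# Toy data (made up for this sketch). The synthetic column could come
# from any generator; the metric never needs to know which one.
real = ["A", "A", "B", "B"]
synthetic = ["A", "B", "B", "B"]

score = tv_complement(real, synthetic)
print(f"Similarity score: {score:.2f}")
```

Because the function takes only the two columns as input, the same call works unchanged for synthetic data from any source, which is the property the paragraph above describes.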

📊 Visualize & share your results with reports

Easily generate reports for your project. Each report focuses on a particular aspect of synthetic data, for example, data quality. Use them to drill down visually until you get answers.
This is an example of a visualization from the SDMetrics Quality Report.
We are also here to help with custom reports tailored to your enterprise needs.

⚖️ Choose from a variety of metrics

In our Metrics Glossary, you'll find all the different metrics for evaluating synthetic data. The SDMetrics docs explain the relevant mathematical concepts and help you decide which metrics are best to apply.
This is an example illustrating the CategoricalCAP metric that measures privacy.

📚 Participate in cutting-edge research

The SDMetrics library welcomes contributions from active research areas! Browse our Metrics in Beta and experiment with cutting-edge methods to evaluate your data.

Owned & Maintained by DataCebo

The SDMetrics library is part of the Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After four years of research and traction with enterprises, we created DataCebo in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.