# Welcome to SDGym!

The **Synthetic Data Gym** (SDGym) is a Python library for benchmarking different synthetic data generators. For example, you can compare synthesizers that use classical statistics with those that use deep learning.

<figure><img src="https://3464836953-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FLLx9fQwQGgVNyQnbyMBb%2Fuploads%2Fw2YIR5OTGlcIfmt7dQf6%2Fsdgym-synthetic-data-gym_Aug%2004%202025.png?alt=media&#x26;token=47a85436-065a-4d1f-8765-9ec6c1c61f16" alt=""><figcaption></figcaption></figure>

## Why benchmarking?

The past few years have seen an increasing amount of research in the synthetic data space. With such a variety of models available, we feel that it's important to validate new techniques with a proper framework in place.

We built SDGym as a flexible and customizable framework that you can use for a thorough analysis of synthetic data. This will give you confidence when publishing your results and incorporating synthetic data into your projects.

### 🧮 Use multiple datasets for a reliable benchmark

If your goal is to test a robust synthetic data model, relying on a single dataset may not give you much confidence. The SDGym library comes with a **variety of publicly available datasets** that are immediately available for use.

This is helpful for detecting errors in your synthesizer as well as identifying its strengths and weaknesses.
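As a rough sketch, a benchmark over several datasets might look like the following. The `synthesizers` and `sdv_datasets` parameter names, and the synthesizer and dataset names in the lists, are assumptions that are not confirmed on this page, so check them against the SDGym reference for your installed version. `run_multi_dataset_benchmark` is a hypothetical helper for illustration.

```python
# Assumed names: adjust these to whatever your SDGym version provides.
SYNTHESIZERS = ["GaussianCopulaSynthesizer", "CTGANSynthesizer"]
DATASETS = ["adult", "census", "child"]


def run_multi_dataset_benchmark(synthesizers=SYNTHESIZERS, datasets=DATASETS):
    """Benchmark each synthesizer on each dataset (hypothetical helper)."""
    # Imported lazily so this module loads even where SDGym is not installed.
    import sdgym

    # Assumption: benchmark_single_table accepts `synthesizers` and
    # `sdv_datasets` keyword arguments.
    return sdgym.benchmark_single_table(
        synthesizers=list(synthesizers),
        sdv_datasets=list(datasets),
    )
```

Calling `run_multi_dataset_benchmark()` starts the run. Using more than one dataset means a single unusual table cannot dominate the comparison.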

### ⚙️ Customize your benchmarking framework

We recognize that benchmarking may not be a one-size-fits-all solution. Our framework allows you to input your own **custom datasets** if you are working with private data.

You can also **customize your evaluation metrics** by selecting any metric in the [SDMetrics library](https://docs.sdv.dev/sdmetrics/).
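A minimal sketch of both customizations together is shown below. It assumes that `benchmark_single_table` accepts an `additional_datasets_folder` argument pointing at your own datasets and an `sdmetrics` argument listing SDMetrics metric names. Neither parameter is confirmed on this page, and the folder name `my_datasets/` is only a placeholder. Both helper functions are hypothetical.

```python
from pathlib import Path


def build_custom_benchmark_args(datasets_folder, metrics):
    """Collect keyword arguments for a customized run (hypothetical helper)."""
    return {
        # Assumption: SDGym reads custom datasets from this folder.
        "additional_datasets_folder": str(Path(datasets_folder)),
        # Assumption: metrics are referenced by their SDMetrics names.
        "sdmetrics": list(metrics),
    }


def run_custom_benchmark(datasets_folder, metrics):
    """Run the benchmark with custom datasets and metrics."""
    # Imported lazily so this module loads even where SDGym is not installed.
    import sdgym

    return sdgym.benchmark_single_table(
        **build_custom_benchmark_args(datasets_folder, metrics)
    )


# Placeholder folder name; NewRowSynthesis is an SDMetrics single-table metric.
args = build_custom_benchmark_args("my_datasets/", ["NewRowSynthesis"])
```

Building the arguments separately makes it possible to inspect or log the configuration before starting a potentially long benchmarking run.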

## Get started

[Install](https://docs.sdv.dev/sdgym/installation) SDGym and kick off a benchmarking run in only two lines of code!

```python
import sdgym

results = sdgym.benchmark_single_table()
```

## Owned & Maintained by DataCebo

The SDGym library is part of the [Synthetic Data Vault Project](https://sdv.dev/), first created at MIT's [Data to AI Lab](http://dai.lids.mit.edu/) in 2016. After four years of research and traction with enterprises, we created DataCebo in 2020 with the goal of growing the project.

Today, [DataCebo](https://datacebo.com/) is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.
