SDGym
GitHubSlackDataCebo
  • Welcome to SDGym!
  • Installation
  • Benchmarking
    • Running a Benchmark
    • Interpreting Results
  • Customization
    • Synthesizers
      • SDV Synthesizers
      • Basic Synthesizers
      • 3rd Party Synthesizers
      • Custom Synthesizers
    • Datasets
      • Public SDV Datasets
      • Custom Datasets
    • AWS Integration
  • Resources
    • Metadata
Powered by GitBook

© Copyright 2023, DataCebo, Inc.

On this page
  • Why benchmarking?
  • 🧮 Use multiple datasets for a reliable benchmark
  • ⚙️ Customize your benchmarking framework
  • Get started
  • Owned & Maintained by DataCebo

Welcome to SDGym!

Last updated 1 year ago

The Synthetic Data Gym (SDGym) is a Python library for benchmarking different synthetic data generators. For example you can compare synthesizers that use classical statistics versus those that use deep learning.

Why benchmarking?

The past few years have seen an increasing amount of research in the synthetic data space. With such a variety of models available, we feel that it's important to validate new techniques with a proper framework in place.

We built SDGym as flexible and customizable framework that you can use for a thorough analysis of synthetic data. This will give you the confidence when publishing your results and incorporating synthetic data into your projects.

🧮 Use multiple datasets for a reliable benchmark

If your goal is to test a robust synthetic data model, relying on a single dataset may not give you much confidence. The SDGym library comes with a variety of publicly available datasets that are immediately for use.

This is helpful for detecting errors in your synthesizer as well as identifying its strengths and weaknesses.

⚙️ Customize your benchmarking framework

We recognize that benchmarking may not be a one-size-fits-all solution. Our framework allows you to input your own custom datasets if you are working with private data.

You can also customize your evaluation metrics by selecting any metric in the SDMetrics library.

Get started

Install the sdgym software and kick off a benchmarking run in only 2 lines of code!

import sdgym

results = sdgym.benchmark_single_table()

Owned & Maintained by DataCebo

The SDGym library is a part of the Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project.

Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.