Welcome to the SDV!

The Synthetic Data Vault (SDV) is a Python library designed to be your one-stop shop for creating tabular synthetic data. It is available to the public under the Business Source License.

Key Features

🧠 Create synthetic data using machine learning

The SDV offers multiple machine learning models -- ranging from classical statistical methods (Copulas) to deep learning methods (GANs).
Generate data for single tables, multiple connected tables or sequential tables.
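
To make the "classical statistical methods (Copulas)" idea concrete, here is a minimal sketch of a two-column Gaussian copula written with only the Python standard library. It is an illustration of the general technique, not the SDV's implementation or API: every function name (`normal_scores`, `pearson`, `empirical_quantile`, `sample_copula`) and the toy data are assumptions made for this example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def normal_scores(values):
    """Map each value to a standard-normal score based on its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1)
        scores[idx] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def empirical_quantile(sorted_values, p):
    """Look up the observed value at probability p (nearest rank)."""
    idx = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_copula(col_a, col_b, num_rows, seed=0):
    """Sample synthetic (a, b) pairs that mimic the dependence in the real columns."""
    rng = random.Random(seed)
    # 1. Learn the dependence between the columns in normal-score space.
    rho = pearson(normal_scores(col_a), normal_scores(col_b))
    spread = max(0.0, 1.0 - rho * rho) ** 0.5  # guard against rounding error
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    rows = []
    for _ in range(num_rows):
        # 2. Draw a correlated pair of standard normals.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + spread * rng.gauss(0, 1)
        # 3. Map each normal back to the real column's observed values.
        rows.append((
            empirical_quantile(sorted_a, STD_NORMAL.cdf(z1)),
            empirical_quantile(sorted_b, STD_NORMAL.cdf(z2)),
        ))
    return rows


# Toy "real" data (made up for this example): ages and incomes.
ages = [23, 31, 35, 42, 47, 52, 58, 63]
incomes = [28000, 39000, 41000, 56000, 61000, 64000, 72000, 70000]

synthetic_rows = sample_copula(ages, incomes, num_rows=5, seed=1)
```

Each synthetic row reuses values observed in the real columns while following the learned correlation. A production synthesizer would also handle more than two columns, categorical data, missing values and fitted marginal distributions.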

📊 Evaluate and visualize synthetic data

Compare the synthetic data to the real data using a variety of measures. Diagnose problems and generate a quality report for deeper insight.
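
As an illustration of what one such measure can look like, the sketch below scores how closely a synthetic numerical column follows a real one using the complement of the Kolmogorov-Smirnov statistic. This is a general statistical technique shown under assumptions: the function name `ks_complement` and the sample values are made up for this example, and the text above does not state which measures the quality report actually uses.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 minus the two-sample KS statistic.

    1.0 means the two empirical distributions are identical;
    0.0 means they do not overlap at all.
    """
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    # The largest gap between two step-function CDFs occurs at a data point.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


# Made-up columns for illustration.
real_ages = [23, 31, 35, 42, 47, 52, 58, 63]
synthetic_ages = [25, 30, 36, 41, 49, 50, 60, 61]

score = ks_complement(real_ages, synthetic_ages)
```

A score near 1.0 suggests the synthetic column has a similar shape to the real one. A single per-column score like this says nothing about relationships between columns, which is why a full evaluation combines several measures.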

🔄 Preprocess, anonymize and define constraints

You can control data processing to improve the quality of synthetic data. Choose from different types of anonymization and define business rules in the form of logical constraints.
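
To show what a business rule expressed as a logical constraint can mean in practice, here is a minimal sketch that enforces a rule by rejection sampling: generate a row, keep it only if the rule holds. This is one generic way to enforce a constraint and is not presented as how the SDV does it. The names `sample_with_constraint`, `toy_sampler` and `checkout_after_checkin`, and the booking columns, are hypothetical.

```python
import random


def sample_with_constraint(sample_row, is_valid, num_rows, max_tries=10_000):
    """Collect num_rows rows from sample_row() that satisfy is_valid(row)."""
    rows = []
    tries = 0
    while len(rows) < num_rows:
        if tries >= max_tries:
            raise RuntimeError("could not find enough valid rows")
        tries += 1
        row = sample_row()
        if is_valid(row):  # keep only rows that obey the business rule
            rows.append(row)
    return rows


rng = random.Random(0)


def toy_sampler():
    """Stand-in for a fitted model: draws two independent day numbers."""
    return {
        "checkin_day": rng.randint(1, 28),
        "checkout_day": rng.randint(1, 28),
    }


def checkout_after_checkin(row):
    """Business rule: a guest must check out after checking in."""
    return row["checkout_day"] > row["checkin_day"]


bookings = sample_with_constraint(toy_sampler, checkout_after_checkin, num_rows=5)
```

Rejection sampling is simple but can be slow when valid rows are rare, which is why the `max_tries` cap is included.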

Explore the SDV Ecosystem

The SDV ecosystem is home to multiple libraries that support synthetic data.
Use the SDV package for a fully integrated, one-stop solution for synthetic data, or use the standalone libraries for specific needs.

Owned & Maintained by DataCebo

The SDV library is part of the greater Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprises, we created DataCebo in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.
Copyright (c) 2023, DataCebo, Inc.