Welcome to the SDV!
The Synthetic Data Vault (SDV) is a Python library designed to be your one-stop shop for creating tabular synthetic data. It is available to the public under the Business Source License.

The SDV offers multiple machine learning models -- ranging from classical statistical methods (Copulas) to deep learning methods (GANs).
Generate data for single tables, multiple connected tables or sequential tables.
Compare the synthetic data to the real data against a variety of measures. Diagnose problems and generate a quality report to get more insights.
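To make the idea of a comparison measure concrete, here is a minimal sketch of one possible measure for a single numeric column: the complement of the two-sample Kolmogorov-Smirnov statistic. The function name `ks_complement` is hypothetical and is not the SDV's API; the sketch uses only the Python standard library and assumes both columns are plain lists of numbers.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - KS statistic for two numeric samples.

    A score of 1.0 means the empirical distributions match exactly;
    0.0 means they do not overlap at all. Hypothetical helper for
    illustration only, not part of the SDV package.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The gap between the two empirical CDFs only changes at observed
    # values, so it is enough to evaluate it at each of them.
    for value in set(real_sorted) | set(synth_sorted):
        real_cdf = bisect_right(real_sorted, value) / len(real_sorted)
        synth_cdf = bisect_right(synth_sorted, value) / len(synth_sorted)
        max_gap = max(max_gap, abs(real_cdf - synth_cdf))
    return 1.0 - max_gap


real_column = [1, 2, 3, 4]
synthetic_column = [3, 4, 5, 6]
score = ks_complement(real_column, synthetic_column)
```

A per-column score like this is only one ingredient; a full quality report would typically combine several measures across columns and column pairs.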
You can control data processing to improve the quality of synthetic data. Choose from different types of anonymization and define business rules in the form of logical constraints.
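As an illustration of what a business rule expressed as a logical constraint can look like, the sketch below enforces "checkout must come after checkin" by rejection sampling: rows that break the rule are discarded and redrawn. This is an assumption made for illustration; the SDV may enforce constraints differently, and every name here (`sample_with_constraint`, `toy_generator`, `checkout_after_checkin`) is hypothetical.

```python
import random


def sample_with_constraint(generate_row, is_valid, num_rows, max_tries=10_000):
    """Draw rows from generate_row() and keep only those passing is_valid()."""
    rows = []
    tries = 0
    while len(rows) < num_rows:
        if tries >= max_tries:
            raise RuntimeError("could not satisfy the constraint within max_tries")
        tries += 1
        row = generate_row()
        if is_valid(row):
            rows.append(row)
    return rows


rng = random.Random(0)  # fixed seed so the sketch is repeatable


def toy_generator():
    # Stand-in for a trained model: draws two independent day numbers.
    return {"checkin_day": rng.randint(1, 30), "checkout_day": rng.randint(1, 30)}


def checkout_after_checkin(row):
    # The business rule, written as a logical condition on one row.
    return row["checkout_day"] > row["checkin_day"]


rows = sample_with_constraint(toy_generator, checkout_after_checkin, num_rows=5)
```

Rejection sampling is simple but can be slow when few rows satisfy the rule, which is why the `max_tries` guard is included.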
Use the SDV package to get a fully integrated solution for synthetic data. Or, use the standalone libraries for specific needs.
The SDV library is part of the greater Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After four years of research and traction with enterprises, we created DataCebo in 2020 with the goal of growing the project.
Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.