Welcome to the SDV!

The Synthetic Data Vault (SDV) is a Python library designed to be your one-stop shop for creating tabular synthetic data.

Key Features

🧠 Train your own generative AI model. Choose from a variety of AI algorithms designed for tabular data: single table, sequential, or multi-table (relational). Train a synthesizer on your real data, then create any amount of synthetic data on demand. SDV is designed to run on-prem, on standard CPUs.

📊 Evaluate & visualize synthetic data. Measure the statistical quality of your synthetic data and diagnose problems. For even more insight, create visualizations that compare your synthetic data with your real data.

⚙️ Customize your synthesizer. The SDV platform offers powerful features for creating higher quality synthetic data. You can add constraints, adjust the data preprocessing, and select anonymization options for any SDV synthesizer.

Get started with SDV Community

Get started with the publicly available SDV Community, distributed under the Business Source License.

pip install sdv

SDV Community is great for exploring the benefits of synthetic data. Train a generative AI model on your own simple datasets as a proof of concept, then create synthetic data that has the same patterns as your real data.

import pandas as pd
from sdv.single_table import GaussianCopulaSynthesizer
from sdv.metadata import Metadata

data = pd.read_csv('my_data_file.csv')
metadata = Metadata.detect_from_dataframe(data)

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_rows=1000)

Get started now! Check out the SDV Community installation guide and tutorials.

Take synthetic data to the next level with SDV Enterprise

SDV Enterprise is available to licensed users. With SDV Enterprise, you'll have access to everything in SDV Community plus the ability to ...

✅ Create synthetic data for large numbers of complex, interconnected data tables using scalable synthesizers

✅ Improve the quality of your synthetic data with more advanced data preprocessing, deeper data understanding, and enhanced AI algorithms

✅ Easily integrate data sources and deploy synthetic data applications enterprise-wide

To learn more, visit the SDV Enterprise page.

Owned & Maintained by DataCebo

The SDV library is a part of the greater Synthetic Data Vault Project, first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project.

Today, DataCebo is the proud developer of the SDV, the largest ecosystem for synthetic data generation & evaluation.

