
Explore SDV

SDV is available in Public or Enterprise formats. Use this page to determine which one is right for your project needs.
Public SDV
Explore synthetic data. Train a generative AI model on your own simple datasets as a proof of concept, then create synthetic data that has the same patterns.
Publicly available with a Business Source License. Get started today!
SDV Enterprise
Ready for scale? Expand synthetic data solutions in your enterprise. Create generative AIs for more complex datasets.
Currently in Early Access. Contact us to be part of our Beta program.

Features

AI-Based Synthesizers

These synthesizers use AI to learn patterns from your data, then use those patterns to create synthetic data.
Text
Public SDV
SDV Enterprise
GaussianCopula: statistical AI
CTGAN, TVAE, CopulaGAN: neural networks
PAR: for sequential data
HMA: multi-table, for a limited number of tables (fewer than 5)
HSA: multi-table, for unlimited tables
Independent: multi-table, for unlimited tables

Test Data Synthesizers

These synthesizers create random test data based on metadata alone. They do not use AI, so you do not need to provide any training data.
Text
Public SDV
SDV Enterprise
DayZSynthesizer: single table
DayZSynthesizer: multi-table
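
As a rough illustration of what "based on metadata alone" means, the Python sketch below draws random rows from a column description without ever seeing real data. The metadata layout and the `sample_from_metadata` helper are hypothetical and written only for this example; they are not the DayZSynthesizer API or SDV's metadata format.

```python
import random

# Hypothetical metadata: column name -> description of allowed values.
# This layout is an assumption for illustration, not SDV's metadata format.
METADATA = {
    "age": {"type": "integer", "min": 18, "max": 90},
    "plan": {"type": "categorical", "values": ["free", "pro", "team"]},
    "active": {"type": "boolean"},
}


def sample_from_metadata(metadata, num_rows, seed=None):
    """Create random rows that respect the metadata; no training data is used."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_rows):
        row = {}
        for column, spec in metadata.items():
            if spec["type"] == "integer":
                row[column] = rng.randint(spec["min"], spec["max"])
            elif spec["type"] == "categorical":
                row[column] = rng.choice(spec["values"])
            elif spec["type"] == "boolean":
                row[column] = rng.random() < 0.5
            else:
                raise ValueError(f"Unsupported column type: {spec['type']}")
        rows.append(row)
    return rows


test_rows = sample_from_metadata(METADATA, num_rows=5, seed=0)
```

The rows are valid with respect to the metadata but carry no learned patterns, which is the trade-off this category of synthesizer makes.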

Integrate Your Data

These features make it easy to integrate SDV into your application and pipeline.
Text
Public SDV
SDV Enterprise
Auto-detect metadata using data CSVs or DataFrames
Auto-detect metadata with a DDL file from an SQL schema

Pre-Process Statistical Information

Transformers are used to pre-process your data, which can improve data quality. SDV synthesizers select transformers by default, but you can always customize these to your dataset.
Text
Public SDV
SDV Enterprise
FloatFormatter: missing value imputation for numerical columns
ClusterBased and Gaussian Normalizers: statistical transforms
Uniform, Label, and OneHot Encoding: for discrete variables (nominal and ordinal)
Datetime Encoding: including datetime format parsing
OutlierEncoder: for numerical outliers
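
To show what pre-processing a discrete column can look like, here is a minimal, library-free sketch of a reversible label encoding. The `SimpleLabelEncoder` class is a hypothetical stand-in written for this page, not the transformer that SDV ships; it only illustrates the idea of converting categories to numbers before modeling and converting them back afterwards.

```python
class SimpleLabelEncoder:
    """Hypothetical label encoder: maps each category to an integer and back."""

    def fit(self, values):
        # Assign integer labels in order of first appearance.
        self.category_to_label = {}
        for value in values:
            if value not in self.category_to_label:
                self.category_to_label[value] = len(self.category_to_label)
        self.label_to_category = {
            label: category for category, label in self.category_to_label.items()
        }
        return self

    def transform(self, values):
        # Replace every category with its integer label.
        return [self.category_to_label[value] for value in values]

    def reverse_transform(self, labels):
        # Map integer labels back to the original categories.
        return [self.label_to_category[label] for label in labels]


column = ["red", "green", "red", "blue"]
encoder = SimpleLabelEncoder().fit(column)
encoded = encoder.transform(column)            # [0, 1, 0, 2]
decoded = encoder.reverse_transform(encoded)   # ['red', 'green', 'red', 'blue']
```

Because the mapping is stored during `fit`, the round trip is lossless for every category seen in the training column.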

Understand & Anonymize Real-World Concepts

The transformers in this section are geared towards columns that correspond to industry- or domain-specific concepts. The structure of these columns may be human-created.
Text
Public SDV
SDV Enterprise
RegexGenerator, IDGenerator: for keys and IDs
AnonymizedFaker: general-purpose anonymization
PseudoAnonymizedFaker: general pseudo-anonymization with a mapping
Emails: understanding domains
Addresses: understanding locations
Phone Numbers: understanding country and area codes
[Coming soon!] GPS Coordinates: understanding geographical areas and distances
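
The phrase "pseudo-anonymization with a mapping" can be made concrete with a small sketch: every real value is replaced by a fake one, and the same real value always receives the same replacement. The `PseudoAnonymizer` class below is hypothetical and uses random 8-letter tokens as fake values; it is an assumption-laden illustration of the concept, not the behavior of any SDV transformer.

```python
import random
import string


class PseudoAnonymizer:
    """Hypothetical helper: replaces each real value with a stable fake one."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.mapping = {}   # real value -> fake value
        self._used = set()  # fake values already handed out

    def _new_fake(self):
        # Draw random 8-letter tokens until one is unused.
        while True:
            fake = "".join(self._rng.choices(string.ascii_lowercase, k=8))
            if fake not in self._used:
                self._used.add(fake)
                return fake

    def anonymize(self, values):
        result = []
        for value in values:
            # Reuse the stored replacement so repeated values stay consistent.
            if value not in self.mapping:
                self.mapping[value] = self._new_fake()
            result.append(self.mapping[value])
        return result


names = ["alice", "bob", "alice"]
anonymizer = PseudoAnonymizer(seed=42)
fake_names = anonymizer.anonymize(names)
```

Keeping `mapping` is what distinguishes this from plain anonymization: whoever holds the mapping can trace a fake value back to the real one, so it should be stored as carefully as the original data.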

Constraints

Constraints represent business rules and logic that you can apply to your synthesizer.
Text
Public SDV
SDV Enterprise
Predefined logic for individual columns: FixedIncrements, Negative, Positive, ScalarInequality, ScalarRange
Predefined logic for multiple columns: FixedCombinations, Inequality, OneHotEncoding, Range
Write your own custom constraints
Advanced, predefined logic: ChainedInequality
Support for custom constraints and additional predefined logic
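
As an illustration of a multi-column business rule such as an inequality, the sketch below checks that one column never exceeds another and keeps only the randomly drawn rows that pass. The column names and both helper functions are hypothetical, and discarding invalid rows (reject sampling) is just one simple way to enforce a rule; it is not a description of how SDV applies constraints internally.

```python
import random


def satisfies_inequality(row, low_column, high_column):
    """Business rule: the value in high_column must not be below low_column."""
    return row[low_column] <= row[high_column]


def sample_valid_rows(num_rows, seed=None):
    """Draw random rows and keep only those that satisfy the rule."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < num_rows:
        # Hypothetical columns: a check-out day should not precede check-in.
        row = {
            "checkin_day": rng.randint(1, 30),
            "checkout_day": rng.randint(1, 30),
        }
        if satisfies_inequality(row, "checkin_day", "checkout_day"):
            rows.append(row)
    return rows


valid_rows = sample_valid_rows(num_rows=10, seed=1)
```

Every row in `valid_rows` satisfies the rule by construction, at the cost of throwing away the draws that did not.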

Synthetic Data Evaluation

Evaluate your synthetic data by comparing it against the real data.
Text
Public SDV
SDV Enterprise
Access to SDMetrics library: vendor-agnostic, open source
Diagnostic Report: basic data validity checks, single and multi-table
Quality Report: statistical similarity, single and multi-table
Visualization: 1D and 2D bars, scatterplots, heatmaps and more
Use case-specific metrics: OutlierCoverage, SmoothnessSimilarity
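
To give a sense of what "statistical similarity" between real and synthetic data can mean, the sketch below scores one categorical column by comparing category frequencies with the total variation distance. The `tv_complement` function is a hand-rolled example for this page, not the SDMetrics implementation, and the sample values are made up.

```python
from collections import Counter


def tv_complement(real_values, synthetic_values):
    """Return 1 minus the total variation distance between category frequencies."""
    real_freq = Counter(real_values)
    synth_freq = Counter(synthetic_values)
    categories = set(real_freq) | set(synth_freq)
    # Counter returns 0 for categories that are missing on one side.
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real_values) - synth_freq[c] / len(synthetic_values))
        for c in categories
    )
    return 1.0 - distance


real = ["a", "a", "b", "b"]
synthetic = ["a", "b", "b", "b"]
score = tv_complement(real, synthetic)  # 0.75
```

A score of 1.0 means the two columns have identical category frequencies, and 0.0 means they share no categories at all.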
Copyright (c) 2023, DataCebo, Inc.