Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


Loading Data


Last updated 18 days ago

Load your data into Python to use it for SDV modeling. SDV supports many data formats for import and export.

Don't have any data yet? The SDV library includes many demo datasets that you can use to get started. To learn more, see the SDV Demo Data page.

Local Data

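If your data is already available as local files (on your own machine), load it into SDV using one of the options below.

  • CSV Data: Load multiple CSV files into Python.
  • Excel Spreadsheet: Load an entire Excel spreadsheet into Python.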
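To illustrate what loading local files produces, the sketch below reads every CSV file in a folder into a dictionary of DataFrames using only pandas. It is not SDV's own loader: the helper name `load_csv_folder`, the demo file names, and the one-CSV-file-per-table layout are assumptions made for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_folder(folder_name):
    """Read every *.csv file in a folder into a dict of DataFrames.

    Hypothetical helper (not part of SDV): each table is keyed by its
    file name without the .csv extension.
    """
    folder = Path(folder_name)
    return {
        path.stem: pd.read_csv(path)
        for path in sorted(folder.glob('*.csv'))
    }


# Demo: write two small CSV files to a temporary folder, then load them.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({'guest_id': [1, 2], 'name': ['Ana', 'Ben']}).to_csv(
        Path(tmp) / 'guests.csv', index=False)
    pd.DataFrame({'hotel_id': [10], 'city': ['Boston']}).to_csv(
        Path(tmp) / 'hotels.csv', index=False)
    data = load_csv_folder(tmp)

# Table names come from the file names.
print(sorted(data))
```

For an Excel workbook, `pd.read_excel(path, sheet_name=None)` similarly returns a dictionary that maps each sheet name to a DataFrame, provided an Excel engine such as openpyxl is installed.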

❖ Connect to a database (AI Connectors)

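If your data is available in a database, use the AI Connectors feature to import some of that data directly for use in SDV. Later, you can use the same connector to export synthetic data into a new database. Connectors are available for the following databases:

  • AlloyDB
  • BigQuery
  • MSSQL
  • Oracle
  • Spanner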

Do you have data in other formats?

The SDV uses the pandas library for data manipulation and synthesizing. If your data is in any other format, load it as a pandas.DataFrame object to use in the SDV. For multi table data, format your data as a dictionary that maps each table name to a different DataFrame object.

multi_table_data = {
    'table_name_1': <pandas.DataFrame>,
    'table_name_2': <pandas.DataFrame>,
    ...
}

Pandas offers many methods to load different types of data. For example, you can load a SQL table or a JSON string.

import pandas as pd

# Load each table from its own JSON file into a DataFrame
data_table_1 = pd.read_json('file://localhost/path/to/table_1.json')
data_table_2 = pd.read_json('file://localhost/path/to/table_2.json')

For more options, see the pandas reference.
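As a minimal sketch of the dictionary format described above, the example below loads one table from a SQL query and another from a JSON string, then maps each table name to its DataFrame. The table names, columns, and values are made up for illustration, and an in-memory SQLite database stands in for a real one.

```python
import io
import sqlite3

import pandas as pd

# Source 1: a SQL table. An in-memory SQLite database stands in for a real one.
connection = sqlite3.connect(':memory:')
connection.execute(
    'CREATE TABLE hotels (hotel_id INTEGER PRIMARY KEY, city TEXT)')
connection.executemany(
    'INSERT INTO hotels VALUES (?, ?)', [(10, 'Boston'), (11, 'Austin')])
hotels = pd.read_sql_query('SELECT * FROM hotels', connection)
connection.close()

# Source 2: a JSON string. Wrapping it in StringIO lets read_json treat it
# as a file-like object instead of a path or URL.
guests_json = (
    '[{"guest_id": 1, "hotel_id": 10}, {"guest_id": 2, "hotel_id": 11}]')
guests = pd.read_json(io.StringIO(guests_json))

# Multi table format: each table name maps to its own DataFrame.
multi_table_data = {'hotels': hotels, 'guests': guests}

# Sanity check that every value is a DataFrame.
assert all(
    isinstance(table, pd.DataFrame) for table in multi_table_data.values())
```

Whatever the original source, the only requirement taken from the text above is the final shape: a plain dictionary with table names as keys and DataFrame objects as values.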


❖ SDV Enterprise Bundle. The AI Connectors feature is available as part of the AI Connectors Bundle, an optional add-on to SDV Enterprise. For more information, please visit the AI Connectors Bundle page.
