Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


Data Preparation


Single table data contains rows and columns of information. Each row typically represents a distinct entity, such as a user, transaction, or session.

Before you begin creating synthetic data, it's important to have your data ready in the right format:

  1. Data, loaded into Python as a pandas DataFrame object, and

  2. Metadata, a Metadata object that describes your table. It includes the data type (sdtype) of each column, the primary key, and other identifiers.
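To illustrate the two ingredients, the sketch below uses only the Python standard library. A list of row dictionaries stands in for the pandas DataFrame, and a plain dictionary stands in for the Metadata object. All values and the `check_primary_key` helper are hypothetical and are not part of SDV. In SDV itself you would load a DataFrame and describe it with a `Metadata` object instead.

```python
# Stand-in for a pandas DataFrame: one dict per row, one row per hotel guest.
# All values are made up for illustration.
rows = [
    {"guest_email": "ann@example.com", "room_type": "BASIC", "room_rate": 120.0},
    {"guest_email": "bo@example.com", "room_type": "DELUXE", "room_rate": 250.0},
    {"guest_email": "cy@example.com", "room_type": "BASIC", "room_rate": 115.5},
]

# Stand-in for the metadata: the sdtype of each column plus the primary key.
metadata = {
    "primary_key": "guest_email",
    "columns": {
        "guest_email": {"sdtype": "email"},
        "room_type": {"sdtype": "categorical"},
        "room_rate": {"sdtype": "numerical"},
    },
}


def check_primary_key(rows, metadata):
    """Hypothetical helper: return True if every row has exactly the columns
    the metadata describes and a distinct, non-null primary key value."""
    key = metadata["primary_key"]
    expected = set(metadata["columns"])
    seen = set()
    for row in rows:
        if set(row) != expected:
            return False  # row has missing or undescribed columns
        value = row.get(key)
        if value is None or value in seen:
            return False  # primary key is null or repeated
        seen.add(value)
    return True


print(check_primary_key(rows, metadata))  # True
```

The primary key identifies each row, so it must be unique. Adding a second row for `ann@example.com` would make the check return `False`.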

For example, consider a single table with one row for each guest of a hotel. Its metadata is:
{
    "METADATA_SPEC_VERSION": "V1",
    "tables": {
        "hotel_guests": {
            "primary_key": "guest_email",
            "alternate_keys": [ "credit_card_number" ],
            "columns": {
                "guest_email": { "sdtype": "email", "pii": true },
                "has_rewards": { "sdtype": "boolean" },
                "room_type": { "sdtype": "categorical" },
                "amenities_fee": { "sdtype": "numerical" },
                "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
                "checkout_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
                "room_rate": { "sdtype": "numerical" },
                "billing_address": { "sdtype": "address", "pii": true },
                "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
            }
        }
    }
}
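The sketch below shows how the pieces of this JSON fit together. It parses a trimmed copy with Python's standard `json` module, then does three things:

  • checks that the primary and alternate keys name described columns,
  • lists the columns flagged as PII, and
  • confirms that a sample date string matches the `%d %b %Y` datetime format, which uses Python strftime/strptime codes.

This only inspects the raw JSON; it is not SDV's own validation, and the sample date is made up.

```python
import json
from datetime import datetime

# A trimmed copy of the hotel_guests metadata shown above.
METADATA_JSON = """
{
    "METADATA_SPEC_VERSION": "V1",
    "tables": {
        "hotel_guests": {
            "primary_key": "guest_email",
            "alternate_keys": ["credit_card_number"],
            "columns": {
                "guest_email": {"sdtype": "email", "pii": true},
                "checkin_date": {"sdtype": "datetime", "datetime_format": "%d %b %Y"},
                "room_rate": {"sdtype": "numerical"},
                "credit_card_number": {"sdtype": "credit_card_number", "pii": true}
            }
        }
    }
}
"""

metadata = json.loads(METADATA_JSON)
table = metadata["tables"]["hotel_guests"]
columns = table["columns"]

# Collect any key (primary or alternate) that does not name a described column.
keys = [table["primary_key"], *table.get("alternate_keys", [])]
missing = [k for k in keys if k not in columns]

# Columns flagged as personally identifiable information.
pii_columns = sorted(name for name, spec in columns.items() if spec.get("pii"))

# "%d %b %Y" means: two-digit day, abbreviated month name, four-digit year.
fmt = columns["checkin_date"]["datetime_format"]
checkin = datetime.strptime("05 Mar 2023", fmt)

print(missing)         # []
print(pii_columns)     # ['credit_card_number', 'guest_email']
print(checkin.date())  # 2023-03-05
```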

Learn More

  • Loading Data: Get started with a demo dataset or load your own data.
  • Creating Metadata: Create a Metadata object to describe the different columns in your data. Save it for future use.

FAQs

Can there be an order between the rows?

For true single table data, the rows should be independent; that is, there should be no ordering or dependencies between the rows of your table.

If your rows do have a specific order, your data is likely sequential. You can still describe it with single table metadata, but you will need to add some additional details. See the Sequential Data section for more information.
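The sketch below makes the distinction concrete with a hypothetical `visits` table that has several ordered rows per patient. In SDV's sequential metadata, the additional details include two columns:

  • a sequence key, which identifies the sequence each row belongs to, and
  • a sequence index, which orders rows within a sequence.

See the Sequential Data section for the specifics. The table, its column names, and the `group_sequences` helper are assumptions made for illustration. The code uses only the standard library.

```python
from collections import defaultdict

# Hypothetical visits table: several rows per patient, ordered by visit date.
# These rows are not independent: each patient's rows form an ordered sequence.
rows = [
    {"patient_id": "P1", "visit_date": "2023-01-10", "weight_kg": 80.1},
    {"patient_id": "P2", "visit_date": "2023-01-12", "weight_kg": 65.0},
    {"patient_id": "P1", "visit_date": "2023-02-10", "weight_kg": 79.4},
    {"patient_id": "P1", "visit_date": "2023-03-10", "weight_kg": 78.8},
]

# Hypothetical metadata fragment naming the sequence key and sequence index.
metadata = {"sequence_key": "patient_id", "sequence_index": "visit_date"}


def group_sequences(rows, metadata):
    """Group rows by the sequence key, then sort each group by the sequence index.
    ISO date strings sort chronologically, so plain string sorting works here."""
    sequences = defaultdict(list)
    for row in rows:
        sequences[row[metadata["sequence_key"]]].append(row)
    index = metadata["sequence_index"]
    return {key: sorted(group, key=lambda r: r[index]) for key, group in sequences.items()}


for key, seq in group_sequences(rows, metadata).items():
    print(key, [r["weight_kg"] for r in seq])
# P1 [80.1, 79.4, 78.8]
# P2 [65.0]
```

Shuffling the rows of a true single table leaves its meaning unchanged. Shuffling this table loses information unless a sequence index lets the original order be recovered.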