Data Preparation

Single table data contains rows and columns of information. Each row typically represents a new entity such as a user, transaction, or session.

Before you begin creating synthetic data, it's important to have your data ready in the right format:

  1. Data, loaded into Python as a pandas DataFrame object, and

  2. Metadata, a SingleTableMetadata object that describes your table. It includes the data types in each column, primary keys and other identifiers.
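As a minimal sketch of the first item, the snippet below builds a small pandas DataFrame from inline CSV text. The column names follow the hotel example on this page; the rows and the inline CSV are made up for illustration. Creating the metadata object is done with the SDV library and is not shown here.

```python
import io

import pandas as pd

# Made-up rows for illustration only; column names follow the example metadata.
CSV_TEXT = """guest_email,has_rewards,room_type,room_rate,checkin_date
a@example.com,True,BASIC,120.50,27 Dec 2020
b@example.com,False,DELUXE,210.00,30 Dec 2020
"""

# In practice this would point at your own file, e.g. pd.read_csv("my_data.csv").
data = pd.read_csv(io.StringIO(CSV_TEXT))

# Each row is one entity (here, one guest); each column is one attribute.
print(data.shape)
print(data.dtypes)
```

Printing the dtypes is a quick way to see how pandas interpreted each column before you describe it in the metadata.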

An example of a table's metadata:

{
    "primary_key": "guest_email",
    "alternate_keys": [ "credit_card_number" ],
    "columns": {
        "guest_email": { "sdtype": "email", "pii": true },
        "has_rewards": { "sdtype": "boolean" },
        "room_type": { "sdtype": "categorical" },
        "amenities_fee": { "sdtype": "numerical" },
        "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "checkout_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "room_rate": { "sdtype": "numerical" },
        "billing_address": { "sdtype": "address", "pii": true },
        "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
    }
}
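To make the fields above concrete, here is a small sketch that reads an abridged copy of this metadata as plain JSON using only the Python standard library. It is not the SDV API: the checks, the variable names, and the sample date string are assumptions chosen for illustration.

```python
import json
from datetime import datetime

# Abridged copy of the example metadata, wrapped in braces so it parses as JSON.
METADATA_JSON = """
{
    "primary_key": "guest_email",
    "alternate_keys": ["credit_card_number"],
    "columns": {
        "guest_email": {"sdtype": "email", "pii": true},
        "checkin_date": {"sdtype": "datetime", "datetime_format": "%d %b %Y"},
        "credit_card_number": {"sdtype": "credit_card_number", "pii": true}
    }
}
"""

metadata = json.loads(METADATA_JSON)
columns = metadata["columns"]

# The primary key and every alternate key should name a described column.
keys = [metadata["primary_key"], *metadata["alternate_keys"]]
missing_keys = [key for key in keys if key not in columns]

# "%d %b %Y" reads as: day, abbreviated month name, four-digit year.
# Month names follow the active locale; the default "C" locale is assumed here.
fmt = columns["checkin_date"]["datetime_format"]
parsed = datetime.strptime("27 Dec 2020", fmt)

# Columns flagged as personally identifiable information.
pii_columns = sorted(name for name, spec in columns.items() if spec.get("pii"))

print(missing_keys, parsed.date(), pii_columns)
```

The date check is the part most worth repeating on real data: a format string that does not match the stored values raises a `ValueError` in `strptime`.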

Learn More

Get started with a demo dataset or load your own data.

Create an object to describe the different columns in your data. Save it for future use.


Can there be an order between the rows?

For true single table usage, the rows should be independent: there should be no ordering or dependencies between the rows of your table.

If your rows do have a specific order, your data is likely sequential. You can still write single table metadata, but it needs some additional details. See the Sequential Data section for more information.
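One rough way to tell the two cases apart is to check whether the same entity appears in more than one row, since repeated entities usually mean the rows form sequences rather than independent records. The sketch below is a heuristic for illustration only and is not part of the SDV library; the helper name `repeated_entities` and the sample rows are hypothetical.

```python
from collections import Counter


def repeated_entities(rows, id_column):
    """Return the ids that occur in more than one row, sorted."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(entity for entity, n in counts.items() if n > 1)


# One row per guest: rows are independent, so nothing repeats.
independent_rows = [
    {"guest_email": "a@example.com", "room_type": "BASIC"},
    {"guest_email": "b@example.com", "room_type": "DELUXE"},
]

# Several rows for the same guest: the rows likely form a sequence.
sequential_rows = [
    {"guest_email": "a@example.com", "visit": 1},
    {"guest_email": "a@example.com", "visit": 2},
    {"guest_email": "b@example.com", "visit": 1},
]

print(repeated_entities(independent_rows, "guest_email"))  # []
print(repeated_entities(sequential_rows, "guest_email"))   # ['a@example.com']
```

An empty result does not prove the rows are independent (a time-ordered log can have unique ids), so treat it as a prompt to look closer rather than a definitive test.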


Copyright (c) 2023, DataCebo, Inc.