Data Preparation

Single table data contains rows and columns of information. Each row typically represents a new entity such as a user, transaction, or session.

This example of a single table includes a new row for each guest of a hotel.

Before you begin creating synthetic data, it's important to have your data ready in the right format:

  1. Data, loaded into Python as a pandas DataFrame object, and

  2. Metadata, a SingleTableMetadata object that describes your table. It includes the data types in each column, primary keys and other identifiers.

The table's metadata:
{
    "primary_key": "guest_email",
    "alternate_keys": [ "credit_card_number" ],
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "columns": {
        "guest_email": { "sdtype": "email", "pii": true },
        "has_rewards": { "sdtype": "boolean" },
        "room_type": { "sdtype": "categorical" },
        "amenities_fee": { "sdtype": "numerical" },
        "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "checkout_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "room_rate": { "sdtype": "numerical" },
        "billing_address": { "sdtype": "address", "pii": true },
        "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
    }
}
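To make the metadata fields concrete, here is a minimal sketch that uses only the Python standard library (not the SDV API). It reads an abbreviated copy of the JSON above, confirms that every key column is declared under "columns", and shows what kind of string the "%d %b %Y" format describes. The sample date string is an assumption chosen for illustration.

```python
import json
from datetime import datetime

# Abbreviated copy of the metadata shown above.
metadata_json = """
{
    "primary_key": "guest_email",
    "alternate_keys": ["credit_card_number"],
    "columns": {
        "guest_email": {"sdtype": "email", "pii": true},
        "checkin_date": {"sdtype": "datetime", "datetime_format": "%d %b %Y"},
        "credit_card_number": {"sdtype": "credit_card_number", "pii": true}
    }
}
"""
metadata = json.loads(metadata_json)
columns = metadata["columns"]

# The primary key and every alternate key should name a declared column.
keys = [metadata["primary_key"], *metadata.get("alternate_keys", [])]
missing_keys = [key for key in keys if key not in columns]

# "%d %b %Y" means day, abbreviated month name, four-digit year.
fmt = columns["checkin_date"]["datetime_format"]
parsed = datetime.strptime("27 Dec 2020", fmt)
```

Here `missing_keys` stays empty because both key columns are declared, and `parsed` is the date 27 December 2020. Note that `%b` relies on English month abbreviations under the default locale.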


FAQs

Can there be an order between the rows?

For true single table data, the rows should be independent: there should be no ordering or dependencies between the rows of your table.

If your rows do have a specific order, your data is likely sequential. You can still write single table metadata, but it needs some additional details. See the Sequential Data section for more information.
