Data Preparation

Single table data contains rows and columns of information. Each row typically represents a new entity such as a user, transaction, or session.

This example of a single table includes a new row for each guest of a hotel.

Before you begin creating synthetic data, it's important to have your data ready in the right format:

  1. Data, loaded into Python as a pandas DataFrame object, and

  2. Metadata, a Metadata object that describes your table. It includes the data types in each column, primary keys and other identifiers.
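As a rough illustration of the first requirement, the sketch below loads a small table into a pandas DataFrame. The two sample rows are hypothetical and exist only for illustration, and an in-memory CSV string stands in for a real file. It uses plain pandas only and makes no assumptions about any loading helpers the library itself may provide.

```python
import io

import pandas as pd

# Hypothetical sample rows (illustration only); the column names follow
# the hotel guests example used on this page.
csv_text = """guest_email,has_rewards,room_type,amenities_fee,checkin_date,checkout_date,room_rate
alice@example.com,True,BASIC,37.89,27 Dec 2020,29 Dec 2020,131.23
bob@example.com,False,DELUXE,,30 Dec 2020,02 Jan 2021,114.43
"""

# Read the CSV text into a DataFrame. The date columns are left as strings
# here; the metadata is where their format gets described.
data = pd.read_csv(io.StringIO(csv_text))

print(type(data).__name__)
print(list(data.columns))
print(data.dtypes)
```

With a real dataset, the `io.StringIO(csv_text)` argument would be replaced by the path to your own CSV file.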

The table's metadata:
{
    "METADATA_SPEC_VERSION": "V1",
    "tables": {
        "hotel_guests": {
            "primary_key": "guest_email",
            "alternate_keys": [ "credit_card_number" ],
            "columns": {
                "guest_email": { "sdtype": "email", "pii": true },
                "has_rewards": { "sdtype": "boolean" },
                "room_type": { "sdtype": "categorical" },
                "amenities_fee": { "sdtype": "numerical" },
                "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
                "checkout_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
                "room_rate": { "sdtype": "numerical" },
                "billing_address": { "sdtype": "address", "pii": true },
                "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
            }
        }
    }
}
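To get a feel for what this metadata describes, the sketch below reads a trimmed copy of the JSON above using only the Python standard library. It is an informal sanity check, not the library's own validation: it confirms that the primary and alternate keys are declared as columns, lists the columns flagged as PII, and parses a date with the declared format. The sample date string is a made-up value.

```python
import json
from datetime import datetime

# Trimmed copy of the metadata shown above (a few columns omitted for brevity).
metadata_json = """
{
    "METADATA_SPEC_VERSION": "V1",
    "tables": {
        "hotel_guests": {
            "primary_key": "guest_email",
            "alternate_keys": [ "credit_card_number" ],
            "columns": {
                "guest_email": { "sdtype": "email", "pii": true },
                "has_rewards": { "sdtype": "boolean" },
                "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
                "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
            }
        }
    }
}
"""

metadata = json.loads(metadata_json)
table = metadata["tables"]["hotel_guests"]
columns = table["columns"]

# Every key named in the metadata should also appear as a column.
keys = [table["primary_key"], *table.get("alternate_keys", [])]
missing = [key for key in keys if key not in columns]

# Columns flagged as personally identifiable information.
pii_columns = sorted(name for name, spec in columns.items() if spec.get("pii"))

# The datetime_format uses strftime-style codes, so a matching string
# (hypothetical value here) can be parsed with datetime.strptime.
fmt = columns["checkin_date"]["datetime_format"]
parsed = datetime.strptime("27 Dec 2020", fmt)

print("missing keys:", missing)
print("pii columns:", pii_columns)
print("parsed date:", parsed.date())
```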

FAQs

Can there be an order between the rows?

For true single table data, the rows should be independent: there should be no ordering or dependencies between the rows of your table.

If your rows do have a specific order, your data is likely sequential. You can still write single table metadata, but it needs some additional details. See the Sequential Data section for more information.
