Data Preparation

Sequential data represents ordered records, such as in a time series. The entire table may contain records for a single entity (such as a user or patient), or it may contain multiple, independent sequences belonging to different entities.

Before you begin creating synthetic data, it's important to have your data ready in the right format:

  1. Data, a dictionary that maps every table name to a pandas DataFrame object containing the actual data

  2. Metadata, a Metadata object that describes your table. It includes the data types in each column, keys and other identifiers.

The table's metadata:
{
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": { "sdtype": "id", "regex_format": "ID_[0-9]{3}" },
        "Address": { "sdtype": "address", "pii": true },
        "Smoker": { "sdtype": "boolean" },
        "Time": { "sdtype": "datetime", "datetime_format": "%m/%d/%Y" },
        "Heart Rate": { "sdtype": "categorical" },
        "Systolic BP": { "sdtype": "numerical" }
    }
}
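
To make the roles of the sequence key and sequence index concrete, here is a minimal sketch that uses only the Python standard library, not the SDV API. It reads the metadata above, checks each record's "Patient ID" against the regex format, parses "Time" with the datetime format, and then splits the table into one time-ordered sequence per patient. The helper name `split_sequences` and the sample rows are hypothetical and exist only for illustration; the rows omit most columns for brevity.

```python
import json
import re
from collections import defaultdict
from datetime import datetime

# Metadata from the example above, parsed from JSON text.
METADATA = json.loads("""
{
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": { "sdtype": "id", "regex_format": "ID_[0-9]{3}" },
        "Address": { "sdtype": "address", "pii": true },
        "Smoker": { "sdtype": "boolean" },
        "Time": { "sdtype": "datetime", "datetime_format": "%m/%d/%Y" },
        "Heart Rate": { "sdtype": "categorical" },
        "Systolic BP": { "sdtype": "numerical" }
    }
}
""")


def split_sequences(rows, metadata):
    """Group rows by the sequence key and order each group by the sequence index.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    key = metadata["sequence_key"]
    index = metadata["sequence_index"]
    id_pattern = re.compile(metadata["columns"][key]["regex_format"])
    time_format = metadata["columns"][index]["datetime_format"]

    sequences = defaultdict(list)
    for row in rows:
        # The sequence key identifies which entity a record belongs to.
        if not id_pattern.fullmatch(row[key]):
            raise ValueError(f"{row[key]!r} does not match {id_pattern.pattern!r}")
        # The sequence index determines the order of records within a sequence.
        timestamp = datetime.strptime(row[index], time_format)
        sequences[row[key]].append((timestamp, row))

    return {
        entity: [row for _, row in sorted(pairs, key=lambda pair: pair[0])]
        for entity, pairs in sequences.items()
    }


# Made-up sample records: two patients, deliberately out of order.
ROWS = [
    {"Patient ID": "ID_002", "Time": "01/03/2023", "Systolic BP": 128},
    {"Patient ID": "ID_001", "Time": "01/02/2023", "Systolic BP": 121},
    {"Patient ID": "ID_001", "Time": "01/01/2023", "Systolic BP": 118},
]

SEQUENCES = split_sequences(ROWS, METADATA)
for patient, records in SEQUENCES.items():
    print(patient, [record["Time"] for record in records])
```

Each key of `SEQUENCES` is one independent sequence, and the records inside it are ordered by "Time". A table that holds a single entity would simply produce one key.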

Copyright (c) 2023, DataCebo, Inc.