Data Preparation

Sequential data represents ordered records, such as those in a time series. The entire table may contain records for a single entity (such as a user or patient). Alternatively, the table may contain multiple independent sequences, each belonging to a different entity.
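To make the multi-entity case concrete, here is a minimal stdlib sketch. It assumes a hypothetical flat table of patient records, stored as a list of row dictionaries rather than a DataFrame. The rows are grouped by the entity column and then ordered by time within each group. The helper `split_into_sequences` and the sample values are illustrative only. They are not part of SDV.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical flat table: two patients' records are interleaved and out of order.
records = [
    {"Patient ID": "ID_002", "Time": "01/03/2023", "Systolic BP": 130},
    {"Patient ID": "ID_001", "Time": "01/02/2023", "Systolic BP": 118},
    {"Patient ID": "ID_001", "Time": "01/01/2023", "Systolic BP": 121},
    {"Patient ID": "ID_002", "Time": "01/01/2023", "Systolic BP": 135},
]


def split_into_sequences(rows, key_column, index_column, datetime_format):
    """Group rows by entity, then order each entity's rows by parsed time."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row)
    for entity_rows in groups.values():
        # Parse before sorting: "%m/%d/%Y" strings do not sort chronologically as text.
        entity_rows.sort(key=lambda r: datetime.strptime(r[index_column], datetime_format))
    # Return the sequences with entities in a stable, sorted order.
    return {entity: groups[entity] for entity in sorted(groups)}


sequences = split_into_sequences(records, "Patient ID", "Time", "%m/%d/%Y")
for patient, rows in sequences.items():
    print(patient, [r["Systolic BP"] for r in rows])
# ID_001 [121, 118]
# ID_002 [135, 130]
```

Each patient's records form an independent sequence. The entity column plays the role of a sequence key, and the time column plays the role of a sequence index.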

Before you begin creating synthetic data, it's important to have your data ready in the right format:

  1. Data, a pandas DataFrame object containing the actual data

  2. Metadata, a SingleTableMetadata object that describes your table. It includes the data type of each column, as well as the keys and other identifiers (for sequential data, the sequence key and sequence index).
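The data and metadata must agree with each other. Every column in the data should be described in the metadata, and the sequence key and index should name real columns. The following is a minimal stdlib sketch of that cross-check. It assumes the table is held as a list of row dictionaries and the metadata as a plain dictionary shaped like SDV's. The function `check_columns` is hypothetical and is not SDV's own validation.

```python
# Hypothetical stand-ins: one row of data and a matching metadata dictionary.
data = [
    {"Patient ID": "ID_001", "Time": "01/01/2023", "Smoker": False, "Systolic BP": 121},
]
metadata = {
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {"sdtype": "id"},
        "Time": {"sdtype": "datetime", "datetime_format": "%m/%d/%Y"},
        "Smoker": {"sdtype": "boolean"},
        "Systolic BP": {"sdtype": "numerical"},
    },
}


def check_columns(rows, meta):
    """Return a list of mismatches between data and metadata (empty if none)."""
    problems = []
    data_columns = set().union(*(row.keys() for row in rows))
    described = set(meta["columns"])
    for column in sorted(data_columns - described):
        problems.append(f"column {column!r} is missing from metadata")
    for column in sorted(described - data_columns):
        problems.append(f"metadata describes unknown column {column!r}")
    # The sequence key and sequence index must refer to described columns.
    for role in ("sequence_key", "sequence_index"):
        if role in meta and meta[role] not in described:
            problems.append(f"{role} {meta[role]!r} is not a described column")
    return problems


print(check_columns(data, metadata))  # → []
```

If the metadata omitted "Systolic BP", the function would return `["column 'Systolic BP' is missing from metadata"]`.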

The table's metadata:

{
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": { "sdtype": "id", "regex_format": "ID_[0-9]{3}" },
        "Address": { "sdtype": "address", "pii": True },
        "Smoker": { "sdtype": "boolean" },
        "Time": { "sdtype": "datetime", "datetime_format": "%m/%d/%Y" },
        "Heart Rate": { "sdtype": "categorical" },
        "Systolic BP": { "sdtype": "numerical" }
    }
}
Learn More

Get started with a demo dataset or load your own data.

Create an object to describe the different columns in your data. Save it for future use.
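SDV's `SingleTableMetadata` can be saved with `save_to_json(filepath=...)` and restored with `SingleTableMetadata.load_from_json(filepath=...)`. The stdlib sketch below is only meant to illustrate the idea. It round-trips the plain dictionary form of the metadata through a JSON file and does not require SDV. The file name `metadata.json` is a hypothetical choice.

```python
import json
import os
import tempfile

metadata = {
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {"sdtype": "id", "regex_format": "ID_[0-9]{3}"},
        "Address": {"sdtype": "address", "pii": True},
        "Time": {"sdtype": "datetime", "datetime_format": "%m/%d/%Y"},
    },
}

# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "metadata.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=4)  # Python True is written as JSON true
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == metadata)  # True
```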


Copyright (c) 2023, DataCebo, Inc.