Data Preparation
Sequential data consists of ordered records, such as a time series. The entire table may contain records for a single entity (such as a user or patient). Alternatively, the table may contain multiple independent sequences, each belonging to a different entity.
This example shows sequential data related to vital signs. The table contains multiple sequences, each corresponding to a different patient. Within each sequence, the health measurements change over time.
Before you begin creating synthetic data, it's important to have your data ready in the right format:
1. Data, a pandas DataFrame object containing the actual data
2. Metadata, a SingleTableMetadata object that describes your table. It includes the data type of each column, plus keys and other identifiers.
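As a rough illustration of the data half of this list, the sketch below builds a small pandas DataFrame shaped like the vital-signs example. All row values are invented for illustration, and the temporary `_parsed_time` column is a hypothetical helper used only for sorting; the column names follow the metadata shown on this page.

```python
import pandas as pd

# Invented illustration rows: two patients, two measurements each.
data = pd.DataFrame({
    "Patient ID": ["ID_001", "ID_001", "ID_002", "ID_002"],
    "Address": ["1 Example St", "1 Example St", "2 Sample Ave", "2 Sample Ave"],
    "Smoker": [False, False, True, True],
    "Time": ["01/02/2023", "01/01/2023", "01/01/2023", "01/02/2023"],
    "Heart Rate": ["Normal", "High", "Normal", "Normal"],
    "Systolic BP": [118.0, 121.5, 135.0, 133.2],
})

# Parse the dates into a temporary column so that rows can be ordered
# chronologically while the original "Time" strings stay unchanged.
parsed_time = pd.to_datetime(data["Time"], format="%m/%d/%Y")
data = (
    data.assign(_parsed_time=parsed_time)
    .sort_values(["Patient ID", "_parsed_time"])
    .drop(columns="_parsed_time")
    .reset_index(drop=True)
)

# Each distinct "Patient ID" value identifies one independent sequence.
sequence_lengths = data.groupby("Patient ID").size().to_dict()
```

Sorting by patient and time is not necessarily required, but it makes it easy to inspect each sequence by eye before moving on.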
{
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": { "sdtype": "id", "regex_format": "ID_[0-9]{3}" },
        "Address": { "sdtype": "address", "pii": True },
        "Smoker": { "sdtype": "boolean" },
        "Time": { "sdtype": "datetime", "datetime_format": "%m/%d/%Y" },
        "Heart Rate": { "sdtype": "categorical" },
        "Systolic BP": { "sdtype": "numerical" }
    }
}
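Before handing a description like the one above to the library, it can help to run a few basic consistency checks on it. The sketch below uses only the Python standard library. The function `check_sequence_metadata` is a hypothetical helper written for this example, not part of SDV, and it covers only a small subset of what a real validator would check.

```python
from datetime import datetime

# The metadata from this page, written as a Python dictionary.
metadata_dict = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {"sdtype": "id", "regex_format": "ID_[0-9]{3}"},
        "Address": {"sdtype": "address", "pii": True},
        "Smoker": {"sdtype": "boolean"},
        "Time": {"sdtype": "datetime", "datetime_format": "%m/%d/%Y"},
        "Heart Rate": {"sdtype": "categorical"},
        "Systolic BP": {"sdtype": "numerical"},
    },
}


def check_sequence_metadata(meta):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    columns = meta.get("columns", {})
    # Both sequence fields must name a column that is actually declared.
    for field in ("sequence_key", "sequence_index"):
        name = meta.get(field)
        if name not in columns:
            problems.append(f"{field} {name!r} is not a declared column")
    # The key identifies a sequence and the index orders it, so the two
    # roles are expected to be played by different columns.
    if meta.get("sequence_key") == meta.get("sequence_index"):
        problems.append("sequence_key and sequence_index are the same column")
    return problems


problems = check_sequence_metadata(metadata_dict)

# Confirm that the declared datetime format parses a sample value
# (the sample date is invented for illustration).
time_format = metadata_dict["columns"]["Time"]["datetime_format"]
parsed = datetime.strptime("01/31/2023", time_format)
```

In SDV 1.x, a dictionary like this is typically turned into a metadata object with `SingleTableMetadata.load_from_dict`; check the documentation for your installed version, since the metadata API has changed between releases.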