Data Preparation
Sequential data consists of ordered records, such as a time series. The entire table may contain records for a single entity (such as a user or patient), or it may contain multiple independent sequences that belong to different entities.
This example shows sequential data related to vital signs. The table contains multiple sequences, each corresponding to a different patient. Within each sequence, the health measurements change over time.
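To make the idea of multiple sequences in one table concrete, here is a minimal sketch in plain Python (no SDV calls). The rows are made-up illustrative values, not real data, and `split_into_sequences` is a hypothetical helper name. The sketch groups rows by the patient identifier and orders each group by time, assuming the `%m/%d/%Y` date format.

```python
from collections import defaultdict
from datetime import datetime

# Made-up rows for illustration only; the column names follow the
# vital-signs example, the values are invented.
rows = [
    {"Patient ID": "ID_001", "Time": "01/02/2020", "Systolic BP": 121},
    {"Patient ID": "ID_002", "Time": "01/01/2020", "Systolic BP": 135},
    {"Patient ID": "ID_001", "Time": "01/01/2020", "Systolic BP": 118},
    {"Patient ID": "ID_002", "Time": "01/02/2020", "Systolic BP": 133},
]


def split_into_sequences(rows, key, index, fmt="%m/%d/%Y"):
    """Group rows by `key`, then order each group by the parsed `index` column.

    Hypothetical helper, not part of any library.
    """
    sequences = defaultdict(list)
    for row in rows:
        sequences[row[key]].append(row)
    for seq in sequences.values():
        seq.sort(key=lambda r: datetime.strptime(r[index], fmt))
    return dict(sequences)


sequences = split_into_sequences(rows, key="Patient ID", index="Time")
for patient, seq in sequences.items():
    print(patient, [r["Systolic BP"] for r in seq])
```

Each key of `sequences` is one entity (a patient), and each value is that entity's records in time order. The rows of different patients never mix.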
Before you begin creating synthetic data, it's important to have your data ready in the right format:
1. Data, a pandas DataFrame object containing the actual data
2. Metadata, a SingleTableMetadata object that describes your table. It includes the data type of each column, plus keys and other identifiers.
{
    'METADATA_SPEC_VERSION': 'SINGLE_TABLE_V1',
    'sequence_key': 'Patient ID',
    'sequence_index': 'Time',
    'columns': {
        'Patient ID': { 'sdtype': 'text', 'regex_format': 'ID_[0-9]{3}' },
        'Address': { 'sdtype': 'address', 'pii': True },
        'Smoker': { 'sdtype': 'boolean' },
        'Time': { 'sdtype': 'datetime', 'datetime_format': '%m/%d/%Y' },
        'Heart Rate': { 'sdtype': 'categorical' },
        'Systolic BP': { 'sdtype': 'numerical' }
    }
}
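As a rough illustration of what such a metadata description implies for the data, the sketch below checks rows against a trimmed copy of the dictionary using only the Python standard library. This is not the library's own validation. `check_metadata_keys` and `check_row` are hypothetical helpers, and the sample rows are invented.

```python
import re
from datetime import datetime

# Trimmed copy of the metadata dictionary shown above.
metadata = {
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {"sdtype": "text", "regex_format": "ID_[0-9]{3}"},
        "Time": {"sdtype": "datetime", "datetime_format": "%m/%d/%Y"},
        "Systolic BP": {"sdtype": "numerical"},
    },
}


def check_metadata_keys(metadata):
    """Return True if the sequence key and index are both declared columns."""
    columns = metadata["columns"]
    return (metadata["sequence_key"] in columns
            and metadata["sequence_index"] in columns)


def check_row(row, metadata):
    """Return a list of problems found in one row (hypothetical helper)."""
    problems = []
    for name, spec in metadata["columns"].items():
        if name not in row:
            problems.append(f"missing column: {name}")
            continue
        value = row[name]
        # The whole value must match the declared regular expression.
        if "regex_format" in spec and not re.fullmatch(spec["regex_format"], str(value)):
            problems.append(f"{name}: {value!r} does not match {spec['regex_format']}")
        # The value must parse with the declared datetime format.
        if "datetime_format" in spec:
            try:
                datetime.strptime(str(value), spec["datetime_format"])
            except ValueError:
                problems.append(f"{name}: {value!r} is not in {spec['datetime_format']} format")
        if spec["sdtype"] == "numerical" and not isinstance(value, (int, float)):
            problems.append(f"{name}: {value!r} is not a number")
    return problems


# Invented rows: the first follows the metadata, the second does not.
good_row = {"Patient ID": "ID_001", "Time": "01/02/2020", "Systolic BP": 121}
bad_row = {"Patient ID": "P-1", "Time": "2020-01-02", "Systolic BP": 121}

keys_ok = check_metadata_keys(metadata)
good_problems = check_row(good_row, metadata)
bad_problems = check_row(bad_row, metadata)
print(keys_ok, good_problems, bad_problems)
```

A check like this can catch mismatches between the table and its description, such as an identifier that does not follow `ID_[0-9]{3}` or a date in a different format, before the data is used for modeling.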