Use this guide to write a description for a single data table that represents sequential data, for example, a timeseries. In sequential data, rows have a specific order. Your data table may contain multiple, independent sequences belonging to different entities. See the diagram below for an illustration of sequential data.
Your data description is called metadata. SDMetrics expects metadata as a Python dictionary object.
This is the metadata dictionary for the illustrated sequential table
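The dictionary itself is not reproduced here. As a rough sketch, metadata for a vital-signs table might look like the following. All column names are hypothetical, and the "sequence_key", "sequence_index", and "datetime_format" keys, as well as the sdtype names, are assumptions about the SDMetrics metadata format; check the library's documentation for the exact schema.

```python
# Illustrative metadata for a sequential table of patient vital signs.
# Column names are made up for this sketch; the top-level keys other than
# "columns" are assumptions about the SDMetrics metadata format.
my_metadata_dict = {
    "sequence_key": "patient_id",    # assumed: column identifying each sequence
    "sequence_index": "time",        # assumed: column that orders rows in a sequence
    "columns": {
        "patient_id": {"sdtype": "id", "regex_format": "ID_[0-9]{3}"},
        "address": {"sdtype": "address", "pii": True},
        "time": {"sdtype": "datetime", "datetime_format": "%m/%d/%Y"},
        "heart_rate": {"sdtype": "numerical"},
    },
}

# Each entry under "columns" maps a column name to its description.
for column_name, column_info in my_metadata_dict["columns"].items():
    print(column_name, "->", column_info["sdtype"])
```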
Inside "columns", you will describe each column. You'll start with the name of the column. Then you'll specify the type of data and any other information about it.
There are specific data types to choose from. The options and their properties are described below.
computer_representation: A string that represents how you'll ultimately store the data. This determines the minimum and maximum values allowed.
Available options are: 'Float', 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'
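To make the link between the representation and the allowed range concrete, here is a small sketch. The column name "heart_rate" and the "numerical" sdtype are assumptions, and integer_bounds is a hypothetical helper written for this example, not part of SDMetrics. It applies standard two's-complement ranges to the integer options only.

```python
# Hypothetical column entry; the name and the "numerical" sdtype are assumptions.
heart_rate_column = {
    "sdtype": "numerical",
    "computer_representation": "Int16",
}


def integer_bounds(representation):
    """Return (min, max) for an integer option such as 'Int8' or 'UInt16'."""
    if representation.startswith("UInt"):
        bits = int(representation[len("UInt"):])
        return 0, 2**bits - 1
    if representation.startswith("Int"):
        bits = int(representation[len("Int"):])
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    raise ValueError(f"Not an integer representation: {representation!r}")


low, high = integer_bounds(heart_rate_column["computer_representation"])
print(low, high)  # -32768 32767
```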
ID columns represent identifiers that do not have any special mathematical or semantic meaning
regex_format: A string describing the format of the ID as a regular expression
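As an illustration of regex_format, the sketch below checks candidate IDs against a pattern using Python's re module. The column entry, the pattern "ID_[0-9]{3}", and the matches_format helper are all hypothetical, and how SDMetrics itself uses the pattern is not shown here.

```python
import re

# Hypothetical ID column: IDs look like "ID_" followed by exactly three digits.
patient_id_column = {
    "sdtype": "id",
    "regex_format": "ID_[0-9]{3}",
}

pattern = re.compile(patient_id_column["regex_format"])


def matches_format(value):
    """Return True if the whole string matches the column's regex_format."""
    return pattern.fullmatch(value) is not None


print(matches_format("ID_001"))  # True
print(matches_format("ID_1"))    # False
```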
You can input any other data type such as 'phone_number', 'ssn' or 'email'. See the Sdtypes Reference for a full list.
"address": {
    "sdtype": "address",
    "pii": True
}
Properties
pii: A boolean denoting whether the data is sensitive
(default) True: The column is sensitive, meaning the synthetic data is anonymized
False: The column is not sensitive, meaning the synthetic data may not be anonymized
Saving & Loading Metadata
After creating your dictionary, you can save it as a JSON file. For example, my_metadata_file.json.
import json
with open('my_metadata_file.json', 'w') as f:
    json.dump(my_metadata_dict, f)
In the future, you can load the Python dictionary by reading from the file.
import json
with open('my_metadata_file.json') as f:
    my_metadata_dict = json.load(f)
# use my_metadata_dict in the SDMetrics library
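If you want to confirm that nothing is lost when saving and loading, a round trip can be sketched as follows. The metadata content is hypothetical, and a temporary directory is used so the example does not leave files behind. Note that Python's True is written to the file as JSON true and read back as True.

```python
import json
import os
import tempfile

# Hypothetical metadata dictionary; the column name is illustrative only.
my_metadata_dict = {
    "columns": {
        "address": {"sdtype": "address", "pii": True},
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_metadata_file.json")

    # Save the dictionary as JSON.
    with open(path, "w") as f:
        json.dump(my_metadata_dict, f, indent=4)

    # Load it back into a new Python dictionary.
    with open(path) as f:
        loaded_metadata = json.load(f)

# The loaded dictionary should equal the one that was saved.
assert loaded_metadata == my_metadata_dict
```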
This example shows sequential data related to vital signs. The table contains multiple sequences, each corresponding to a different patient. For each sequence, health measurements change over time.