# Sequential Metadata

Use this guide to write a description for a single data table that represents sequential data, such as a time series. In sequential data, rows have a specific order. Your data table may contain multiple, independent sequences belonging to different entities. See the diagram below for an illustration of sequential data.

![This example shows sequential data related to vital signs. The table contains multiple sequences, each corresponding to a different patient. For each sequence, health measurements change over time.](/files/Nt7edtaQBQ3qHqJtCfJw)

Your data description is called metadata. SDMetrics expects metadata as a **Python dictionary** object.

<details>

<summary>Click to see the sequential table's metadata</summary>

This is the metadata dictionary for the sequential table illustrated above:

```python
{
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {
            "sdtype": "id",
            "regex_format": "ID_[0-9]{3}"
        },
        "Address": {
            "sdtype": "address",
            "pii": True
        },
        "Smoker": {
            "sdtype": "boolean"
        },
        "Time": {
            "sdtype": "datetime",
            "datetime_format": "%m/%d/%Y"
        },
        "Heart Rate": {
            "sdtype": "categorical"
        },
        "Systolic BP": {
            "sdtype": "numerical"
        }
    }
}
```

</details>
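To make the roles of `"sequence_key"` and `"sequence_index"` concrete, here is a minimal sketch, assuming plain Python lists of row dictionaries rather than a DataFrame. The helper `split_into_sequences` and the rows are made up for illustration; it only shows how the metadata is meant to be read: rows are grouped by the sequence key, then ordered within each group by the sequence index.

```python
from collections import defaultdict
from datetime import datetime

# Trimmed copy of the metadata above (only the columns used here)
metadata = {
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {"sdtype": "id", "regex_format": "ID_[0-9]{3}"},
        "Time": {"sdtype": "datetime", "datetime_format": "%m/%d/%Y"},
        "Systolic BP": {"sdtype": "numerical"},
    },
}

# Hypothetical rows, deliberately out of order
rows = [
    {"Patient ID": "ID_002", "Time": "01/03/2022", "Systolic BP": 131},
    {"Patient ID": "ID_001", "Time": "01/02/2022", "Systolic BP": 118},
    {"Patient ID": "ID_001", "Time": "01/01/2022", "Systolic BP": 120},
    {"Patient ID": "ID_002", "Time": "12/30/2021", "Systolic BP": 135},
]


def split_into_sequences(rows, metadata):
    """Hypothetical helper: group rows by sequence key, then sort each group by sequence index."""
    key = metadata["sequence_key"]
    index = metadata["sequence_index"]
    fmt = metadata["columns"][index]["datetime_format"]

    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Parse the dates so "12/30/2021" sorts before "01/03/2022" (a plain string sort would not)
    return {
        entity: sorted(group, key=lambda r: datetime.strptime(r[index], fmt))
        for entity, group in groups.items()
    }


for patient, sequence in split_into_sequences(rows, metadata).items():
    print(patient, [r["Time"] for r in sequence])
```

Parsing with the column's `datetime_format` matters here: sorting the raw strings would place `01/03/2022` before `12/30/2021`.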

## Metadata Specification

The metadata dictionary can have the following keys:

* `"primary_key"`: the name of the column that uniquely identifies each row in your table
* `"sequence_key"`: the name of the column that identifies each unique sequence in your data
* `"sequence_index"`: the name of the column used to order the rows within each sequence
* (required) `"columns"`: a dictionary that describes each column

```python
{
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {...}  # column information, described below
}
```
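Before passing metadata to SDMetrics, a quick self-check can catch typos in column names. The sketch below is a hypothetical helper, not part of the SDMetrics API, and it only checks the rules stated on this page: `"columns"` is present, any of `"primary_key"`, `"sequence_key"` or `"sequence_index"` names a described column, and each column has an `"sdtype"`.

```python
# Keys that, when present, must name a column listed under "columns"
COLUMN_REFERENCE_KEYS = ("primary_key", "sequence_key", "sequence_index")


def check_sequential_metadata(metadata):
    """Hypothetical helper: return a list of problems found in a sequential metadata dict."""
    if "columns" not in metadata:
        return ["missing required key 'columns'"]

    columns = metadata["columns"]
    problems = []
    # primary_key, sequence_key and sequence_index must refer to described columns
    for key in COLUMN_REFERENCE_KEYS:
        if key in metadata and metadata[key] not in columns:
            problems.append(f"{key} {metadata[key]!r} is not a column")
    # every column description needs an sdtype
    for name, info in columns.items():
        if "sdtype" not in info:
            problems.append(f"column {name!r} has no 'sdtype'")
    return problems


metadata = {
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {"sdtype": "id"},
        "Time": {"sdtype": "datetime", "datetime_format": "%m/%d/%Y"},
    },
}
print(check_sequential_metadata(metadata))  # []
```

An empty list means none of these particular checks failed; SDMetrics itself may validate more than this.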

### Column Information

Inside `"columns"`, describe each column as a key-value pair: the key is the column name, and the value is a dictionary that specifies the column's data type (`"sdtype"`) and any other properties.

There are specific data types to choose from. Select a tab below to learn about each data type and its properties.

{% tabs %}
{% tab title="boolean" %}
Boolean columns represent True or False values.

```python
"active": { 
    "sdtype": "boolean"
}
```

**Properties** (None)
{% endtab %}

{% tab title="categorical" %}
Categorical columns describe discrete data.

```python
"tier": {
    "sdtype": "categorical"
}
```

**Properties** (None)
{% endtab %}

{% tab title="datetime" %}
Datetime columns represent a point in time.

```python
"renew_date": {
    "sdtype": "datetime",
    "datetime_format": "%Y-%m-%d"
}
```

**Properties**

* (required) `datetime_format`: A string describing the format, using the [format codes of Python's `datetime` module](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) (`strftime`/`strptime`).
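One way to confirm that a `datetime_format` string fits your data is to parse a sample value with Python's `datetime.strptime`, which uses the same format codes. The helper `matches_format` below is a hypothetical sketch for checking your own values; it is not how SDMetrics reads the column.

```python
from datetime import datetime


def matches_format(value, datetime_format):
    """Hypothetical helper: True if `value` parses with `datetime_format` (strptime codes)."""
    try:
        datetime.strptime(value, datetime_format)
    except ValueError:
        return False
    return True


print(matches_format("2022-01-06", "%Y-%m-%d"))  # True
print(matches_format("01/06/2022", "%Y-%m-%d"))  # False
print(matches_format("01/06/2022", "%m/%d/%Y"))  # True
```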

{% hint style="info" %}
The format string has special values to describe the components. For example, `Jan 06, 2022` is represented as `"%b %d, %Y"`. Common values are:

* **Year**: `"%Y"` for a 4-digit year like 2022, or `"%y"` for a 2-digit year like 22
* **Month**: `"%m"` for a 2-digit month like 01, `"%b"` for an abbreviated month like Jan
* **Day**: `"%d"` for a 2-digit day like 06
{% endhint %}
{% endtab %}

{% tab title="numerical" %}
Numerical columns represent discrete or continuous numerical values.

```python
"age": {
    "sdtype": "numerical"
},
"paid_amt": {
    "sdtype": "numerical",
    "computer_representation": "Float"
}
```

**Properties**

* `computer_representation`: A string that represents how you'll ultimately store the data. This determines the minimum and maximum values allowed.
  Available options are: `'Float'`, `'Int8'`, `'Int16'`, `'Int32'`, `'Int64'`, `'UInt8'`, `'UInt16'`, `'UInt32'`, `'UInt64'`
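For the integer options, the name encodes signedness and bit width. The sketch below assumes the allowed range is the standard range of a signed (`IntN`) or unsigned (`UIntN`) integer of that width, which is our reading of "minimum and maximum values allowed"; `representation_bounds` is a hypothetical helper, and `'Float'` is left unbounded here.

```python
def representation_bounds(computer_representation):
    """Hypothetical helper: (min, max) for an integer representation, or None for 'Float'."""
    if computer_representation == "Float":
        return None  # this sketch does not model floating-point limits
    unsigned = computer_representation.startswith("UInt")
    prefix = "UInt" if unsigned else "Int"
    bits = int(computer_representation[len(prefix):])
    if unsigned:
        return 0, 2**bits - 1
    # signed two's-complement range
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for rep in ["Int8", "UInt8", "Int16", "UInt32"]:
    print(rep, representation_bounds(rep))
# Int8 (-128, 127)
# UInt8 (0, 255)
# Int16 (-32768, 32767)
# UInt32 (0, 4294967295)
```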
{% endtab %}

{% tab title="id" %}
ID columns represent identifiers that do not have any special mathematical or semantic meaning.

```python
"user_id": { 
    "sdtype": "id",
    "regex_format": "U_[0-9]{3}"
}
```

**Properties**

* `regex_format`: A string describing the format of the ID as a [regular expression](https://docs.python.org/3/library/re.html)
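To check which existing IDs follow a `regex_format`, you can test each value against the pattern with Python's `re` module. This sketch assumes the whole ID must match (hence `re.fullmatch`); `invalid_ids` is a hypothetical helper, not an SDMetrics function.

```python
import re


def invalid_ids(values, regex_format):
    """Hypothetical helper: return the values that do not fully match `regex_format`."""
    pattern = re.compile(regex_format)
    return [value for value in values if not pattern.fullmatch(value)]


print(invalid_ids(["U_001", "U_42", "U_123", "X_999"], "U_[0-9]{3}"))  # ['U_42', 'X_999']
```

Using `re.match` instead would accept a value such as `U_1234`, because it only anchors the start of the string.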
{% endtab %}

{% tab title="other" %}
You can also use other data types, such as `'phone_number'`, `'ssn'`, or `'email'`. See the [Sdtypes Reference](https://docs.sdv.dev/sdv/reference/metadata-spec/sdtypes) for a full list.

```python
"address": {
    "sdtype": "address",
    "pii": True
}
```

**Properties**

* `pii`: A boolean denoting whether the data is sensitive
  * (default) `True`: The column is sensitive, meaning the synthetic data is anonymized
  * `False`: The column is not sensitive, meaning the synthetic data may not be anonymized
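Because `pii` defaults to `True`, a column of one of these other data types counts as sensitive unless it explicitly sets `"pii": False`. The hypothetical helper `sensitive_columns` below applies that default; the column names `"Contact"` and `"Phone"` are made up for illustration.

```python
# sdtypes that have their own tabs on this page (they take no pii property)
BASIC_SDTYPES = {"boolean", "categorical", "datetime", "numerical", "id"}


def sensitive_columns(metadata):
    """Hypothetical helper: names of other-sdtype columns that are sensitive (pii defaults to True)."""
    return [
        name
        for name, info in metadata["columns"].items()
        if info["sdtype"] not in BASIC_SDTYPES and info.get("pii", True)
    ]


metadata = {
    "columns": {
        "Address": {"sdtype": "address", "pii": True},
        "Contact": {"sdtype": "email"},  # pii omitted, so the default (True) applies
        "Phone": {"sdtype": "phone_number", "pii": False},
        "Smoker": {"sdtype": "boolean"},
    }
}
print(sensitive_columns(metadata))  # ['Address', 'Contact']
```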
{% endtab %}
{% endtabs %}

## Saving & Loading Metadata

After creating your dictionary, you can save it as a JSON file. For example, `my_metadata_file.json`.

```python
import json

with open('my_metadata_file.json', 'w') as f:
    json.dump(my_metadata_dict, f)
```

In the future, you can load the Python dictionary by reading from the file.

```python
import json 

with open('my_metadata_file.json') as f:
    my_metadata_dict = json.load(f)

# use my_metadata_dict in the SDMetrics library
```
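As a sanity check, the save and load steps can be run as one round trip. This sketch writes to a temporary directory (an assumption so it leaves no file behind) and shows that Python's `True` is stored as JSON `true` and read back as `True`, so the loaded dictionary equals the original.

```python
import json
import os
import tempfile

my_metadata_dict = {
    "sequence_key": "Patient ID",
    "sequence_index": "Time",
    "columns": {
        "Patient ID": {"sdtype": "id", "regex_format": "ID_[0-9]{3}"},
        "Address": {"sdtype": "address", "pii": True},
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_metadata_file.json")
    with open(path, "w") as f:
        json.dump(my_metadata_dict, f, indent=4)  # True is written as JSON true
    with open(path) as f:
        loaded = json.load(f)  # JSON true is read back as Python True

print(loaded == my_metadata_dict)  # True
```

The `indent=4` argument is optional; it only makes the saved file easier to read and edit by hand.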

