# SequenceLengthSimilarity

This metric is for sequential data. It measures how closely the lengths of the sequences in a synthetic column match the lengths of the sequences in the real column.

## Data Compatibility

* **ID** : This metric is meant for a column that represents sequence IDs. The IDs are used to distinguish between different sequences.

This metric ignores missing values.

## Score

**(best) 1.0**: The sequence lengths in the synthetic data are exactly the same as the real data

**(worst) 0.0**: The sequence lengths in the synthetic data are as different as can be from the real data

## How does it work?

This metric assumes you have an ID column to represent sequences. For example, if you are storing different sequences of patient health information, the `Patient ID` column represents the sequence ID. The length of a sequence is the number of rows that share the same ID value.

<figure><img src="/files/pYeKrA4DU4GkRfBnngcE" alt=""><figcaption></figcaption></figure>
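As a minimal sketch of the counting step described above, the snippet below tallies how many rows share each ID. The helper name `sequence_lengths` and the toy `patient_ids` values are hypothetical and used only for illustration; this is not the library's internal code.

```python
from collections import Counter


def sequence_lengths(id_values):
    """Return the length of each sequence, keyed by its ID.

    Hypothetical helper for illustration only. Missing IDs (None)
    are skipped, mirroring the note that missing values are ignored.
    """
    counts = Counter(value for value in id_values if value is not None)
    return dict(counts)


# Toy ID column: three rows for patient A, two for B, one for C,
# plus one row with a missing ID.
patient_ids = ["A", "A", "A", "B", "B", "C", None]

lengths = sequence_lengths(patient_ids)
print(lengths)  # {'A': 3, 'B': 2, 'C': 1}
```

With a `pandas.Series`, the same tally can be obtained with `value_counts()`, which also excludes missing values by default.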

This metric first computes the length of each sequence in the real data. Since you may have multiple sequences, these lengths form a distribution of real sequence lengths, D\_r. The metric then computes the same for the synthetic data, forming a second distribution, D\_s.

This metric will then compare the two distributions using the [KSComplement](/sdmetrics/data-metrics/quality/kscomplement.md) metric.

$$
\text{score} = \text{KSComplement}(D_r, D_s)
$$
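To make the comparison concrete, here is a rough, dependency-free sketch of the whole calculation. It assumes that KSComplement is 1 minus the two-sample Kolmogorov-Smirnov statistic, that is, 1 minus the largest gap between the two empirical CDFs. The helpers `sequence_lengths` and `ks_complement` are hypothetical stand-ins written for this example, not functions from the library.

```python
from collections import Counter


def sequence_lengths(id_values):
    """List the length of every sequence (rows per ID), skipping missing IDs."""
    return list(Counter(v for v in id_values if v is not None).values())


def ks_complement(real_lengths, synthetic_lengths):
    """Assumed definition: 1 minus the largest gap between the empirical CDFs."""
    points = sorted(set(real_lengths) | set(synthetic_lengths))
    max_gap = 0.0
    for x in points:
        # Fraction of sequences whose length is at most x in each dataset.
        cdf_real = sum(v <= x for v in real_lengths) / len(real_lengths)
        cdf_synth = sum(v <= x for v in synthetic_lengths) / len(synthetic_lengths)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


# Toy data: the ID values differ between the datasets; only lengths matter.
real_ids = ["A", "A", "A", "B", "B", "C"]        # lengths 3, 2, 1
synthetic_ids = ["x", "x", "y", "y", "z", "z"]   # lengths 2, 2, 2

d_r = sequence_lengths(real_ids)
d_s = sequence_lengths(synthetic_ids)

score = ks_complement(d_r, d_s)  # 1 - 1/3, about 0.667
same = ks_complement(d_r, d_r)   # identical length distributions give 1.0
```

In this toy case the largest CDF gap is 1/3 (at lengths 1 and 2), so the score is about 0.667. Comparing a dataset against itself gives the best score of 1.0.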

## Usage

Access this metric from the `single_column` module and use the `compute` method.

```python
from sdmetrics.single_column import SequenceLengthSimilarity

SequenceLengthSimilarity.compute(
    real_data=real_table['id_column'],
    synthetic_data=synthetic_table['id_column']
)
```

**Parameters**

* (required) `real_data`: A `pandas.Series` object containing the sequence ID column of the real data
* (required) `synthetic_data`: A `pandas.Series` object containing the sequence ID column of the synthetic data

## FAQs

<details>

<summary>Do the ID values have to match up between the real and synthetic data?</summary>

No, the ID values are not expected to be the same between the real and synthetic data because they represent entirely different entities. This metric compares only the lengths of the sequences, not the ID values themselves.

</details>


