# ＊ OutlierCoverage

{% hint style="info" %}
**＊SDV Enterprise Feature.** This feature is available to our licensed users and is not currently in our public library. To learn more about SDV Enterprise and its extra features, [visit our website](https://datacebo.com/pricing/).
{% endhint %}

This metric measures whether the synthetic data covers the outlier regions that are present in the real data. It targets a failure mode in which the synthetic data contains no outliers at all.

## Data Compatibility

* **Numerical** : This metric is meant for numerical data.
* **Datetime** : This metric converts datetime values into numerical values.

This metric ignores missing values.
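
The compatibility rules above can be sketched as a small preprocessing step. This is a minimal sketch, not SDMetrics code: `to_numeric_values` is a hypothetical helper, and encoding datetimes as nanoseconds since the Unix epoch is an assumption; the metric may use a different numerical encoding.

```python
import numpy as np
import pandas as pd


def to_numeric_values(column: pd.Series) -> np.ndarray:
    """Hypothetical preprocessing helper, not part of SDMetrics."""
    # Missing values are ignored, so drop them before any computation
    column = column.dropna()

    if pd.api.types.is_datetime64_any_dtype(column):
        # Assumption: datetimes are encoded as nanoseconds since the Unix epoch
        as_ns = column.to_numpy(dtype="datetime64[ns]")
        return as_ns.astype(np.int64).astype(float)

    return column.to_numpy(dtype=float)
```

Because the IQR rule only depends on the ordering and spacing of values, any linear encoding of datetimes would flag the same points as outliers.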

{% hint style="info" %}
**This metric is designed for data that contains outliers.** The metric assumes that the real data contains outliers; otherwise, the score is undefined.
{% endhint %}

## Score

**(best) 1.0**: The synthetic data fully covers the outlier regions that are in the real data

**(worst) 0.0**: The synthetic data does not contain any outliers

## How does it work?

This metric first finds outliers in the real data (R) using the interquartile range (IQR), where IQR = Q3 − Q1 \[1]. Any value below Q1 − 1.5×IQR is considered a *left outlier*, and any value above Q3 + 1.5×IQR is considered a *right outlier*.

<figure><img src="/files/DjdX8mNtHoS9RxhdEXjr" alt=""><figcaption><p>In this example, we're computing the IQR for a distribution of real data (black). The quartiles are shown in a box plot underneath the distribution. Areas below Q1 − 1.5×IQR and above Q3 + 1.5×IQR are considered outliers, as shown in the red boxes.</p></figcaption></figure>
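
A minimal sketch of this outlier rule, assuming the conventional Tukey fences and `numpy.percentile` with its default linear interpolation. The helper name `outlier_bounds` is hypothetical, and the exact quantile method used by the metric may differ.

```python
import numpy as np


def outlier_bounds(real: np.ndarray) -> tuple[float, float]:
    """Return the (lower, upper) IQR fences computed from the real data."""
    q1, q3 = np.percentile(real, [25, 75])
    iqr = q3 - q1
    return float(q1 - 1.5 * iqr), float(q3 + 1.5 * iqr)


real = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40], dtype=float)
lower, upper = outlier_bounds(real)
left_outliers = real[real < lower]    # values below Q1 - 1.5 * IQR
right_outliers = real[real > upper]   # values above Q3 + 1.5 * IQR
```

Here Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5, and only the value 40 is flagged, as a right outlier.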

The metric applies the same outlier bounds, computed from the real data, to find outliers in the synthetic data (S). It then compares the proportion of data points that fall in the outlier ranges in the real data (R) and in the synthetic data (S) to return a final score.

$$
score = \min\left(\frac{p_S}{p_R}, 1\right)\text{, where }p = \frac{\text{\# outlier points}}{\text{total \# data points}}
$$
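
The steps above can be combined into a short sketch of the score formula. It is an illustrative re-implementation, not the SDMetrics code: it assumes Tukey fences with linear-interpolation percentiles, pools left and right outliers into a single proportion p as the formula does, and raises an error when the real data has no outliers (how the actual metric handles that case is not specified here). The name `outlier_coverage_sketch` is hypothetical.

```python
import numpy as np


def outlier_coverage_sketch(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Illustrative version of score = min(p_S / p_R, 1); not the SDMetrics code."""
    # Outlier bounds come from the real data only (assumed Tukey fences)
    q1, q3 = np.percentile(real, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    def outlier_proportion(values: np.ndarray) -> float:
        # p = (# outlier points) / (total # data points), left and right pooled
        is_outlier = (values < lower) | (values > upper)
        return float(np.mean(is_outlier))

    p_real = outlier_proportion(real)
    if p_real == 0.0:
        # The score is undefined without real outliers (raising is an assumption)
        raise ValueError("The real data has no outliers, so the score is undefined.")

    p_synthetic = outlier_proportion(synthetic)
    return min(p_synthetic / p_real, 1.0)
```

Capping the ratio at 1 means that synthetic data with more outliers than the real data is not rewarded beyond full coverage.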

## Usage

To apply this metric, access the `single_column` module and use the `compute` method.

```python
from sdmetrics.single_column import OutlierCoverage

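# real_table and synthetic_table are pandas.DataFrame objects that both
# contain a numerical or datetime column named 'column_name'
OutlierCoverage.compute(
    real_data=real_table['column_name'],
    synthetic_data=synthetic_table['column_name']
)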
```

**Parameters**

* (required) `real_data`: A pandas.Series object with the column of real data
* (required) `synthetic_data`: A pandas.Series object with the column of synthetic data

## FAQs

**Technical Note: What is captured by this metric?**

The OutlierCoverage score describes whether the synthetic data has data points in the outlier regions, but it does not tell us anything about the shape of the synthetic data. In the example below, the OutlierCoverage score is 1.0 because the synthetic data has plenty of data points in the outlier regions (red). However, the synthetic data does not have the same shape as the real data.

<figure><img src="/files/qmFVdXsZXVJOtGgjQ0AL" alt=""><figcaption></figcaption></figure>

In this case, the synthetic data is smoother than the real data, which is why there are many data points in the outlier regions. This can be beneficial for certain uses. To quantify this pattern, see the [SmoothnessSimilarity](/sdmetrics/data-metrics/metrics-in-beta/smoothnesssimilarity.md) metric.
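
To see this behavior with toy data, the sketch below builds a hypothetical "real" column that is roughly standard normal plus three extreme values, and a much wider, smoother uniform "synthetic" column. It scores them with a hypothetical `coverage` helper that follows the score formula under assumed Tukey fences; it is not the SDMetrics implementation.

```python
import numpy as np


def coverage(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Hypothetical helper: min(p_S / p_R, 1) with IQR fences from the real data."""
    q1, q3 = np.percentile(real, [25, 75])
    lower, upper = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)

    def proportion(values: np.ndarray) -> float:
        # Share of values that fall outside the real-data fences
        return float(np.mean((values < lower) | (values > upper)))

    return min(proportion(synthetic) / proportion(real), 1.0)


rng = np.random.default_rng(seed=0)

# Toy "real" column: roughly standard normal, plus three extreme values
real = np.concatenate([rng.normal(0.0, 1.0, size=1000), [8.0, -8.0, 9.0]])

# Toy "synthetic" column: a smooth, much wider uniform distribution
synthetic = rng.uniform(-10.0, 10.0, size=1000)

score = coverage(real, synthetic)
```

The score reaches 1.0 even though the synthetic column is several times more spread out than the real one, which is exactly the shape information this metric does not capture.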

### References

\[1] [Interquartile Range, Outliers](/sdmetrics/data-metrics/quality/quality-report/whats-included.md)


