# MissingValueSimilarity

This metric compares whether the synthetic data has the same proportion of missing values as the real data for a given column.

## Data Compatibility

* **All data**: Any data is compatible with this metric as long as it contains missing values

## Score

**(best) 1.0**: The synthetic data perfectly captures the proportion of missing values

**(worst) 0.0**: The synthetic data has a completely different proportion of missing values than the real data

## How does it work?

This test computes the proportion of missing values, *p*, in both the real data, *R*, and the synthetic data, *S*. It then compares the two proportions and returns a similarity score in the range \[0, 1], with 1 representing the highest similarity.

$$
score = 1 - |S_p - R_p|
$$

Note that the subtracted term, the absolute difference between the two proportions, is equivalent to the Total Variation Distance \[1] between the missing/non-missing distributions of the real and synthetic data.
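The formula above can be sketched in a few lines of pandas. This is a minimal illustration of the arithmetic, not the library's actual implementation: the helper name `missing_value_similarity` and the sample values are assumptions made for this example.

```python
import pandas as pd


def missing_value_similarity(real: pd.Series, synthetic: pd.Series) -> float:
    """Hypothetical helper that mirrors score = 1 - |S_p - R_p|."""
    # isna() flags missing entries; the mean of the flags is the proportion missing
    real_p = real.isna().mean()
    synthetic_p = synthetic.isna().mean()
    return float(1.0 - abs(synthetic_p - real_p))


# Illustrative data only
real = pd.Series([1.0, None, 3.0, None])      # 2 of 4 missing, so R_p = 0.5
synthetic = pd.Series([1.0, 2.0, None, 4.0])  # 1 of 4 missing, so S_p = 0.25

score = missing_value_similarity(real, synthetic)
print(score)  # 0.75
```

Because only the two proportions enter the formula, the real and synthetic columns do not need to have the same number of rows.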

## Usage

Access this metric from the `single_column` module and use the `compute` method.

```python
from sdmetrics.single_column import MissingValueSimilarity

MissingValueSimilarity.compute(
    real_data=real_table['column_name'],
    synthetic_data=synthetic_table['column_name']
)
```

**Parameters**

* (required) `real_data`: A `pandas.Series` containing a single column with missing values
* (required) `synthetic_data`: A `pandas.Series` with the synthetic version of the column

## FAQs

<details>

<summary>What kind of values count as missing?</summary>

We use the same convention as pandas for determining when a value is missing \[2]. Missing values in your data should be represented as `NaN` values.

If you are using any special notation to denote missing values, convert them to `NaN` values before using this metric.
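As a rough sketch of that conversion, the snippet below swaps placeholder strings for `NaN` using standard pandas calls. The column contents and the `placeholders` list are hypothetical; substitute whatever notation your own data uses.

```python
import pandas as pd

# Hypothetical column that marks missing entries with sentinel strings
raw = pd.Series(['red', 'N/A', 'blue', '', 'green', 'unknown'])

# Assumed placeholder notation for this example only
placeholders = ['N/A', '', 'unknown']

# mask() replaces every entry where the condition is True with NaN
cleaned = raw.mask(raw.isin(placeholders))

# After conversion, pandas recognizes the entries as missing
missing_proportion = cleaned.isna().mean()
print(missing_proportion)  # 0.5
```

Apply the same conversion to both the real and synthetic columns before passing them to the metric, so that missing values are counted consistently on both sides.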

</details>

## References

\[1] <https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures>

\[2] <https://pandas.pydata.org/docs/user_guide/missing_data.html>

