# Quality Report

The SDMetrics Quality Report captures the statistical similarity between real data and synthetic data. If the synthetic and real data are statistically similar, we refer to the synthetic data as being ***high quality*** (aka high fidelity). The report runs select metrics to measure these properties and summarizes the results.

```python
from sdmetrics.reports.single_table import QualityReport

report = QualityReport()
report.generate(real_data, synthetic_data, metadata)
```

```
Generating report ...

(1/2) Evaluating Column Shapes: |██████████| 9/9 [00:00<00:00, 273.13it/s]|
Column Shapes Score: 89.11%

(2/2) Evaluating Column Pair Trends: |██████████| 36/36 [00:00<00:00, 57.42it/s]|
Column Pair Trends Score: 88.3%

Overall Score (Average): 88.7%
```

Once you have generated the report, you can get more details to explain the results, visualize the scores, and save the report to share it.

## How does it work?

The quality report captures **Column Shapes** and **Column Pair Trends**. For multi-table datasets, it also captures **Cardinality** and **Intertable Trends**. This guide contains some technical details about each property.

### Column Shapes

Does the synthetic data capture the shape of each column?

The *shape* of a column describes its overall distribution. The higher the score, the more similar the distributions of real and synthetic data.

<figure><img src="https://2284413265-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FrNLha4DaPNwVJ930KhmB%2Fuploads%2FSqv1M4pqnxbfGdlj1aWp%2FScreen%20Shot%202022-07-20%20at%206.46.46%20PM.png?alt=media&#x26;token=09ebc093-9926-43c5-a402-3af691b952e6" alt=""><figcaption></figcaption></figure>

#### Methodology

This property applies a metric to each column based on the column's type.

<table><thead><tr><th width="163">Column Type</th><th width="194">Metric</th></tr></thead><tbody><tr><td>numerical</td><td><a href="kscomplement">KSComplement</a></td></tr><tr><td>datetime</td><td><a href="kscomplement">KSComplement</a></td></tr><tr><td>boolean</td><td><a href="tvcomplement">TVComplement</a></td></tr><tr><td>categorical</td><td><a href="tvcomplement">TVComplement</a></td></tr></tbody></table>

This yields a separate score for every column. The final **Column Shapes** score is the average of all the column scores.
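To make the averaging step concrete, here is a minimal, standard-library sketch of the two metrics and how their per-column scores could be combined. It assumes KSComplement is 1 minus the two-sample Kolmogorov-Smirnov statistic and TVComplement is 1 minus the total variation distance between category frequencies. The helper names and toy columns are hypothetical, and this is an illustration rather than the SDMetrics implementation.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synthetic):
    """1 minus the largest gap between the two empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    for value in set(real_sorted) | set(synth_sorted):
        real_cdf = bisect_right(real_sorted, value) / len(real_sorted)
        synth_cdf = bisect_right(synth_sorted, value) / len(synth_sorted)
        max_gap = max(max_gap, abs(real_cdf - synth_cdf))
    return 1.0 - max_gap


def tv_complement(real, synthetic):
    """1 minus the total variation distance between category frequencies."""
    real_freq, synth_freq = Counter(real), Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - distance


# Hypothetical toy columns: one numerical, one categorical.
real_columns = {"age": [20, 30, 40, 50], "plan": ["a", "a", "b", "b"]}
synthetic_columns = {"age": [20, 30, 40, 60], "plan": ["a", "a", "a", "b"]}

# Pick the metric by column type, as in the table above.
metric_by_column = {"age": ks_complement, "plan": tv_complement}

column_scores = {
    name: metric(real_columns[name], synthetic_columns[name])
    for name, metric in metric_by_column.items()
}

# The property score is the plain average of the per-column scores.
column_shapes_score = sum(column_scores.values()) / len(column_scores)
print(column_scores, column_shapes_score)
```

In practice the report handles the metric selection from the metadata, so this sketch is only meant to show what each score measures.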

{% hint style="info" %}
You may notice that column shape quality is better for discrete columns (categorical, boolean) as opposed to continuous columns (numerical, datetime). Generally, we've found that it's much easier to create synthetic data for a small number of known categories than large ranges of numerical values.
{% endhint %}

### Column Pair Trends

Does the synthetic data capture trends between pairs of columns?

The *trend* between two columns describes how they vary in relation to each other, for example the correlation. The higher the score, the more the trends are alike.

<figure><img src="https://2284413265-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FrNLha4DaPNwVJ930KhmB%2Fuploads%2Fv6rZe9pmZQza1tfxBEnv%2FCorrelation%20Similarity.png?alt=media&#x26;token=2847ee25-dc42-45aa-9238-6a69286895ac" alt=""><figcaption></figcaption></figure>

#### Methodology

This property applies a different metric based on the types of the two columns.

<table><thead><tr><th width="298">Column Types</th><th width="271">Metric</th></tr></thead><tbody><tr><td>numerical (or datetime) with another numerical (or datetime)</td><td><a href="correlationsimilarity">CorrelationSimilarity</a></td></tr><tr><td>categorical (or boolean) with another categorical (or boolean)</td><td><a href="contingencysimilarity">ContingencySimilarity</a></td></tr><tr><td>numerical (or datetime) with a categorical (or boolean)</td><td>Discretize the numerical columns into bins, then apply <a href="contingencysimilarity">ContingencySimilarity</a></td></tr></tbody></table>

This yields a score between every pair of columns.\* The **Column Pair Trends** score is the average of all the scores.

*\*Starting from SDMetrics version 0.27.0, the Quality Report discards pairs that do not exhibit a strong pattern in the real data to begin with. A strong correlation is defined as a Pearson correlation of >0.5 or <-0.5, or a Cramer's association of >0.3.*
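The mixed numerical/categorical row of the table can be sketched in plain Python. The example below assumes ContingencySimilarity is 1 minus the total variation distance between the two normalized contingency tables. The equal-width binning, the bin count, and the choice to take bin edges from the real data are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def discretize(values, low, high, n_bins):
    """Map each number to an equal-width bin index in [0, n_bins - 1]."""
    width = (high - low) / n_bins
    return [min(max(int((v - low) / width), 0), n_bins - 1) for v in values]


def contingency_similarity(real_pairs, synthetic_pairs):
    """1 minus the total variation distance between joint frequencies."""
    real_freq, synth_freq = Counter(real_pairs), Counter(synthetic_pairs)
    combos = set(real_freq) | set(synth_freq)
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real_pairs) - synth_freq[c] / len(synthetic_pairs))
        for c in combos
    )
    return 1.0 - distance


# Hypothetical toy data: a numerical column paired with a categorical one.
real_age, real_plan = [20, 30, 60, 70], ["a", "a", "b", "b"]
synth_age, synth_plan = [25, 65, 60, 70], ["a", "a", "b", "b"]

# Assumption: bin edges come from the real data's range, with 2 bins.
low, high, n_bins = min(real_age), max(real_age), 2
real_pairs = list(zip(discretize(real_age, low, high, n_bins), real_plan))
synth_pairs = list(zip(discretize(synth_age, low, high, n_bins), synth_plan))

pair_score = contingency_similarity(real_pairs, synth_pairs)
print(pair_score)
```

Using the same bin edges for both datasets keeps the two contingency tables comparable; a pair of purely categorical columns would skip the `discretize` step.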

{% hint style="info" %}
The CorrelationSimilarity metric works by computing a separate value for the real vs. the synthetic data. The Quality Report shows a side-by-side visualization for real vs. synthetic data when applicable.
{% endhint %}
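As a rough illustration of the "separate value for real vs. synthetic" idea, the sketch below computes a Pearson coefficient on each dataset and compares them. It assumes the score is 1 minus half the absolute difference between the two coefficients, and it applies the Pearson threshold from the footnote above to the real data. The function names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_similarity(real_x, real_y, synth_x, synth_y):
    """Return both correlations plus a similarity score in [0, 1]."""
    real_corr = pearson(real_x, real_y)
    synth_corr = pearson(synth_x, synth_y)
    # Correlations differ by at most 2, so halving keeps the score in [0, 1].
    score = 1.0 - abs(real_corr - synth_corr) / 2.0
    return real_corr, synth_corr, score


# Hypothetical toy data: the real columns are perfectly correlated,
# while the synthetic columns show no linear relationship.
real_corr, synth_corr, score = correlation_similarity(
    [1, 2, 3, 4], [2, 4, 6, 8],
    [1, 2, 3, 4], [1, 2, 2, 1],
)

# Only pairs with a strong pattern in the real data count toward the score.
has_strong_real_pattern = abs(real_corr) > 0.5
print(real_corr, synth_corr, score, has_strong_real_pattern)
```

The two intermediate coefficients are the values a side-by-side view would display, one per dataset.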

### Cardinality

{% hint style="warning" %}
This property is only available for multi table datasets. *(In older versions of SDMetrics, it was known as "Table Relationships".)*
{% endhint %}

Does the synthetic data capture the number of connections between parent and child tables? This is also known as the *cardinality* of the tables.

<figure><img src="https://2284413265-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FrNLha4DaPNwVJ930KhmB%2Fuploads%2Fzgh0ueeuAcfMiZNH26B7%2Fsdmetrics-metrics-metrics-glossary-cardinality-shape-similarity_Mar%2010%202026.png?alt=media&#x26;token=d5318596-f927-4c76-9074-bc21c4e6ccbb" alt=""><figcaption></figcaption></figure>

#### Methodology

This property applies the [CardinalityShapeSimilarity](https://docs.sdv.dev/sdmetrics/data-metrics/quality/cardinalityshapesimilarity) metric to every pair of connected tables (a parent table and its child table).
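One way to picture this metric: count how many child rows each parent has, then compare the distribution of those counts between the real and synthetic data. The sketch below assumes the comparison is a KS-complement on the counts; the function names and toy keys are hypothetical, and the code is an illustration rather than the library's implementation.

```python
from bisect import bisect_right
from collections import Counter


def children_per_parent(parent_ids, child_foreign_keys):
    """Number of child rows for each parent, including parents with none."""
    counts = Counter(child_foreign_keys)
    return [counts[parent_id] for parent_id in parent_ids]


def ks_complement(real, synthetic):
    """1 minus the largest gap between the two empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    for value in set(real_sorted) | set(synth_sorted):
        real_cdf = bisect_right(real_sorted, value) / len(real_sorted)
        synth_cdf = bisect_right(synth_sorted, value) / len(synth_sorted)
        max_gap = max(max_gap, abs(real_cdf - synth_cdf))
    return 1.0 - max_gap


# Hypothetical toy tables: primary keys of the parent, foreign keys of the child.
real_cardinality = children_per_parent([1, 2, 3], [1, 1, 2, 3])
synth_cardinality = children_per_parent([1, 2, 3], [1, 1, 1, 2])

cardinality_score = ks_complement(real_cardinality, synth_cardinality)
print(real_cardinality, synth_cardinality, cardinality_score)
```

Here the synthetic data has the same number of rows in both tables, but it concentrates children on one parent and leaves another with none, which lowers the score.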

### Intertable Trends

{% hint style="warning" %}
This property is only available for multi table datasets.
{% endhint %}

Does the synthetic data capture trends between columns across different tables?

This is similar to the Column Pair Trends property, but it is applied across parent/child tables. For example, a column in a parent table might be correlated with a column in the child.

<figure><img src="https://2284413265-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FrNLha4DaPNwVJ930KhmB%2Fuploads%2FOH2m2fDMvLI8uyRhCfso%2Fsdmetrics-reports-quality-report-what-is-included-intertable-trends_Mar%2010%202026.png?alt=media&#x26;token=e9a7318d-89b9-422e-b6e5-f1f3ebb69c2a" alt=""><figcaption></figcaption></figure>

#### Methodology

This property denormalizes the parent and child table into a single, flat table. Then, it applies the same metrics as the Column Pair Trends property.

<table><thead><tr><th width="298">Column Types</th><th width="271">Metric</th></tr></thead><tbody><tr><td>numerical (or datetime) with another numerical (or datetime)</td><td><a href="correlationsimilarity">CorrelationSimilarity</a></td></tr><tr><td>categorical (or boolean) with another categorical (or boolean)</td><td><a href="contingencysimilarity">ContingencySimilarity</a></td></tr><tr><td>numerical (or datetime) with a categorical (or boolean)</td><td>Discretize the numerical columns into bins, then apply <a href="contingencysimilarity">ContingencySimilarity</a></td></tr></tbody></table>

This yields a score between every pair of columns.\* The **Intertable Trends** score is the average of all the scores.

*\*Starting from SDMetrics version 0.27.0, the Quality Report discards pairs that do not exhibit a strong pattern in the real data to begin with. A strong correlation is defined as a Pearson correlation of >0.5 or <-0.5, or a Cramer's association of >0.3.*
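The denormalization step can be pictured as a join of each child row with its parent row, after which pairs that span the two tables are scored like ordinary column pairs. The sketch below uses plain dictionaries instead of DataFrames; the table names, column names, and the `parent.`/`child.` prefixes are hypothetical, and the exact join the report performs is an assumption.

```python
def denormalize(parent_rows, child_rows, parent_key, foreign_key):
    """Attach each child's parent columns to produce one flat row per child."""
    parents_by_id = {row[parent_key]: row for row in parent_rows}
    flat_rows = []
    for child in child_rows:
        parent = parents_by_id[child[foreign_key]]
        # Prefix column names so parent and child columns cannot collide.
        merged = {f"parent.{name}": value for name, value in parent.items()}
        merged.update({f"child.{name}": value for name, value in child.items()})
        flat_rows.append(merged)
    return flat_rows


# Hypothetical toy tables.
users = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "FR"},
]
sessions = [
    {"session_id": 10, "user_id": 1, "device": "mobile"},
    {"session_id": 11, "user_id": 1, "device": "desktop"},
    {"session_id": 12, "user_id": 2, "device": "mobile"},
]

flat = denormalize(users, sessions, parent_key="user_id", foreign_key="user_id")

# A cross-table pair such as (country, device) can now be compared
# with the same pairwise metrics used for Column Pair Trends.
cross_table_pairs = [(row["parent.country"], row["child.device"]) for row in flat]
print(cross_table_pairs)
```

The same flattening would be applied to the real and the synthetic tables so that the resulting pairs can be compared.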

## Usage

If you have a single-table dataset, refer to the [Single Table API](https://docs.sdv.dev/sdmetrics/data-metrics/quality/quality-report/single-table-api). Pass in the real data and the synthetic data, each as a DataFrame, along with the metadata.

```python
from sdmetrics.reports.single_table import QualityReport

report = QualityReport()
report.generate(real_data, synthetic_data, metadata)
```

If you have a multi-table dataset, use the [Multi Table API](https://docs.sdv.dev/sdmetrics/data-metrics/quality/quality-report/multi-table-api) instead. This allows you to pass in a dictionary containing multiple, connected DataFrame objects.

```python
from sdmetrics.reports.multi_table import QualityReport

report = QualityReport()
report.generate(real_data, synthetic_data, metadata)
```

The Quality Report is not optimized to compute the quality of ordered, sequential information. However, you can still apply the single table report for some basic analysis.
