# Diagnostic Report

The Diagnostic Report runs basic diagnostic checks across your entire dataset at once and reports any areas that may be problematic. Use it as a first step toward ensuring that you have created valid synthetic data.

```python
from sdmetrics.reports.single_table import DiagnosticReport

report = DiagnosticReport()
report.generate(real_data, synthetic_data, metadata)
```

```
Generating report ...

(1/2) Evaluating Data Validity: |██████████| 9/9 [00:00<00:00, 458.92it/s]|
Data Validity Score: 100.0%

(2/2) Evaluating Data Structure: |██████████| 1/1 [00:00<00:00, 104.60it/s]|
Data Structure Score: 100.0%

Overall Score (Average): 100.0%
```

{% hint style="success" %}
:100: **The score should be close to 100%.** The diagnostic report checks for basic data validity and data structure issues. If you want to create synthetic data that looks and feels similar to the real data, you should expect the score to be perfect. If you are using any of the default SDV synthesizers, the score should always be 100%.
{% endhint %}

## How does it work?

The diagnostic report captures three properties: **Data Validity**, **Data Structure** and **Relationship Validity**. This guide contains some technical details about each property.

### Data Validity

Does each column in the data contain valid data?

<figure><img src="/files/XbskJT3oDcaiPhLdWRIx" alt=""><figcaption></figcaption></figure>

#### Methodology

This property applies metrics based on the column types.

<table><thead><tr><th width="186">Column Type</th><th width="194">Metric</th><th>Validity Check</th></tr></thead><tbody><tr><td>primary keys</td><td><a href="/pages/YlwEoEwKsGFpIuq38pzn">KeyUniqueness</a></td><td>Primary keys must always be unique and non-null</td></tr><tr><td>numerical, datetime</td><td><a href="/pages/2enr2rLXB6mulN1uWuEV">BoundaryAdherence</a></td><td>Continuous values in the synthetic data must adhere to the min/max range in the real data</td></tr><tr><td>categorical, boolean</td><td><a href="/pages/D2VB6WKxdZ0si3HhGzkw">CategoryAdherence</a></td><td>Discrete values in the synthetic data must adhere to the same categories as the real data.</td></tr></tbody></table>

This yields a separate score for every column. The final **Data Validity** score is the average of all the column scores.
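The per-column scoring and averaging described above can be sketched in plain Python. This is a simplified illustration, not the SDMetrics implementation: the helper functions, the example columns, and the handling of missing values (ignored in the boundary check, counted as invalid for keys) are all assumptions made for the sketch.

```python
# Simplified stand-ins for the three validity checks (hypothetical helpers,
# not the SDMetrics code). Each returns a score between 0.0 and 1.0.

def key_uniqueness(synthetic_keys):
    """Share of rows whose key is a distinct, non-null value (assumed rule)."""
    distinct = {key for key in synthetic_keys if key is not None}
    return len(distinct) / len(synthetic_keys)

def boundary_adherence(real_values, synthetic_values):
    """Share of synthetic values inside the real min/max range (nulls ignored)."""
    real = [v for v in real_values if v is not None]
    synthetic = [v for v in synthetic_values if v is not None]
    low, high = min(real), max(real)
    return sum(low <= v <= high for v in synthetic) / len(synthetic)

def category_adherence(real_values, synthetic_values):
    """Share of synthetic values that also appear as categories in the real data."""
    allowed = set(real_values)
    return sum(v in allowed for v in synthetic_values) / len(synthetic_values)

# Hypothetical toy columns: a primary key, a numerical and a categorical column.
real = {"id": [1, 2, 3, 4], "age": [20, 35, 50, 65], "plan": ["free", "pro", "pro", "free"]}
synthetic = {"id": [10, 11, 11, 13], "age": [25, 70, 10, 30], "plan": ["pro", "free", "team", "pro"]}

column_scores = {
    "id": key_uniqueness(synthetic["id"]),                       # one duplicated key
    "age": boundary_adherence(real["age"], synthetic["age"]),    # 70 and 10 fall outside 20..65
    "plan": category_adherence(real["plan"], synthetic["plan"]), # "team" is not a real category
}

# The property score is the plain average of the per-column scores.
data_validity = sum(column_scores.values()) / len(column_scores)
```

In this toy example a single out-of-range or unseen value lowers only its own column's score, and the property score drops in proportion to how many columns are affected.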

### Data Structure

Does each synthetic table have the same overall structure as the corresponding real table? The structure includes the column names.

<figure><img src="/files/IHvH4qT6bCpJ0E2Ukng7" alt=""><figcaption></figcaption></figure>

#### Methodology

This property applies the [TableStructure](/sdmetrics/data-metrics/diagnostic/tablestructure.md) metric to each table of the dataset. The metric checks that the synthetic data has the same set of column names as the real data.
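As a rough illustration of comparing column names, the sketch below scores the overlap between two sets of names. The scoring formula (shared names divided by all names seen in either table) is an assumption chosen for this example, and `table_structure_score` is a hypothetical helper; refer to the TableStructure page for the exact definition.

```python
def table_structure_score(real_columns, synthetic_columns):
    """Overlap of column names: shared names / all names in either table (assumed formula)."""
    real, synthetic = set(real_columns), set(synthetic_columns)
    return len(real & synthetic) / len(real | synthetic)

real_columns = ["id", "age", "plan"]

# Same names in a different order: the sets are identical.
matching = table_structure_score(real_columns, ["plan", "id", "age"])

# A column is missing from the synthetic table.
missing_one = table_structure_score(real_columns, ["id", "age"])

# A column was renamed, so each table has one name the other lacks.
renamed = table_structure_score(real_columns, ["id", "age", "tier"])
```

Because the comparison is set-based, column order does not affect the result in this sketch.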

### Relationship Validity

{% hint style="info" %}
This property is only available for multi table datasets.
{% endhint %}

Does the synthetic data contain valid relationships between different tables?

<figure><img src="/files/EJ7B8f4vIvMLijv2YlNq" alt=""><figcaption></figcaption></figure>

#### Methodology

Every relationship in your dataset is determined by a primary/foreign key connection. This property applies two metrics to the relationship to determine the validity:

* [ReferentialIntegrity](/sdmetrics/data-metrics/diagnostic/referentialintegrity.md): Does each foreign key refer to an existing primary key? If a foreign key refers to a non-existent primary key, it is known as an *orphaned child*, which is invalid in most databases.
* [CardinalityBoundaryAdherence](/sdmetrics/data-metrics/diagnostic/cardinalityboundaryadherence.md): Does each primary key have the correct number of children? The correct number is based on the min/max bounds that are present in the real data.

The final **Relationship Validity** score is the average of all the sub-scores.
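The two relationship checks above can be sketched for a single parent/child relationship as follows. This is a minimal plain-Python illustration under stated assumptions, not the SDMetrics code: the helper functions and the `users`/`orders` example keys are hypothetical, and parents with no children are counted as having a cardinality of 0.

```python
from collections import Counter

def referential_integrity(parent_keys, foreign_keys):
    """Share of foreign key values that point at an existing primary key."""
    parents = set(parent_keys)
    return sum(fk in parents for fk in foreign_keys) / len(foreign_keys)

def children_per_parent(parent_keys, foreign_keys):
    """Number of child rows for each parent (0 if a parent has no children)."""
    counts = Counter(foreign_keys)
    return [counts[pk] for pk in parent_keys]

def cardinality_boundary_adherence(real_parents, real_fks, synth_parents, synth_fks):
    """Share of synthetic parents whose child count lies in the real min/max range."""
    real_counts = children_per_parent(real_parents, real_fks)
    low, high = min(real_counts), max(real_counts)
    synth_counts = children_per_parent(synth_parents, synth_fks)
    return sum(low <= c <= high for c in synth_counts) / len(synth_counts)

# Hypothetical real data: every user has 1 or 2 orders.
real_users = [1, 2, 3]
real_orders_user_id = [1, 1, 2, 3, 3]

# Hypothetical synthetic data: user 10 has 3 orders, user 13 has none,
# and one order points at a non-existent user 99 (an orphaned child).
synth_users = [10, 11, 12, 13]
synth_orders_user_id = [10, 10, 10, 11, 12, 99]

integrity = referential_integrity(synth_users, synth_orders_user_id)
cardinality = cardinality_boundary_adherence(
    real_users, real_orders_user_id, synth_users, synth_orders_user_id
)

# Average the sub-scores for this one relationship.
relationship_validity = (integrity + cardinality) / 2
```

With several relationships, the same two sub-scores would be computed per relationship before averaging.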

## Usage

If you have a single-table dataset, refer to the [Single Table API](/sdmetrics/data-metrics/diagnostic/diagnostic-report/single-table-api.md). Pass in one DataFrame containing the real data and one containing the synthetic data, along with the metadata.

```python
from sdmetrics.reports.single_table import DiagnosticReport

report = DiagnosticReport()
report.generate(real_data, synthetic_data, metadata)
```

If you have a multi-table dataset, use the [Multi Table API](/sdmetrics/data-metrics/diagnostic/diagnostic-report/multi-table-api.md) instead. This allows you to pass in a dictionary containing multiple, connected DataFrame objects.

```python
from sdmetrics.reports.multi_table import DiagnosticReport

report = DiagnosticReport()
report.generate(real_data, synthetic_data, metadata)
```

