Diagnostic

The Diagnostic Report runs some basic checks for data format and validity. Run this to ensure that you have created valid synthetic data.

New and improved! Starting from SDV version 1.8.0, you'll see a new diagnostic intended to find problems with the synthetic data. You will notice some key improvements to the report and its interpretation.

Usage

Run the diagnostic to receive a score and a corresponding report.

run_diagnostic

Use this function to run a diagnostic on the synthetic data.

from sdv.evaluation.single_table import run_diagnostic

diagnostic_report = run_diagnostic(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata)

Generating report ...
(1/2) Evaluating Data Validity: : 100%|██████████| 17/17 [00:00<00:00, 374.65it/s]
(2/2) Evaluating Data Structure: : 100%|██████████| 1/1 [00:00<00:00, 104.39it/s]

Overall Score: 100.0%

Properties:
- Data Validity: 100.0%
- Data Structure: 100.0%

Parameters:

  • (required) real_data: A pandas.DataFrame containing the real data

  • (required) synthetic_data: A pandas.DataFrame containing the synthetic data

  • (required) metadata: A SingleTableMetadata object with your metadata

  • verbose: A boolean describing whether or not to print the report progress and results. Defaults to True. Set this to False to run the report silently.

Returns: An SDMetrics DiagnosticReport object generated with your real and synthetic data

Interpreting the Score

The score should be 100%. The diagnostic report checks for basic data validity and data structure issues. You should expect the score to be perfect for any of the default SDV synthesizers.
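
To make the relationship between the overall score and the two properties concrete, here is a minimal sketch. It assumes the overall score is the plain average of the property scores, which matches the sample output above but is an assumption here rather than something this page states. The `overall_score` helper and the `scores` dictionary are hypothetical names used only for illustration.

```python
from statistics import mean


def overall_score(property_scores):
    """Average per-property scores (each in [0, 1]) into one overall score.

    Assumption: the report combines properties with a simple mean.
    """
    return mean(property_scores.values())


# Hypothetical property scores mirroring the sample report output.
scores = {'Data Validity': 1.0, 'Data Structure': 1.0}
print(f'Overall Score: {overall_score(scores):.1%}')
```

Under this assumption, any property that drops below 100% pulls the overall score down with it, so a non-perfect overall score is a cue to inspect the individual properties.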

What's Included?

The basic diagnostic checks are summarized in the table below.

get_details

This function returns details about the report's properties. Use it to pinpoint the exact columns or tables that are causing issues.

Parameters:

  • (required) property_name: A string with the name of the property. One of: 'Data Validity' or 'Data Structure'.

Returns: A pandas.DataFrame object with the detailed scores

diagnostic_report.get_details(property_name='Data Validity')

Column          Metric              Score
guest_email     KeyUniqueness       1.0
had_rewards     CategoryAdherence   1.0
room_type       CategoryAdherence   1.0
amenities_fee   BoundaryAdherence   1.0
...
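
For intuition about what the metric names in this output measure, the sketch below hand-rolls simplified versions of KeyUniqueness, CategoryAdherence, and BoundaryAdherence in plain Python. These are approximations under stated assumptions (no missing values, exact matching), not the SDMetrics implementations. The toy values are hypothetical; only the column names are borrowed from the sample output.

```python
# Simplified, hand-rolled approximations of the three Data Validity checks.
# Assumption: no missing values; the real metrics handle more cases.

def key_uniqueness(synthetic_keys):
    """Fraction of key values that are not repeats of an earlier value."""
    return len(set(synthetic_keys)) / len(synthetic_keys)


def category_adherence(real_values, synthetic_values):
    """Fraction of synthetic values whose category also appears in the real data."""
    allowed = set(real_values)
    return sum(value in allowed for value in synthetic_values) / len(synthetic_values)


def boundary_adherence(real_values, synthetic_values):
    """Fraction of synthetic values inside the real data's [min, max] range."""
    low, high = min(real_values), max(real_values)
    return sum(low <= value <= high for value in synthetic_values) / len(synthetic_values)


# Hypothetical toy data.
real = {
    'room_type': ['BASIC', 'DELUXE', 'SUITE'],
    'amenities_fee': [0.0, 12.5, 40.0],
}
synthetic = {
    'guest_email': ['a@example.com', 'b@example.com', 'a@example.com', 'c@example.com'],
    'room_type': ['BASIC', 'SUITE', 'PENTHOUSE', 'DELUXE'],
    'amenities_fee': [5.0, 45.0, 20.0, 0.0],
}

details = [
    ('guest_email', 'KeyUniqueness', key_uniqueness(synthetic['guest_email'])),
    ('room_type', 'CategoryAdherence',
     category_adherence(real['room_type'], synthetic['room_type'])),
    ('amenities_fee', 'BoundaryAdherence',
     boundary_adherence(real['amenities_fee'], synthetic['amenities_fee'])),
]

# Keep only the checks that fall short of a perfect score.
failing = [row for row in details if row[2] < 1.0]
for column, metric, score in failing:
    print(f'{column:<15}{metric:<20}{score}')
```

With the actual report, the same idea applies to the returned DataFrame: assuming the score column is named Score as in the sample output, filtering for rows where Score is below 1.0 isolates the columns that need attention.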

FAQs

See the SDMetrics DiagnosticReport for even more details about the metrics and properties included in the report.

What should I do if the score is not 100%?

All of the default SDV synthesizers should yield a score of 100%. If this is not the case, please contact us with more details about your project via GitHub or Slack.

Note that if you have changed any of the defaults (for example, if you have turned off min/max boundary enforcement), then the score may not be 100%.

How did you determine what the validity checks should be?

The items in this report answer the most basic data validity questions that we have heard from our users and customers. We've collected thousands of pieces of feedback to come up with this basic set.

If you have any questions or suggestions, please contact us via GitHub or Slack.

Older versions of the Diagnostic report contained other metrics. Can I still use them?

Yes! You can compute additional metrics using our standalone SDMetrics library.

If you're used to older versions of the SDV, you may be looking for NewRowSynthesis, CategoryCoverage, and RangeCoverage.
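
As a rough guide to what those three metrics capture, the sketch below re-implements simplified versions in plain Python. It is an illustration under stated assumptions (no missing values, exact row matching, a non-constant real column), not the SDMetrics code; for real use, compute the metrics with the SDMetrics library itself. All function names and data here are hypothetical.

```python
# Simplified stand-ins for CategoryCoverage, RangeCoverage, and NewRowSynthesis.
# Assumption: inputs contain no missing values.

def category_coverage(real_values, synthetic_values):
    """Share of the real categories that also show up in the synthetic data."""
    real_categories = set(real_values)
    return len(real_categories & set(synthetic_values)) / len(real_categories)


def range_coverage(real_values, synthetic_values):
    """Share of the real [min, max] range spanned by the synthetic values."""
    real_min, real_max = min(real_values), max(real_values)
    span = real_max - real_min  # assumed non-zero
    low_gap = max((min(synthetic_values) - real_min) / span, 0)
    high_gap = max((real_max - max(synthetic_values)) / span, 0)
    return max(1 - (low_gap + high_gap), 0)


def new_row_synthesis(real_rows, synthetic_rows):
    """Share of synthetic rows that do not exactly match any real row."""
    real_set = set(real_rows)
    return sum(row not in real_set for row in synthetic_rows) / len(synthetic_rows)


# Hypothetical toy data: rows are (room_type, amenities_fee) tuples.
real_rows = [('BASIC', 0.0), ('DELUXE', 10.0), ('SUITE', 40.0)]
synthetic_rows = [('BASIC', 10.0), ('SUITE', 30.0), ('SUITE', 40.0)]

real_types, real_fees = zip(*real_rows)
synth_types, synth_fees = zip(*synthetic_rows)

print('CategoryCoverage:', category_coverage(real_types, synth_types))
print('RangeCoverage:', range_coverage(real_fees, synth_fees))
print('NewRowSynthesis:', new_row_synthesis(real_rows, synthetic_rows))
```

Unlike the diagnostic checks, these scores are not expected to be 100% for every dataset, which helps explain why they are measured separately from basic validity.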

Copyright (c) 2023, DataCebo, Inc.