Diagnostic
The Diagnostic Report runs some basic checks for data format and validity. Run this to ensure that you have created valid synthetic data.
New and improved! Starting from SDV version 1.8.0, you'll see a new diagnostic intended to find problems with the synthetic data. You will notice some key improvements to the report and its interpretation.
Usage
Run the diagnostic to receive a score and a corresponding report.
run_diagnostic
Use this function to run a diagnostic on the synthetic data.
Parameters:
(required) real_data: A dictionary mapping each table name to a pandas.DataFrame containing the real data
(required) synthetic_data: A dictionary mapping each table name to a pandas.DataFrame containing the synthetic data
(required) metadata: A MultiTableMetadata object with your metadata
verbose: A boolean describing whether or not to print the report progress and results. Defaults to True. Set this to False to run the report silently.
Returns: An SDMetrics DiagnosticReport object generated with your real and synthetic data
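As a minimal sketch of a call, the helper below wraps run_diagnostic and forwards the parameters listed above. The helper name diagnose is hypothetical. The sketch assumes the function is imported from sdv.evaluation.multi_table and that both data arguments are dictionaries of table name to pandas.DataFrame.

```python
def diagnose(real_data, synthetic_data, metadata, verbose=True):
    """Run the multi-table diagnostic and return the report object.

    Assumes real_data and synthetic_data map each table name to a
    pandas.DataFrame and metadata is a MultiTableMetadata object.
    """
    # Imported inside the function so this helper can be defined even
    # where SDV is not installed (assumed import path for multi-table data).
    from sdv.evaluation.multi_table import run_diagnostic

    return run_diagnostic(
        real_data=real_data,
        synthetic_data=synthetic_data,
        metadata=metadata,
        verbose=verbose,  # pass False to run the report silently
    )
```

Calling diagnose(real_data, synthetic_data, metadata, verbose=False) would return the report without printing progress.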
Interpreting the Score
The score should be 100%. The diagnostic report checks for basic data validity and data structure issues. You should expect the score to be perfect for any of the default SDV synthesizers.
What's Included?
The basic diagnostic checks are summarized in the table below.
Property | Description |
---|---|
Data Validity | Basic validity checks for each of the columns |
Relationship Validity | Basic validity checks for each relationship between a parent table and a child table |
Structure | Checks to ensure the real and synthetic data have the same column names |
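To make the Structure property concrete, here is a small illustrative sketch of a column-name comparison for one table. It is not the SDMetrics implementation. The function name compare_column_names and the example column names are made up for illustration.

```python
def compare_column_names(real_columns, synthetic_columns):
    """Report column names missing from, or extra in, the synthetic table."""
    real = set(real_columns)
    synthetic = set(synthetic_columns)
    return {
        "missing": sorted(real - synthetic),  # in real data only
        "extra": sorted(synthetic - real),    # in synthetic data only
    }


# Hypothetical column names for a single table.
real_columns = ["guest_id", "checkin_date", "room_rate"]
synthetic_columns = ["guest_id", "checkin_date", "room_price"]

result = compare_column_names(real_columns, synthetic_columns)
print(result)  # {'missing': ['room_rate'], 'extra': ['room_price']}
```

With pandas tables, the same function could be given real_table.columns and synthetic_table.columns. Two empty lists mean the column names match.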
get_details
This function returns details about the report's properties. Use it to pinpoint the exact columns or tables that are causing issues.
Parameters:
(required) property_name: A string with the name of the property. One of: 'Data Validity', 'Structure', or 'Relationship Validity'
table_name: A string with the name of the table. If provided, you'll receive filtered results for the table.
Returns: A pandas.DataFrame object with the detailed scores
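One way to look at all the details at once is to call get_details for each property name and keep the results side by side. The sketch below assumes report is the object returned by run_diagnostic. The helper name collect_details is hypothetical.

```python
# The three property names accepted by get_details, as listed above.
PROPERTY_NAMES = ("Data Validity", "Structure", "Relationship Validity")


def collect_details(report, table_name=None):
    """Return a dict mapping each property name to its details table."""
    details = {}
    for name in PROPERTY_NAMES:
        if table_name is None:
            details[name] = report.get_details(property_name=name)
        else:
            # Ask the report to filter the results to a single table.
            details[name] = report.get_details(
                property_name=name, table_name=table_name
            )
    return details
```

For example, collect_details(report, table_name='guests') would gather the filtered details for a table named 'guests' (a made-up table name).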
FAQs
See the SDMetrics DiagnosticReport for even more details about the metrics and properties included in the report.