Single Table API

The Single Table Diagnostic Report runs basic checks on your synthetic data to give a general sense of the strengths and weaknesses of your synthetic data model.
Use this report when you have a single table of data.

Usage

Generating the report

DiagnosticReport()

Create your report object by importing it from the single table reports module.
from sdmetrics.reports.single_table import DiagnosticReport
report = DiagnosticReport()

generate(real_data, synthetic_data, metadata)

Generate your report by passing in the data and metadata.
  • (required) real_data: A pandas.DataFrame containing the real data
  • (required) synthetic_data: A pandas.DataFrame containing the synthetic data
  • (required) metadata: A dictionary describing the format and types of data. See Single Table Metadata for more details.
  • verbose: A boolean describing whether or not to print the report progress and results. Defaults to True. Set this to False to run the report silently.
report.generate(real_data, synthetic_data, metadata)
You'll see a progress bar as the report is generated. Once completed, the diagnostic results are printed out.
Creating report: 100%|████████████████| 200/200 [01:21<00:03, 2.37it/s]
Diagnostic Results:
SUCCESS
✓ Over 90% of the synthetic rows are not copies of the real data
✓ The synthetic data covers over 90% of the numerical ranges present in the
real data
WARNING
! The synthetic data is missing more than 10% of the categories present in
the real data
DANGER
x More than 50% the synthetic data does not follow the min/max boundaries set
by the real data
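Before calling generate, it can help to confirm that the metadata describes every column in your tables. The following is a minimal sketch of such a pre-check, not part of SDMetrics: the helper name undescribed_columns is hypothetical, the toy column names are invented, and the layout of the metadata dictionary (a top-level 'columns' key with an 'sdtype' per column) is an assumption. See Single Table Metadata for the authoritative format.

```python
def undescribed_columns(column_names, metadata):
    """Return the column names that the metadata does not describe.

    Hypothetical helper, not an SDMetrics function. It assumes the
    metadata dictionary keeps its column descriptions under 'columns'.
    """
    described = set(metadata.get('columns', {}))
    return [name for name in column_names if name not in described]


# Toy metadata, invented for illustration. The 'columns'/'sdtype' layout
# is an assumption about the metadata format.
metadata = {
    'columns': {
        'age': {'sdtype': 'numerical'},
        'card_type': {'sdtype': 'categorical'},
    }
}
real_columns = ['age', 'height', 'card_type']

missing = undescribed_columns(real_columns, metadata)
print(missing)
```

With pandas tables you could pass list(real_data.columns) and list(synthetic_data.columns), and call report.generate(real_data, synthetic_data, metadata) only when both checks return an empty list.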

Getting & explaining the results

get_results()

Use this method to retrieve the overall diagnostic results.
Returns: A dictionary mapping each result level ('SUCCESS', 'WARNING' or 'DANGER') to the list of messages reported at that level
report.get_results()
{
    'SUCCESS': [
        'Over 90% of the synthetic rows are not copies of the real data',
        'The synthetic data covers over 90% of the numerical ranges present in the real data'
    ],
    'WARNING': [
        'The synthetic data is missing more than 10% of the categories present in the real data'
    ],
    'DANGER': [
        'More than 50% the synthetic data does not follow the min/max boundaries set by the real data'
    ]
}
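Because the results come back as a plain dictionary, they are easy to inspect in a script. The sketch below counts the messages at each level; count_by_level is a hypothetical helper, and the input is copied from the example output above rather than produced by a real report.

```python
def count_by_level(results):
    """Count the messages reported at each level of a results dictionary.

    Hypothetical helper. Levels that are absent are counted as zero.
    """
    levels = ('SUCCESS', 'WARNING', 'DANGER')
    return {level: len(results.get(level, [])) for level in levels}


# Results copied from the example above, not from a real run.
results = {
    'SUCCESS': [
        'Over 90% of the synthetic rows are not copies of the real data',
        'The synthetic data covers over 90% of the numerical ranges present in the real data',
    ],
    'WARNING': [
        'The synthetic data is missing more than 10% of the categories present in the real data',
    ],
    'DANGER': [
        'More than 50% the synthetic data does not follow the min/max boundaries set by the real data',
    ],
}

counts = count_by_level(results)
print(counts)

# One possible use: stop a pipeline when any DANGER message is present.
needs_attention = counts['DANGER'] > 0
```

In practice you would pass report.get_results() instead of the hard-coded dictionary.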

get_properties()

Use this method at any point to retrieve each property that the report evaluated.
Returns: A dictionary that lists each property name and its associated score
report.get_properties()
{
    'Synthesis': 1.0,
    'Coverage': 0.85,
    'Boundaries': 0.90
}
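One way to act on these scores is to flag the properties that fall below a cutoff you choose. This is a minimal sketch: properties_below is a hypothetical helper, the 0.9 threshold is an arbitrary value picked for illustration, and the scores are copied from the example above.

```python
def properties_below(properties, threshold):
    """Return the properties whose score is strictly below the threshold."""
    return {
        name: score
        for name, score in properties.items()
        if score < threshold
    }


# Scores copied from the example above, not from a real run.
properties = {'Synthesis': 1.0, 'Coverage': 0.85, 'Boundaries': 0.90}

# The threshold is an arbitrary choice for this illustration.
low = properties_below(properties, threshold=0.9)
print(low)
```

Each flagged property name can then be passed to get_details for a closer look.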

get_details(property_name)

Use this method to get more details about a particular property.
  • (required) property_name: A string with the name of the property. One of: 'Synthesis', 'Coverage' or 'Boundaries'.
Returns: A pandas.DataFrame object containing more details about the property
For example, the details for 'Coverage' show the name of each individual column, the metric that was used to compute its score and the overall score for that column.
report.get_details(property_name='Coverage')
Column       Metric              Diagnostic Score
age          RangeCoverage       0.980
height       RangeCoverage       0.8400
card_type    CategoryCoverage    1.0
...
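The details table lends itself to finding the weakest columns. To stay dependency-free, this sketch works on the list of row dictionaries that pandas produces with details.to_dict('records'). The helper lowest_scoring is hypothetical, and the rows are copied from the example table above.

```python
import math


def lowest_scoring(records, n=1, score_key='Diagnostic Score'):
    """Return the n rows with the lowest score, skipping missing scores.

    Hypothetical helper. Rows whose score is None or NaN are ignored.
    """
    def has_score(row):
        score = row.get(score_key)
        return score is not None and not math.isnan(score)

    usable = [row for row in records if has_score(row)]
    return sorted(usable, key=lambda row: row[score_key])[:n]


# Rows copied from the example table above, not from a real run.
records = [
    {'Column': 'age', 'Metric': 'RangeCoverage', 'Diagnostic Score': 0.98},
    {'Column': 'height', 'Metric': 'RangeCoverage', 'Diagnostic Score': 0.84},
    {'Column': 'card_type', 'Metric': 'CategoryCoverage', 'Diagnostic Score': 1.0},
]

worst = lowest_scoring(records, n=1)
print(worst[0]['Column'])
```

If you prefer to stay in pandas, sorting the DataFrame by its 'Diagnostic Score' column gives the same ordering.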

Visualizing the report

You can visualize the properties and use the SDMetrics utilities to visualize the raw data too.

get_visualization(property_name)

Use this method to visualize the details about a property.
  • (required) property_name: A string with the name of the property. One of: 'Synthesis', 'Coverage' or 'Boundaries'.
Returns: A plotly.Figure object
For example, the 'Coverage' property visualizes the score for every column as well as the metric used to compute it.
fig = report.get_visualization(property_name='Coverage')
fig.show()
Other visualizations are available! Use the SDMetrics Visualization Utilities to get more insights into your data.
Tip! All visualizations returned in this report are interactive. If you're using an IPython notebook, you can zoom, pan, toggle legends and take screenshots.

Saving & loading the report

You can save your report if you want to share or access it in the future.

save(filepath)

Save the Python report object.
  • (required) filepath: The name of the file in which to save the object. This must end with .pkl
report.save(filepath='results/diagnostic_report.pkl')
The report does not save the full real and synthetic datasets, but it does save the metadata along with the score for each property, breakdown and metric.
The score information may still leak sensitive details about your real data. Use caution when deciding where to store the report and who to share it with.
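A small path check before saving can catch a wrong extension early. The sketch below is not part of SDMetrics: prepare_report_path is a hypothetical helper, and it creates the parent folder on the assumption that save does not create missing directories itself.

```python
import tempfile
from pathlib import Path


def prepare_report_path(filepath):
    """Validate a report path and make sure its parent folder exists.

    Hypothetical helper. Raises ValueError unless the name ends with .pkl.
    """
    path = Path(filepath)
    if path.suffix != '.pkl':
        raise ValueError(f'Report path must end with .pkl, got: {path.name}')
    # Assumption: save() needs the folder to exist already.
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)


# Demonstrated in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_report_path(Path(tmp) / 'results' / 'diagnostic_report.pkl')
    folder_created = Path(target).parent.is_dir()

print(folder_created)
```

The returned string can be passed straight to report.save(filepath=...).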

DiagnosticReport.load(filepath)

Load the report from the file.
  • (required) filepath: The name of the file where the report is stored
Returns: A DiagnosticReport object.
from sdmetrics.reports.single_table import DiagnosticReport
report = DiagnosticReport.load('results/diagnostic_report.pkl')

FAQs

This report returns all visualizations as plotly.Figure objects, which are integrated with most IPython notebooks (e.g. Colab, Jupyter).
Tip! You can interact with the visualizations when you're viewing them in a notebook. You can zoom, pan and take screenshots.
It's also possible to programmatically save a static image export. See the Plotly Guide for more details.
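As a rough sketch of a programmatic export, the helper below picks a plotly export call based on the file extension. export_figure is a hypothetical name. write_html and write_image are plotly figure methods, and write_image needs plotly's static image export dependency (such as kaleido) to be installed.

```python
from pathlib import Path


def export_figure(fig, filepath):
    """Write a plotly figure to disk, choosing the format from the extension.

    Hypothetical helper. '.html' keeps the figure interactive; any other
    extension (for example '.png') is treated as a static image export.
    """
    suffix = Path(filepath).suffix.lower()
    if suffix == '.html':
        fig.write_html(filepath)
    else:
        # Static export; requires plotly's image export dependency.
        fig.write_image(filepath)
    return suffix


# Usage, assuming `report` is a generated DiagnosticReport:
# fig = report.get_visualization(property_name='Coverage')
# export_figure(fig, 'coverage.png')
```

The Plotly Guide remains the reference for supported formats and sizing options.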
Use the get_raw_result method if you're interested in the details of which methods were run and what the raw results were.
Parameters:
  • metric_name: A string containing the name of the metric. The metrics are defined in Metrics Glossary. Only the metrics included in this report are available.
Returns: A list of dictionaries. Each dictionary contains the information needed to run the metric and its results.
report.get_raw_result(metric_name='KSComplement')
[{
    'metric': {
        'method': 'single_table.KSComplement.compute_breakdown',
        'parameters': None
    },
    'results': {
        'user_id': { 'score': None },
        'start_date': { 'score': 0.790 },
        'purchase_amt': { 'score': 0.880 },
        ...
    }
}]
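The nested raw result can be flattened into a simple column-to-score mapping. This is a minimal sketch under the assumption that every entry follows the structure shown above; collect_scores is a hypothetical helper, and the input repeats the example values with the elided rows left out.

```python
def collect_scores(raw_results):
    """Map each column to its score, skipping columns without a score.

    Hypothetical helper. Assumes each entry has a 'results' dictionary
    whose values carry a 'score' key, as in the example above.
    """
    scores = {}
    for entry in raw_results:
        for column, outcome in entry['results'].items():
            score = outcome.get('score')
            if score is not None:
                scores[column] = score
    return scores


# Values copied from the example above; elided rows are omitted.
raw = [{
    'metric': {
        'method': 'single_table.KSComplement.compute_breakdown',
        'parameters': None,
    },
    'results': {
        'user_id': {'score': None},
        'start_date': {'score': 0.790},
        'purchase_amt': {'score': 0.880},
    },
}]

flat = collect_scores(raw)
print(flat)
```

Columns whose score is None, such as user_id here, are dropped rather than reported as zero, so they do not distort any averages you compute afterwards.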