
Single Table API


Last updated 11 months ago

The Single Table Quality Report evaluates how well your synthetic data captures the mathematical properties of your real data.

Use this report when you have a single table of data.

Usage

Generating the report

QualityReport()

Create your report object by importing it from the single table reports module.

from sdmetrics.reports.single_table import QualityReport

report = QualityReport()

generate(real_data, synthetic_data, metadata)

Generate your report by passing in the data and metadata.

  • (required) real_data: A pandas.DataFrame containing the real data

  • (required) synthetic_data: A pandas.DataFrame containing the synthetic data

  • (required) metadata: A dictionary describing the format and types of data. See Single Table Metadata for more details.

  • verbose: A boolean describing whether or not to print the report progress and results. Defaults to True. Set this to False to run the report silently.

report.generate(real_data, synthetic_data, metadata)
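As a minimal sketch of the inputs, the example below uses a few hypothetical rows and a metadata dictionary. The column names and values are invented for illustration, and the metadata layout (a "columns" entry mapping each column to an "sdtype") is an assumption here; the Single Table Metadata page is the authoritative description. The helper build_report is not part of SDMetrics.

```python
# Hypothetical rows, kept tiny for illustration only.
real_rows = [
    {"age": 34, "salary": 52000.0, "gender": "F"},
    {"age": 45, "salary": 61000.0, "gender": "M"},
    {"age": 29, "salary": 48000.0, "gender": "F"},
]
synthetic_rows = [
    {"age": 31, "salary": 50500.0, "gender": "F"},
    {"age": 47, "salary": 63000.0, "gender": "M"},
    {"age": 38, "salary": 55000.0, "gender": "M"},
]

# Assumed metadata layout: one entry per column, each with an "sdtype".
metadata = {
    "columns": {
        "age": {"sdtype": "numerical"},
        "salary": {"sdtype": "numerical"},
        "gender": {"sdtype": "categorical"},
    }
}


def build_report(real_rows, synthetic_rows, metadata, verbose=True):
    """Create DataFrames from row dicts and generate a QualityReport."""
    # Imported lazily so the definitions above can be inspected
    # without pandas or sdmetrics installed.
    import pandas as pd
    from sdmetrics.reports.single_table import QualityReport

    real_data = pd.DataFrame(real_rows)
    synthetic_data = pd.DataFrame(synthetic_rows)

    report = QualityReport()
    # Pass verbose=False to run the report silently.
    report.generate(real_data, synthetic_data, metadata, verbose=verbose)
    return report
```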

Once completed, some preliminary scores will be printed out.

Generating report ...

(1/2) Evaluating Column Shapes: |██████████| 9/9 [00:00<00:00, 273.13it/s]|
Column Shapes Score: 89.11%

(2/2) Evaluating Column Pair Trends: |██████████| 36/36 [00:00<00:00, 57.42it/s]|
Column Pair Trends Score: 88.3%

Overall Score (Average): 88.7%
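The "(Average)" label suggests the overall score is the plain mean of the property scores. The arithmetic for the sample output above can be checked with a short sketch (the variable names are illustrative):

```python
# Property scores from the sample output above, as fractions.
column_shapes = 0.8911
column_pair_trends = 0.883

# Plain mean of the two property scores.
overall = (column_shapes + column_pair_trends) / 2

print(f"Overall Score (Average): {overall:.1%}")
```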

Getting & explaining the results

Every score that the report generates ranges from 0 (worst) to 1 (best). The summary printed by generate displays scores as percentages.

get_score()

Use this method at any point to retrieve the overall score.

Returns: A floating point value between 0 and 1 that summarizes the quality of your synthetic data.

report.get_score()
0.8049999999999999

get_properties()

Use this method at any point to retrieve each property that the report evaluated.

report.get_properties()
Property                Score
Column Shapes           0.8278
Column Pair Trends      0.7872
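One possible use of these scores is to find the property that needs the most attention. The sketch below works on a plain dictionary holding the sample values above. The comment shows how such a dictionary might be built from the returned table, assuming it has the 'Property' and 'Score' columns shown.

```python
# Scores copied from the sample output above, keyed by property name.
# With a real report, an equivalent dict might be built with:
#   df = report.get_properties()
#   properties = dict(zip(df["Property"], df["Score"]))
properties = {"Column Shapes": 0.8278, "Column Pair Trends": 0.7872}

# The property with the lowest score is the first candidate to investigate.
weakest = min(properties, key=properties.get)

print(f"Weakest property: {weakest} ({properties[weakest]:.2%})")
```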

get_details(property_name)

Use this method to get more details about a particular property.

  • (required) property_name: A string with the name of the property, either 'Column Shapes' or 'Column Pair Trends'

For example, the details for 'Column Shapes' show the name of each individual column, the metric that was used to compute its score and the resulting score for that column.

report.get_details(property_name='Column Shapes')
Column          Metric             Score
second_perc     KSComplement       0.627907 
salary          KSComplement       0.869155
gender          TVComplement       0.939535
...    
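The details can be used to flag the columns that pull a property's score down. The sketch below uses the three sample rows above and a hypothetical cutoff of 0.8; the cutoff is an illustrative choice, not a recommendation from SDMetrics.

```python
# Rows copied from the sample 'Column Shapes' details above:
# (column name, metric name, score).
details = [
    ("second_perc", "KSComplement", 0.627907),
    ("salary", "KSComplement", 0.869155),
    ("gender", "TVComplement", 0.939535),
]

THRESHOLD = 0.8  # hypothetical cutoff, chosen only for illustration

# Keep only the columns scoring below the cutoff.
low_scoring = [row for row in details if row[2] < THRESHOLD]

# Print the flagged columns, lowest score first.
for column, metric, score in sorted(low_scoring, key=lambda row: row[2]):
    print(f"{column}: {metric} = {score:.3f}")
```

With the table returned by get_details, the same filter could likely be written as details_df[details_df['Score'] < THRESHOLD], assuming the 'Score' column shown above.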

Visualizing the report

You can visualize the properties and use the SDMetrics utilities to visualize the raw data too.

get_visualization(property_name)

Use this method to visualize the details about a property.

  • (required) property_name: A string with the name of the property, either 'Column Shapes' or 'Column Pair Trends'

fig = report.get_visualization(property_name='Column Shapes')
fig.show()

The exact visualization depends on the property. For example, the 'Column Shapes' property visualizes the quality score for every column as well as the metric used to compute it.
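Since the visualizations are plotly figures, they can also be written to disk. The sketch below is a hypothetical helper (export_figure is not part of SDMetrics) that saves an interactive HTML copy using plotly's write_html method.

```python
def export_figure(fig, basename):
    """Write an interactive HTML copy of a plotly figure and return its path."""
    html_path = f"{basename}.html"
    # plotly figures provide write_html for standalone interactive files.
    # For static images, plotly's write_image is an alternative; it needs
    # an extra image-export package (such as kaleido) to be installed.
    fig.write_html(html_path)
    return html_path
```

For example, export_figure(report.get_visualization(property_name='Column Shapes'), 'column_shapes') would write column_shapes.html to the current directory.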

Saving & loading the report

You can save your report if you want to share or access it in the future.

save(filepath)

Save the Python report object.

  • (required) filepath: The name of the file in which to save the object. This must end with .pkl

report.save(filepath='results/quality_report.pkl')

The report does not save the full real and synthetic datasets, but it does save the metadata along with the score for each property, breakdown and metric.

The score information may still leak sensitive details about your real data. Use caution when deciding where to store the report and who to share it with.
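Before saving, it can help to validate the path. The sketch below is a hypothetical helper (prepare_report_path is not part of SDMetrics) that enforces the .pkl extension and creates the target folder. It assumes the folder may not exist yet; whether save creates it is not stated here.

```python
from pathlib import Path


def prepare_report_path(filepath):
    """Check the .pkl extension and make sure the parent folder exists."""
    path = Path(filepath)
    if path.suffix != ".pkl":
        raise ValueError(f"Report path must end with .pkl, got: {filepath}")
    # Create any missing parent folders, e.g. 'results/'.
    path.parent.mkdir(parents=True, exist_ok=True)
    return str(path)
```

The returned string can then be passed on, as in report.save(filepath=prepare_report_path('results/quality_report.pkl')).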

QualityReport.load(filepath)

Load the report from the file.

  • (required) filepath: The name of the file where the report is stored

Returns: A QualityReport object.

from sdmetrics.reports.single_table import QualityReport

report = QualityReport.load('results/quality_report.pkl')
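A loaded report exposes the same methods as a freshly generated one. As a minimal sketch, the hypothetical helper below (load_report_score is not part of SDMetrics) returns the overall score of a saved report, or None when the file is missing.

```python
from pathlib import Path


def load_report_score(filepath):
    """Return the overall score of a saved report, or None if it is missing."""
    if not Path(filepath).is_file():
        return None
    # Imported lazily so the missing-file case works without sdmetrics.
    from sdmetrics.reports.single_table import QualityReport

    report = QualityReport.load(filepath)
    return report.get_score()
```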

FAQs

What is the best way to see the visualizations? Can I save them?

This report returns all visualizations as plotly.Figure objects, which are integrated with most IPython notebooks (e.g. Colab, Jupyter). The visualizations are interactive: when viewing them in a notebook, you can zoom, pan, toggle legends and take screenshots.

It's also possible to programmatically save a static image export. See the Plotly Guide for more details.

Other visualizations are available! Use the SDMetrics Visualization Utilities to get more insights into your data.

Return values of the methods described above:

  • get_properties(): A pandas.DataFrame that lists each property name and its associated score

  • get_details(property_name): A pandas.DataFrame with more details about the property

  • get_visualization(property_name): A plotly.Figure object