
Multi Table API

The Multi Table Quality Report evaluates how well your synthetic data captures the mathematical properties of your real data.
Use this report when you have multiple, connected tables of data.

Usage

Generating the report

QualityReport()

Create your report object by importing it from the multi table reports module.
from sdmetrics.reports.multi_table import QualityReport
report = QualityReport()

generate(real_data, synthetic_data, metadata, verbose=True)

Generate your report by passing in the data and metadata.
  • (required) real_data: A dictionary mapping the name of each table to a pandas.DataFrame containing the real data for that table
  • (required) synthetic_data: A dictionary mapping the name of each table to a pandas.DataFrame containing the synthetic data for that table
  • (required) metadata: A dictionary describing the format and data types of the tables and the relationships between them. See Multi Table Metadata for more details.
  • verbose: A boolean describing whether or not to print the report progress and results. Defaults to True. Set this to False to run the report silently.
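Both data arguments are plain dictionaries keyed by table name, so it can help to confirm that they line up before generating the report. The sketch below is a minimal illustration under stated assumptions: the tables 'users' and 'transactions', their columns, and the helper check_table_dicts are all hypothetical and not part of SDMetrics. The helper only compares table names and column names; the metadata argument is left out because its format is covered in Multi Table Metadata.

```python
import pandas as pd

# Hypothetical toy tables, for illustration only.
real_data = {
    'users': pd.DataFrame({
        'user_id': [1, 2, 3],
        'card_type': ['visa', 'amex', 'visa'],
    }),
    'transactions': pd.DataFrame({
        'transaction_id': [10, 11, 12, 13],
        'user_id': [1, 1, 2, 3],
        'purchase_amt': [12.5, 40.0, 7.25, 99.9],
    }),
}

synthetic_data = {
    'users': pd.DataFrame({
        'user_id': [1, 2],
        'card_type': ['visa', 'visa'],
    }),
    'transactions': pd.DataFrame({
        'transaction_id': [20, 21, 22],
        'user_id': [1, 2, 2],
        'purchase_amt': [15.0, 33.3, 8.0],
    }),
}


def check_table_dicts(real_data, synthetic_data):
    """Hypothetical pre-flight check (not an SDMetrics API).

    Compares table names and, for shared tables, column names.
    Returns a list of problem descriptions; an empty list means they match.
    """
    problems = []
    if set(real_data) != set(synthetic_data):
        problems.append(
            f'table names differ: {sorted(set(real_data) ^ set(synthetic_data))}'
        )
    for name in sorted(set(real_data) & set(synthetic_data)):
        real_cols = set(real_data[name].columns)
        synth_cols = set(synthetic_data[name].columns)
        if real_cols != synth_cols:
            problems.append(f'{name}: columns differ: {sorted(real_cols ^ synth_cols)}')
    return problems


print(check_table_dicts(real_data, synthetic_data))  # []
```

Once the two dictionaries line up, they are passed to generate together with the metadata: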
report.generate(real_data, synthetic_data, metadata)
Once the report is complete, it prints some preliminary scores (unless verbose is set to False).
Creating report: 100%|██████████| 4/4 [00:00<00:00, 7.09it/s]
Overall Quality Score: 82.84%
Properties:
Column Shapes: 82.78%
Column Pair Trends: 82.9%
Parent Child Relationships: 77%

Getting & explaining the results

Every score that the report generates ranges from 0 (worst) to 1 (best). The printed summary shows the same scores as percentages.

get_score()

Use this method at any point to retrieve the overall score.
Returns: A floating point value between 0 and 1 that summarizes the quality of your synthetic data.
report.get_score()
0.783449101193
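Because the overall score is a single number between 0 and 1, it can serve as a simple pass/fail gate, for example in a pipeline that regenerates synthetic data. The sketch below assumes a hypothetical helper, check_quality, and an arbitrary threshold of 0.8; neither comes from SDMetrics. It also shows how the 0 to 1 score maps to the percentage the report prints.

```python
def check_quality(score, threshold=0.8):
    """Hypothetical quality gate for the value returned by report.get_score().

    Scores range from 0 (worst) to 1 (best). The 0.8 default is arbitrary;
    pick a threshold that suits your use case.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f'score must be between 0 and 1, got {score}')
    passed = score >= threshold
    # Format the 0-1 score as a percentage, like the printed report summary.
    print(f'Overall Quality Score: {score:.2%} ({"pass" if passed else "fail"})')
    return passed


check_quality(0.783449101193)  # prints "Overall Quality Score: 78.34% (fail)"
```

In practice the argument would be report.get_score() rather than a literal value.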

get_properties()

Use this method at any point to retrieve each property that the report evaluated.
Returns: A pandas.DataFrame that lists each property name and its associated score.
report.get_properties()
Property                     Score
Column Shapes                0.841484929101
Column Pair Trends           0.744250193991
Parent Child Relationships   0.771100011144
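Since get_properties() returns an ordinary pandas.DataFrame, standard pandas operations can rank the properties so you know where to look first. The sketch below rebuilds the example output as a DataFrame; it assumes the column names are 'Property' and 'Score', as in the example header, so check them against your own output.

```python
import pandas as pd

# Rebuilt from the example output; in practice: properties = report.get_properties()
properties = pd.DataFrame({
    'Property': ['Column Shapes', 'Column Pair Trends', 'Parent Child Relationships'],
    'Score': [0.841484929101, 0.744250193991, 0.771100011144],
})

# Sort ascending so the weakest property comes first; that is the one
# to inspect further with get_details() and get_visualization().
ranked = properties.sort_values('Score').reset_index(drop=True)
weakest = ranked.loc[0, 'Property']
print(weakest)  # Column Pair Trends
```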

get_details(property_name, table_name)

Use this method to get more details about a particular property.
  • (required) property_name: A string with the name of the property. One of: 'Column Shapes', 'Column Pair Trends' or 'Parent Child Relationships'
  • table_name: A string with the name of the table. If provided, you'll receive filtered results for the table.
Returns: A pandas.DataFrame with more details about the property for the given table.
For example, the details for 'Column Shapes' show the name of each individual column, the metric used to compute its score and the quality score for that column.
report.get_details(
    property_name='Column Shapes',
    table_name='users')
Table   Column         Metric         Quality Score
users   purchase_amt   KSComplement   0.880
users   card_type      TVComplement   0.690
users   start_date     KSComplement   0.790
...
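The details DataFrame can be filtered to list only the columns that need attention. The sketch below rebuilds the three example rows and keeps those below a threshold, worst first. The 0.8 threshold is an arbitrary assumption, and the column names are copied from the example header.

```python
import pandas as pd

# The example rows; in practice:
# details = report.get_details(property_name='Column Shapes', table_name='users')
details = pd.DataFrame({
    'Table': ['users', 'users', 'users'],
    'Column': ['purchase_amt', 'card_type', 'start_date'],
    'Metric': ['KSComplement', 'TVComplement', 'KSComplement'],
    'Quality Score': [0.880, 0.690, 0.790],
})

# Keep columns scoring below the (arbitrary) threshold, lowest score first.
threshold = 0.8
low = details[details['Quality Score'] < threshold].sort_values('Quality Score')
print(low[['Column', 'Metric', 'Quality Score']].to_string(index=False))
```

Here card_type and start_date fall below the threshold, which suggests plotting those columns with the visualization utilities.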

Visualizing the report

You can visualize the properties and use the SDMetrics utilities to visualize the raw data too.

get_visualization(property_name, table_name)

Use this method to visualize the details about a property.
  • (required) property_name: A string with the name of the property. One of: 'Column Shapes', 'Column Pair Trends' or 'Parent Child Relationships'
  • (required) table_name: A string with the name of the table
Returns: A plotly.Figure object
fig = report.get_visualization(
    property_name='Column Shapes',
    table_name='users')
fig.show()
The exact visualization depends on the property. For example, the 'Column Shapes' visualization shows the quality score for every column as well as the metric used to compute it.
Other visualizations are available! Use the SDMetrics Visualization Utilities to get more insights into your data.
Tip: All visualizations returned in this report are interactive. If you're using an iPython notebook, you can zoom, pan, toggle legends and take screenshots.

Saving & loading the report

You can save your report if you want to share or access it in the future.

save(filepath)

Save the Python report object.
  • (required) filepath: The name of the file to save the object to. This must end with .pkl
report.save(filepath='results/quality_report.pkl')
The report does not save the full real and synthetic datasets, but it does save the metadata along with the score for each property, breakdown and metric.
The score information may still leak sensitive details about your real data. Use caution when deciding where to store the report and who to share it with.

QualityReport.load(filepath)

Load the report from the file.
  • (required) filepath: The name of the file where the report is stored
Returns: A QualityReport object.
from sdmetrics.reports.multi_table import QualityReport
report = QualityReport.load('results/quality_report.pkl')

FAQs

Higher order distributions of 3 or more columns are not included in the Quality Report. We have found that higher order similarity may have an adverse effect on the usability of the synthetic data; after a certain point, it indicates that the synthetic data is just a copy of the real data. (For more information, see the SyntheticUniqueness metric.)
If higher order similarity is a requirement, you likely have a targeted use case for synthetic data (e.g. machine learning efficacy). Until we add reports for these use cases, you may want to explore other metrics in the Metrics Glossary or in Beta.
This report returns all visualizations as plotly.Figure objects, which integrate with most iPython notebooks (e.g. Colab, Jupyter).
Tip! You can interact with the visualizations when you're viewing them in a notebook. You can zoom, pan and take screenshots.
It's also possible to programmatically save a static image export. See the Plotly Guide for more details.
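As one way to save the figures programmatically, the sketch below loops over tables and properties and writes each figure with plotly's Figure.write_html, which keeps the interactivity. Static exports via Figure.write_image need the separate kaleido package. The helper export_visualizations, the output folder name and the file naming scheme are hypothetical; it relies only on get_visualization(property_name=..., table_name=...) as documented above. If a property does not apply to one of your tables, pass a narrower property_names tuple.

```python
from pathlib import Path


def export_visualizations(report, table_names, out_dir='figures',
                          property_names=('Column Shapes', 'Column Pair Trends',
                                          'Parent Child Relationships')):
    """Hypothetical helper: save one interactive HTML file per table and property.

    Expects report.get_visualization(...) to return a plotly Figure,
    which provides write_html(path).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for table_name in table_names:
        for property_name in property_names:
            fig = report.get_visualization(
                property_name=property_name, table_name=table_name)
            # File names such as 'users_column_shapes.html'.
            slug = property_name.lower().replace(' ', '_')
            path = out / f'{table_name}_{slug}.html'
            fig.write_html(str(path))
            written.append(path)
    return written


# Usage: export_visualizations(report, ['users'])
```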
Use the get_raw_result method if you're interested in the details of which metrics were run and what their raw results were.
Parameters:
  • metric_name: A string containing the name of the metric. The metrics are defined in the Metrics Glossary. Only the metrics included in this report are available.
Returns: A list of dictionaries. Each dictionary contains the information needed to run the metric and its results.
report.get_raw_result(metric_name='KSComplement')
[{
    'metric': {
        'method': 'single_table.KSComplement.compute_breakdown',
        'parameters': None
    },
    'results': {
        'user_id': {'score': None},
        'start_date': {'score': 0.790},
        'purchase_amt': {'score': 0.880},
        ...
    }
}]
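The raw result is plain Python data, so it can be flattened with ordinary loops. The sketch below uses a hypothetical helper, scores_by_column, on a trimmed copy of the example output (the "..." removed). It skips columns whose score is None, such as user_id in the example, presumably because the metric does not apply to that column. With several entries covering different tables, later entries overwrite earlier ones that share a column name; key by table if that matters for you.

```python
def scores_by_column(raw_result):
    """Hypothetical helper: flatten get_raw_result() output into {column: score}.

    Columns with a None score are skipped. Later entries overwrite earlier
    ones that share a column name.
    """
    scores = {}
    for entry in raw_result:
        for column, outcome in entry['results'].items():
            if outcome.get('score') is not None:
                scores[column] = outcome['score']
    return scores


# A trimmed copy of the example output above.
raw_result = [{
    'metric': {
        'method': 'single_table.KSComplement.compute_breakdown',
        'parameters': None,
    },
    'results': {
        'user_id': {'score': None},
        'start_date': {'score': 0.790},
        'purchase_amt': {'score': 0.880},
    },
}]

print(scores_by_column(raw_result))  # {'start_date': 0.79, 'purchase_amt': 0.88}
```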