# Single Table API

The Single Table Quality Report evaluates how well your synthetic data captures mathematical properties in your data.

Use this report when you have a single table of data.

## Usage

### Generating the report

#### QualityReport()

Create your report object by importing it from the single table reports module.

```python
from sdmetrics.reports.single_table import QualityReport

report = QualityReport()
```

#### generate(real\_data, synthetic\_data, metadata)

Generate your report by passing in the data and metadata.

* (required) `real_data`: A pandas.DataFrame containing the real data
* (required) `synthetic_data`: A pandas.DataFrame containing the synthetic data
* (required) `metadata`: A dictionary describing the format and types of data. See [Single Table Metadata](https://docs.sdv.dev/sdmetrics/getting-started/metadata/single-table-metadata) for more details.
* `verbose`: A boolean describing whether or not to print the report progress and results. Defaults to `True`. Set this to `False` to run the report silently.

```python
report.generate(real_data, synthetic_data, metadata)
```

Once the report is complete, summary scores are printed out.

```
Generating report ...

(1/2) Evaluating Column Shapes: |██████████| 9/9 [00:00<00:00, 273.13it/s]|
Column Shapes Score: 89.11%

(2/2) Evaluating Column Pair Trends: |██████████| 36/36 [00:00<00:00, 57.42it/s]|
Column Pair Trends Score: 88.3%

Overall Score (Average): 88.7%
```
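As the printout shows, the overall score is the average of the two property scores. A quick illustration using the percentages from the example above:

```python
# The overall score reported above is the average of the two property
# scores. A quick check using the values from the example printout:
column_shapes = 0.8911
column_pair_trends = 0.883

overall = (column_shapes + column_pair_trends) / 2
print(f"Overall Score (Average): {overall:.1%}")  # -> Overall Score (Average): 88.7%
```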

### Getting & explaining the results

Every score that the report generates ranges from 0 (worst) to 1 (best).

#### get\_score()

Use this method at any point to retrieve the overall score.

Returns: A floating point value between 0 and 1 that summarizes the quality of your synthetic data.

```python
report.get_score()
```

```python
0.8049999999999999
```
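Since `get_score()` returns a plain float between 0 and 1, you can act on it programmatically. The sketch below maps the score to a rough label; the thresholds are illustrative only, not official SDMetrics guidance.

```python
# Illustrative only: these thresholds are not official SDMetrics guidance,
# just an example of acting on the 0-1 score returned by get_score().
def describe_quality(score: float) -> str:
    """Map a quality score (0 = worst, 1 = best) to a rough label."""
    if not 0 <= score <= 1:
        raise ValueError("Quality scores are always between 0 and 1")
    if score >= 0.9:
        return "excellent"
    if score >= 0.75:
        return "good"
    if score >= 0.5:
        return "fair"
    return "poor"

print(describe_quality(0.8049999999999999))  # -> good
```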

#### get\_properties()

Use this method at any point to retrieve each property that the report evaluated.

Returns: A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) that lists each property name and its associated score.

```python
report.get_properties()
```

```python
Property                Score
Column Shapes           0.8278
Column Pair Trends      0.7872
```
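Because the result is an ordinary pandas DataFrame, you can work with it directly. The sketch below uses a stand-in DataFrame built from the example scores above to show that averaging the property scores reproduces the overall score:

```python
import pandas as pd

# Stand-in for the DataFrame returned by get_properties(), built from
# the example scores above for illustration.
properties = pd.DataFrame({
    "Property": ["Column Shapes", "Column Pair Trends"],
    "Score": [0.8278, 0.7872],
})

# Averaging the property scores reproduces the overall quality score.
overall = properties["Score"].mean()
print(round(overall, 4))  # -> 0.8075
```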

#### get\_details(property\_name)

Use this method to get more details about a particular property.

* (required) `property_name`: A string with the name of the property, either `'Column Shapes'` or `'Column Pair Trends'`

Returns: A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) that contains more details about the property.

For example, the details for `'Column Shapes'` show the name of each individual column, the metric used to evaluate it, and the quality score for that column.

```python
report.get_details(property_name='Column Shapes')
```

```python
Column          Metric             Score
second_perc     KSComplement       0.627907 
salary          KSComplement       0.869155
gender          TVComplement       0.939535
...    
```
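The details are also a plain DataFrame, so you can filter them, for example to flag columns that score poorly. The sketch below uses a stand-in DataFrame built from the example rows above; the 0.7 threshold is arbitrary.

```python
import pandas as pd

# Stand-in for the DataFrame returned by get_details('Column Shapes'),
# built from the example rows above for illustration.
details = pd.DataFrame({
    "Column": ["second_perc", "salary", "gender"],
    "Metric": ["KSComplement", "KSComplement", "TVComplement"],
    "Score": [0.627907, 0.869155, 0.939535],
})

# Flag columns whose individual quality score falls below a chosen
# threshold (0.7 here is arbitrary) for closer inspection.
low_quality = details[details["Score"] < 0.7]
print(low_quality["Column"].tolist())  # -> ['second_perc']
```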

### Visualizing the report

You can visualize the properties and use the SDMetrics utilities to visualize the raw data too.

#### get\_visualization(property\_name)

Use this method to visualize the details about a property.

* (required) `property_name`: A string with the name of the property, either `'Column Shapes'` or `'Column Pair Trends'`

Returns: A [plotly.Figure](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html) object

```python
fig = report.get_visualization(property_name='Column Shapes')
fig.show()
```

The exact visualization depends on the property. For example, the `'Column Shapes'` visualization shows the quality score for every column along with the metric used to compute it.

<figure><img src="https://2284413265-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FrNLha4DaPNwVJ930KhmB%2Fuploads%2F3iuEXPA4ITh8SPomjFGR%2FQuality%20Report_%20Column%20Shapes.png?alt=media&#x26;token=bbe7fe8a-ffd6-4918-838d-0721aa5e2389" alt=""><figcaption></figcaption></figure>

{% hint style="success" %}
**Other visualizations are available!** Use the [SDMetrics Visualization Utilities](https://docs.sdv.dev/sdmetrics/getting-started/visualization-utilities) to get more insights into your data.\
\
**Tip:** All visualizations returned in this report are interactive. If you're using an iPython notebook, you can zoom, pan, toggle legends and take screenshots.
{% endhint %}

### Saving & loading the report

You can save your report if you want to share or access it in the future.

#### save(filepath)

Save the Python report object.

* (required) `filepath`: The name of the file where the object will be saved. This must end in `.pkl`

```python
report.save(filepath='results/quality_report.pkl')
```

{% hint style="warning" %}
The report does not save the full real and synthetic datasets, but it does save the metadata along with the score for each property, breakdown and metric.

The score information may still leak sensitive details about your real data. Use caution when deciding where to store the report and whom to share it with.
{% endhint %}

#### QualityReport.load(filepath)

Load the report from a file.

* (required) `filepath`: The name of the file where the report is stored

Returns: A `QualityReport` object.

```python
from sdmetrics.reports.single_table import QualityReport

report = QualityReport.load('results/quality_report.pkl')
```

## FAQs

<details>

<summary>What is the best way to see the visualizations? Can I save them?</summary>

This report returns all visualizations as [plotly.Figure](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html) objects, which are integrated with most iPython notebooks (e.g. Colab, Jupyter).

**Tip!** You can interact with the visualizations when you're viewing them in a notebook. You can zoom, pan and take screenshots.

It's also possible to programmatically save a static image export. See the [Plotly Guide](https://plotly.com/python/static-image-export/) for more details.

</details>

<details>

<summary>Can this report check for similarity in higher orders?</summary>

Higher-order distributions of 3 or more columns are not included in the Quality Report. We have found that very high-order similarity may have an adverse effect on the synthetic data's usability; after a certain point, it indicates that the synthetic data is just a copy of the real data. (For more information, see [Privacy Metrics](https://docs.sdv.dev/sdmetrics/data-metrics/privacy).)

If higher-order similarity is a requirement, you likely have a targeted use case for synthetic data (e.g. machine learning efficacy). Until we add these reports, you may want to explore other [metrics](https://docs.sdv.dev/sdmetrics/data-metrics).

</details>

<details>

<summary>Is the score deterministic?</summary>

Every [metric](https://docs.sdv.dev/sdmetrics/data-metrics/quality/quality-report/whats-included) that the quality report computes is deterministic. However, if your dataset contains over 50K rows, the quality report may subsample your data to compute specific metrics such as [ContingencySimilarity](https://docs.sdv.dev/sdmetrics/data-metrics/quality/whats-included#column-pair-trends). This improves performance.

Since the subsampling is random, you may observe that **the score can vary slightly** if you re-run the quality report. We've chosen the threshold of 50K so that the variation in the quality score is only a small percentage away from the actual value.

If you'd like to turn subsampling off and make your score deterministic, set the `num_rows_subsample` attribute to `None`. (Keep in mind that this can increase the time it takes to generate the report.)

```python
from sdmetrics.reports.single_table import QualityReport

report = QualityReport()
report.num_rows_subsample = None
report.generate(real_data, synthetic_data, metadata)
```

</details>
