Interpreting Results

Benchmark results are available for every synthesizer and dataset pair. The results are returned as a pandas DataFrame with one row per pair, for example:

Synthesizer                Dataset  Dataset_Size_MB  Model_Time  Peak_Memory_KB  Model_Size_MB  Sample_Time  Evaluate_Time  Diagnostic_Score  Quality_Score  NewRowSynthesis
GaussianCopulaSynthesizer  alarm    34.5             123.56      300101          0.981          2012.1       1001.2         1.00000           0.9991991      0.998191
GaussianCopulaSynthesizer  census   130.2            23546.12    201011          1.232          2012.2       101012.1       1.00000           0.689101       1.0
CTGANSynthesizer           alarm    34.5             NaN         99999999        NaN            NaN          NaN            1.00000           NaN            NaN
CTGANSynthesizer           census   130.2            9919331     9929188110      12.10          123.31       NaN            1.00000           NaN            NaN
IdentitySynthesizer        alarm    34.5             0.00001     10              0.010          2012.2       1000           1.00000           1.0            0.0
IdentitySynthesizer        census   130.2            2           2012.2          0.031          1003         0.321          1.00000           1.0            0.0
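Because the results are an ordinary pandas DataFrame, standard pandas operations apply. The sketch below does not run a benchmark; it rebuilds a few columns of the example table by hand and pivots Quality_Score into a synthesizer-by-dataset grid, so missing (NaN) scores stand out at a glance.

```python
import numpy as np
import pandas as pd

# A few columns copied by hand from the example table above (not produced by a benchmark run)
results = pd.DataFrame({
    "Synthesizer": ["GaussianCopulaSynthesizer", "GaussianCopulaSynthesizer",
                    "CTGANSynthesizer", "CTGANSynthesizer",
                    "IdentitySynthesizer", "IdentitySynthesizer"],
    "Dataset": ["alarm", "census", "alarm", "census", "alarm", "census"],
    "Quality_Score": [0.9991991, 0.689101, np.nan, np.nan, 1.0, 1.0],
})

# One row per synthesizer, one column per dataset; failed runs remain NaN
quality = results.pivot(index="Synthesizer", columns="Dataset", values="Quality_Score")
print(quality)
```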

Returned Results

The results summarize the benchmarking setup, the performance during execution, and the overall evaluation.

These results summarize the setup of your benchmarking run.

  • Synthesizer: The name of the synthesizer used to model and create the synthetic data

  • Dataset: The name of the dataset that the synthesizer learned to model

  • Dataset_Size_MB: The overall size of the dataset when loaded into Python, in MB
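Since Dataset_Size_MB describes the dataset rather than the synthesizer, it repeats on every row for that dataset. As a small illustration (using rows copied by hand from the example table, not a live benchmark), the setup columns can be collapsed to one row per dataset:

```python
import pandas as pd

# Setup columns copied by hand from the example table above
results = pd.DataFrame({
    "Synthesizer": ["GaussianCopulaSynthesizer", "CTGANSynthesizer", "IdentitySynthesizer",
                    "GaussianCopulaSynthesizer", "CTGANSynthesizer", "IdentitySynthesizer"],
    "Dataset": ["alarm", "alarm", "alarm", "census", "census", "census"],
    "Dataset_Size_MB": [34.5, 34.5, 34.5, 130.2, 130.2, 130.2],
})

# Keep one size per dataset and count how many distinct synthesizers ran on it,
# ordered from the smallest dataset to the largest
setup = (
    results.groupby("Dataset")
    .agg(Dataset_Size_MB=("Dataset_Size_MB", "first"),
         Num_Synthesizers=("Synthesizer", "nunique"))
    .sort_values("Dataset_Size_MB")
)
print(setup)
```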

Errors

If the synthesizer crashed at any point in the process, you will see NaN values from that point onward. For example, if your synthesizer ran out of memory during training, you'll see NaN values for the model size, sample time, evaluation time, and the remaining metrics.
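Because a crash shows up as NaN, failed runs can be located by checking the metric columns for missing values. This sketch uses two rows copied by hand from the example table; treating every column other than Synthesizer and Dataset as a metric is an assumption made for the illustration.

```python
import numpy as np
import pandas as pd

# Two rows copied by hand from the example table above
results = pd.DataFrame({
    "Synthesizer": ["GaussianCopulaSynthesizer", "CTGANSynthesizer"],
    "Dataset": ["census", "census"],
    "Model_Time": [23546.12, 9919331],
    "Model_Size_MB": [1.232, 12.10],
    "Sample_Time": [2012.2, 123.31],
    "Evaluate_Time": [101012.1, np.nan],
    "Quality_Score": [0.689101, np.nan],
})

# Assumption: every column except the two identifier columns is a result metric
metric_columns = [c for c in results.columns if c not in ("Synthesizer", "Dataset")]

# Rows with at least one NaN metric are runs that failed somewhere along the way
failed = results[results[metric_columns].isna().any(axis=1)]
for _, row in failed.iterrows():
    missing = [c for c in metric_columns if pd.isna(row[c])]
    print(f"{row['Synthesizer']} on {row['Dataset']}: NaN in {missing}")
```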

If you specified a detailed_results_folder when running the benchmark, that folder should contain more information about the exact error message.

FAQs

How is the quality score computed?

We compute the quality score by measuring:

  • Whether the individual column shapes in the synthetic data match the real data, and

  • Whether the correlations between pairs of columns are the same between the real and synthetic data

The score ranges from 0 to 1. A score of 1 indicates a perfect match (high quality), while a score of 0 indicates that the synthetic data is as different from the real data as possible. For more information, see the SDMetrics Quality Report.
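The FAQ describes the score only at a high level. The following is a simplified sketch of the idea, not the SDMetrics implementation, and it rests on assumptions of its own: all columns are numeric, a column's shape is compared using 1 minus the two-sample Kolmogorov-Smirnov statistic, a pair of columns is compared using 1 - |r_real - r_synthetic| / 2 on Pearson correlations, and the overall score is the plain average of the two parts. The helper names are hypothetical.

```python
import itertools

import numpy as np
import pandas as pd


def ks_complement(real: pd.Series, synthetic: pd.Series) -> float:
    """1 minus the two-sample Kolmogorov-Smirnov statistic (1.0 = identical shapes)."""
    r = np.sort(real.dropna().to_numpy(dtype=float))
    s = np.sort(synthetic.dropna().to_numpy(dtype=float))
    grid = np.concatenate([r, s])
    # Empirical CDFs of both samples, evaluated at every observed value
    cdf_r = np.searchsorted(r, grid, side="right") / len(r)
    cdf_s = np.searchsorted(s, grid, side="right") / len(s)
    return 1.0 - float(np.max(np.abs(cdf_r - cdf_s)))


def correlation_similarity(real: pd.DataFrame, synthetic: pd.DataFrame, a: str, b: str) -> float:
    """1 - |r_real - r_synthetic| / 2 on Pearson correlations (1.0 = same trend)."""
    r_real = real[a].corr(real[b])
    r_syn = synthetic[a].corr(synthetic[b])
    return 1.0 - abs(r_real - r_syn) / 2.0


def quality_score(real: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Hypothetical score: average of column-shape and column-pair similarity."""
    columns = list(real.columns)
    shapes = np.mean([ks_complement(real[c], synthetic[c]) for c in columns])
    trends = np.mean([
        correlation_similarity(real, synthetic, a, b)
        for a, b in itertools.combinations(columns, 2)
    ])
    return float((shapes + trends) / 2.0)


rng = np.random.default_rng(0)
real = pd.DataFrame({"x": rng.normal(size=500)})
real["y"] = 2 * real["x"] + rng.normal(scale=0.1, size=500)

# Same column shapes, but shuffling y destroys the x-y correlation
shuffled = real.copy()
shuffled["y"] = rng.permutation(shuffled["y"].to_numpy())

print(quality_score(real, real))      # 1.0: identical data
print(quality_score(real, shuffled))  # lower: shapes match, correlation does not
```

Shuffling a single column leaves every column shape intact, so any drop in the score comes entirely from the column-pair part, which mirrors the two bullets above.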


© Copyright 2023, DataCebo, Inc.