Results Summary
Benchmark results are available for every synthesizer and dataset pair. The summary is returned in Python as a pandas DataFrame object, and it is also written to a CSV file if you've provided a destination folder.
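For example, here is a minimal sketch of running a small benchmark and working with the returned summary. It assumes the `sdgym.benchmark_single_table` entry point with `synthesizers` and `sdv_datasets` arguments; check your installed SDGym version for the exact signature.

```python
import sdgym

# Assumed argument names; verify against your installed SDGym version.
results = sdgym.benchmark_single_table(
    synthesizers=['GaussianCopulaSynthesizer', 'CTGANSynthesizer', 'UniformSynthesizer'],
    sdv_datasets=['alarm', 'census'],
)

print(type(results))           # a pandas DataFrame
print(list(results.columns))   # Synthesizer, Dataset, Dataset_Size_MB, Train_Time, ...

# You can also persist the summary yourself, independent of any destination folder setting.
results.to_csv('results_summary.csv', index=False)
```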
| Synthesizer | Dataset | Dataset_Size_MB | Train_Time | Peak_Memory_MB | Synthesizer_Size_MB | Sample_Time | Evaluate_Time | Diagnostic_Score | Quality_Score | Privacy_Score | Adjusted_Total_Time | Adjusted_Quality_Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GaussianCopulaSynthesizer | alarm | 34.5 | 123.56 | 300101 | 0.981 | 2012.1 | 1001.2 | 1.00000 | 0.9991991 | 0.998191 | 2136.76 | 0.9991991 |
| GaussianCopulaSynthesizer | census | 130.2 | 2356.12 | 201011 | 1.232 | 2012.2 | 1012.1 | 1.00000 | 0.689101 | 1.0 | 4383.82 | 0.689101 |
| CTGANSynthesizer | alarm | 34.5 | NaN | 99999999 | NaN | NaN | NaN | 1.00000 | NaN | NaN | 3605.3 | 0.5829 |
| CTGANSynthesizer | census | 130.2 | 3140.4 | 9929188110 | 12.10 | NaN | NaN | 1.00000 | NaN | NaN | 3166.6 | 0.49102 |
| UniformSynthesizer | alarm | 34.5 | 1.1 | 10 | 0.010 | 5.2 | 1000 | 1.00000 | 0.5829 | 1.0 | 6.3 | 0.5829 |
| UniformSynthesizer | census | 130.2 | 15.5 | 2012.2 | 0.031 | 10.7 | 0.321 | 1.00000 | 0.49102 | 1.0 | 26.2 | 0.49102 |

Description
The summary provides information about the setup, execution performance, and the overall evaluation. Browse through the tabs below to learn more about what each result means.
These results summarize the setup of your benchmarking run.
* Synthesizer: The name of the synthesizer used to model and create the synthetic data
* Dataset: The name of the dataset that the synthesizer learned to create
* Dataset_Size_MB: The overall size of the dataset when loaded into Python, in MB
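For instance, assuming `results` is the summary DataFrame from the sketch above, the setup columns can be pulled out directly:

```python
# Assuming `results` is the summary DataFrame returned by the benchmark.
setup_columns = ['Synthesizer', 'Dataset', 'Dataset_Size_MB']
print(results[setup_columns])
```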
These results track the execution of the benchmarking script.
* Train_Time: The time it took for the synthesizer to learn from the real data and train a model, in seconds
* Peak_Memory_MB: The maximum memory that the model training took, in MB
* Synthesizer_Size_MB: An estimate of the final size of the trained model, in MB
* Sample_Time: The time it took to generate synthetic data using the trained model, in seconds
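As an illustration, here is a short sketch (again assuming `results` is the summary DataFrame) that compares execution cost across synthesizers. The `Model_And_Sample_Time` column is a derived helper introduced here for illustration, not a column produced by SDGym.

```python
# Assuming `results` is the summary DataFrame.
# Pivot training time by dataset so each synthesizer can be compared side by side.
train_times = results.pivot(index='Synthesizer', columns='Dataset', values='Train_Time')
print(train_times)

# Derived helper column (not produced by SDGym): total modeling + sampling time.
# Rows where the synthesizer failed remain NaN here.
results['Model_And_Sample_Time'] = results['Train_Time'] + results['Sample_Time']
print(results[['Synthesizer', 'Dataset', 'Model_And_Sample_Time', 'Peak_Memory_MB']])
```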
These results summarize the evaluation of the synthetic data against the real data.
* Evaluate_Time: The time it took for any additional evaluation of the synthetic data, in seconds
* Diagnostic_Score: An overall score that summarizes whether the synthetic data passes basic validity checks. The score is in the range [0, 1], where 0 is the worst and 1 is the best. We expect the score to be 1 unless there is an issue with the synthesizer. For more information, see the SDMetrics Diagnostic Report.
* Quality_Score: An overall estimate of whether the synthetic data matches the statistical patterns of the real data. The score is in the range [0, 1], where 0 is the worst and 1 is the best. We expect the scores to vary based on the synthesizer and dataset. For more information, see the SDMetrics Quality Report.
* Privacy_Score: An overall estimate of the privacy of the synthesizer. The score is in the range [0, 1], where 0 is the worst and 1 is the best. We expect the scores to vary based on the synthesizer and dataset. For more information, see the SDMetrics DCRBaselineProtection metric.
* <other results>: Any other metrics that you apply will appear as additional results. Refer to the SDMetrics library for more details about what each metric means.
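For example, assuming `results` is the summary DataFrame, you might flag runs whose validity checks did not come out perfect and rank the rest by quality:

```python
# Assuming `results` is the summary DataFrame.
# Flag runs whose basic validity checks did not come out perfect.
failed_diagnostics = results[results['Diagnostic_Score'] < 1.0]
print(failed_diagnostics[['Synthesizer', 'Dataset', 'Diagnostic_Score']])

# Rank by raw quality; runs that errored out show NaN and sort last by default.
ranked = results.sort_values('Quality_Score', ascending=False)
print(ranked[['Synthesizer', 'Dataset', 'Quality_Score', 'Privacy_Score']])
```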
The final columns represent adjusted times and quality scores. The values are adjusted to simulate graceful degradation in the event of a synthesizer failure. As a result, these columns never contain any missing values, making them great for comparing the different synthesizers.
* Adjusted_Total_Time: The total time it took to model and sample with the synthesizer, in seconds, plus the time it takes to fit a backup synthesizer (see the next section). If the synthesizer errored out during any part of the modeling or sampling process, then the backup synthesizer's values take over from that point onwards.
* Adjusted_Quality_Score: The quality score of the synthesizer (see the Evaluation tab). If the synthesizer errored out before we were able to compute a quality score, then the backup synthesizer's quality score will be reported.
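Because these columns are always populated, they are a convenient basis for a head-to-head comparison. A small sketch, again assuming `results` is the summary DataFrame:

```python
# Assuming `results` is the summary DataFrame.
# The adjusted columns never contain NaN, so every synthesizer can be compared
# even when some of its runs crashed or timed out.
comparison = results.groupby('Synthesizer')[['Adjusted_Total_Time', 'Adjusted_Quality_Score']].mean()
print(comparison.sort_values('Adjusted_Quality_Score', ascending=False))
```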
Graceful Handling of Errors
If some synthesizers crash or time out, the results will show NaN values from that point of the execution process onward. For example, if your synthesizer ran out of memory during the sampling phase, you'll see NaN values for the sample time, evaluation time, and other metrics.
However, in an enterprise setting it is best practice to degrade gracefully by reverting to a backup synthesizer when an error occurs. To simulate this, SDGym always trains a UniformSynthesizer as a backup. The UniformSynthesizer creates random values within the correct ranges, making it a fast, lightweight backup.
The final columns you see (for time and quality score) are adjusted to simulate having the backup synthesizer. These columns will not contain any missing values, ensuring that you can continue to compare synthesizers fairly even if there are errors for certain datasets.
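A brief sketch (assuming `results` is the summary DataFrame) of spotting failed runs and confirming that their adjusted columns are still populated:

```python
# Assuming `results` is the summary DataFrame.
# Runs that crashed or timed out carry NaN from the failure point onward.
failed_runs = results[results[['Train_Time', 'Sample_Time', 'Quality_Score']].isna().any(axis=1)]
print(failed_runs[['Synthesizer', 'Dataset', 'Train_Time', 'Sample_Time', 'Quality_Score']])

# The adjusted columns remain filled for those same rows, so comparisons stay fair.
print(failed_runs[['Synthesizer', 'Dataset', 'Adjusted_Total_Time', 'Adjusted_Quality_Score']])
```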