Diagnostic
Diagnostic metrics capture basic information about synthetic data, such as its format and validity. They are the most basic measurements you can make to check that nothing is going wrong in your synthetic data creation process.
Diagnostic metrics should almost always achieve perfect scores. The only exception is when you have explicitly chosen to deviate from the real data in some way (for example, you purposely want the synthetic data to go out of bounds).
Diagnostic Report
Measure all basic diagnostic metrics at once. The Diagnostic Report captures basic diagnostic measurements across your entire dataset and reports the areas that may be problematic.
from sdmetrics.reports.single_table import DiagnosticReport

# real_data and synthetic_data are pandas DataFrames; metadata describes the table
report = DiagnosticReport()
report.generate(real_data, synthetic_data, metadata)
Generating report ...
(1/2) Evaluating Data Validity: |██████████| 9/9 [00:00<00:00, 458.92it/s]|
Data Validity Score: 100.0%
(2/2) Evaluating Data Structure: |██████████| 1/1 [00:00<00:00, 104.60it/s]|
Data Structure Score: 100.0%
Overall Score (Average): 100.0%
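The last line of the output labels the overall score as an average. As a minimal sketch of that arithmetic, assuming it is the plain mean of the per-property scores, the snippet below uses hypothetical score values (they are not taken from a real run) to show how a non-perfect property would pull the overall score down:

```python
# Hypothetical per-property scores, for illustration only.
property_scores = {
    "Data Validity": 0.75,
    "Data Structure": 1.0,
}

# Assumption: the overall score is the unweighted mean of the property scores.
overall_score = sum(property_scores.values()) / len(property_scores)

# Format the result the same way the report output does.
overall_line = f"Overall Score (Average): {overall_score:.1%}"
print(overall_line)
```

With a perfect score on both properties, the same calculation gives the 100.0% shown in the report output.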
Browse
Alternatively, you can apply diagnostic metrics to individual columns and tables in your data:
BoundaryAdherence, CategoryAdherence: measure the validity of statistical values
KeyUniqueness: measure the validity of primary keys
ReferentialIntegrity, CardinalityBoundaryAdherence: measure the validity of a connection between a foreign key and a primary key
TableStructure: measure whether the overall structure of the data is the same
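To make the column-level metrics listed above more concrete, here is a simplified sketch of the ideas behind BoundaryAdherence, CategoryAdherence and KeyUniqueness in plain Python. The function names and sample values are hypothetical, and this is not the library's implementation: it ignores missing values and other edge cases, and only assumes that each score is the fraction of synthetic values that pass the corresponding check.

```python
def boundary_adherence(real_values, synthetic_values):
    """Fraction of synthetic values that fall inside the real [min, max] range."""
    low, high = min(real_values), max(real_values)
    in_bounds = sum(low <= value <= high for value in synthetic_values)
    return in_bounds / len(synthetic_values)


def category_adherence(real_values, synthetic_values):
    """Fraction of synthetic values whose category also appears in the real data."""
    allowed = set(real_values)
    valid = sum(value in allowed for value in synthetic_values)
    return valid / len(synthetic_values)


def key_uniqueness(synthetic_keys):
    """Fraction of rows whose key does not repeat an earlier row's key."""
    return len(set(synthetic_keys)) / len(synthetic_keys)


# Hypothetical sample columns, for illustration only.
real_ages = [18, 25, 40, 65]
synthetic_ages = [20, 30, 70, 17]      # 70 and 17 are outside [18, 65]

real_plans = ["A", "B", "C"]
synthetic_plans = ["A", "B", "D", "A"]  # "D" never appears in the real data

synthetic_ids = [1, 2, 2, 3]            # the second 2 is a duplicate key

age_score = boundary_adherence(real_ages, synthetic_ages)
plan_score = category_adherence(real_plans, synthetic_plans)
id_score = key_uniqueness(synthetic_ids)
```

A score of 1.0 from each function corresponds to the perfect diagnostic result described at the top of this page; anything lower points to synthetic values that break the corresponding rule.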