Evaluation

As a final step in your synthetic data project, you can evaluate the synthetic data against the real data and visualize the two together. The SDV supports three evaluation steps:

1. Perform basic checks to ensure the synthetic data is valid.
2. Measure the statistical similarity between the real and synthetic data.
3. Visualize the real and synthetic data side by side.
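The basic validity checks in the first step can be pictured as simple column-level tests. Below is a minimal, stdlib-only sketch of two such checks, similar in spirit to SDMetrics' BoundaryAdherence and CategoryAdherence metrics. It is not how `run_diagnostic` is implemented. The function names and toy columns are hypothetical, chosen for illustration only.

```python
from typing import Hashable, Sequence


def boundary_adherence(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Fraction of synthetic values inside the real column's [min, max] range."""
    if not synthetic:
        raise ValueError("synthetic column is empty")
    low, high = min(real), max(real)
    inside = sum(1 for value in synthetic if low <= value <= high)
    return inside / len(synthetic)


def category_adherence(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """Fraction of synthetic values whose category also appears in the real column."""
    if not synthetic:
        raise ValueError("synthetic column is empty")
    known = set(real)
    return sum(1 for value in synthetic if value in known) / len(synthetic)


# Hypothetical toy columns (not taken from any real dataset).
real_fees = [0.0, 12.5, 30.0, 41.2]
synthetic_fees = [5.0, 29.9, 55.0, 0.0]  # 55.0 falls outside the real range
real_rooms = ["BASIC", "DELUXE", "SUITE"]
synthetic_rooms = ["BASIC", "SUITE", "PENTHOUSE", "DELUXE"]  # PENTHOUSE is unseen

print(boundary_adherence(real_fees, synthetic_fees))    # 0.75
print(category_adherence(real_rooms, synthetic_rooms))  # 0.75
```

A score of 1.0 means every synthetic value passed the check; anything lower points to values that a real-world consumer of the data might reject.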

from sdv.evaluation.single_table import (
    evaluate_quality,
    get_column_plot,
    run_diagnostic,
)

# Assumes real_data and synthetic_data are pandas DataFrames and metadata
# describes the table; all three come from the earlier steps of the project.

# 1. perform basic validity checks
diagnostic = run_diagnostic(real_data, synthetic_data, metadata)

# 2. measure the statistical similarity
quality_report = evaluate_quality(real_data, synthetic_data, metadata)

# 3. plot the real and synthetic values of one column side by side
fig = get_column_plot(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    column_name='amenities_fee'
)

fig.show()
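To give a feel for what "statistical similarity" means for a single numerical column such as `amenities_fee`, SDMetrics describes a KSComplement metric: 1 minus the two-sample Kolmogorov-Smirnov statistic, i.e. 1 minus the largest gap between the two empirical CDFs. The sketch below computes that quantity with the standard library only. It is an illustration of the idea, not SDMetrics' implementation, which may treat missing values and edge cases differently; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ecdf(sorted_values: Sequence[float], x: float) -> float:
    """Empirical CDF: fraction of values <= x (sorted_values must be sorted)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_complement(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """1 minus the two-sample KS statistic (max vertical gap between the ECDFs)."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    # Both ECDFs are step functions that only change at observed values,
    # so the largest gap is found by checking every observed value.
    ks_stat = max(
        abs(ecdf(real_sorted, x) - ecdf(synth_sorted, x))
        for x in set(real_sorted) | set(synth_sorted)
    )
    return 1.0 - ks_stat


print(ks_complement([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0 (identical shapes)
print(ks_complement([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5 (shifted distribution)
```

Scores close to 1.0 mean the synthetic column's distribution closely matches the real one; a plot from `get_column_plot` is a useful way to see where a lower score comes from.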

Need more evaluation options?

See the SDMetrics library.

This library includes many more metrics (some of them experimental) that you can apply based on your goals. All you need to get started is your real data, synthetic data, and metadata.
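As one example of the kind of metric available there, SDMetrics has a TVComplement metric for categorical columns: 1 minus the total variation distance between the real and synthetic category frequencies. The stdlib-only sketch below shows that calculation. It is an illustration under that definition, not the library's code, and the function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def tv_complement(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """1 minus the total variation distance between two category distributions."""
    real_freq, synth_freq = Counter(real), Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    # Total variation distance: half the sum of absolute frequency differences.
    tvd = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - tvd


print(tv_complement(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # 0.75
```

Categories missing from either side count with frequency 0, so a synthetic column that invents or drops categories is penalized accordingly.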


Copyright (c) 2023, DataCebo, Inc.