Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


Visualization


Last updated 4 months ago

Use these functions to visualize your real and synthetic data in 1- or 2-dimensional space. The plots can help you see which patterns the synthesizer has learned and identify differences between the real and synthetic data.

get_column_plot

Use this function to visualize a real column against the same synthetic column. You can plot any column of type: boolean, categorical, datetime or numerical.

from sdv.evaluation.single_table import get_column_plot

fig = get_column_plot(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    column_name='amenities_fee'
)
    
fig.show()

Parameters

  • (required) column_name: The name of the column you want to plot

  • plot_type: The type of plot to create

    • (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.

    • 'bar': Plot the data as distinct bar graphs

    • 'distplot': Plot the data as smooth, continuous curves

Use fig.show() to see the plot in an IPython notebook. The plot is interactive, allowing you to zoom, scroll and take screenshots.
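To make concrete what a bar-style comparison of one column shows, the sketch below computes the share of rows in each category for a real and a synthetic column. It uses only the Python standard library and is not SDV's implementation; the helper category_shares and the toy room_type values are hypothetical and exist only for illustration.

```python
from collections import Counter


def category_shares(values):
    """Return the fraction of rows that fall into each category."""
    total = len(values)
    if total == 0:
        return {}
    counts = Counter(values)
    return {category: count / total for category, count in counts.items()}


# Hypothetical toy columns standing in for one real and one synthetic column
real_room_type = ['BASIC', 'BASIC', 'DELUXE', 'SUITE']
synthetic_room_type = ['BASIC', 'DELUXE', 'DELUXE', 'SUITE']

real_shares = category_shares(real_room_type)
synthetic_shares = category_shares(synthetic_room_type)

# Print the two shares side by side, one line per category
for category in sorted(set(real_shares) | set(synthetic_shares)):
    print(category, real_shares.get(category, 0.0), synthetic_shares.get(category, 0.0))
```

Categories whose shares differ noticeably between the two columns are the ones that would stand out when the real and synthetic bars are drawn together.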

get_column_pair_plot

Use this function to visualize the trends between a pair of columns for real and synthetic data. You can plot any 2 columns of type: boolean, categorical, datetime or numerical. The columns do not have to be the same type.

from sdv.evaluation.single_table import get_column_pair_plot

fig = get_column_pair_plot(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    column_names=['room_rate', 'room_type'],
)

fig.show()

Parameters

  • (required) column_names: A list with the names of the 2 columns you want to plot

  • plot_type: The type of plot to create

    • (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.

    • 'box': Create a box plot showing the quartiles, broken down by different attributes

    • 'heatmap': Create a side-by-side heatmap showing the frequency of each pair of values

    • 'scatter': Create a scatter plot that plots each point on a 2D axis

  • sample_size: The number of data points to plot

    • (default) None: Plot all the data points

    • <integer>: Subsample rows from both the real and synthetic data before plotting. Use this if you have a lot of data points.
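The effect of sample_size can be pictured with a small standard-library sketch. It assumes rows are drawn at random without replacement and that a table with fewer rows than sample_size is kept whole; both are assumptions made for illustration, not a description of SDV's internals. The helper subsample_rows, its seed argument and the toy rows are hypothetical.

```python
import random


def subsample_rows(rows, sample_size, seed=None):
    """Return at most sample_size rows, drawn without replacement."""
    if sample_size is None or sample_size >= len(rows):
        # Nothing to trim: keep every row
        return list(rows)
    return random.Random(seed).sample(rows, sample_size)


# Hypothetical toy tables: (room_rate, room_type) rows
real_rows = [(100 + i, 'BASIC') for i in range(1000)]
synthetic_rows = [(105 + i, 'DELUXE') for i in range(800)]

# Both tables are reduced to the same number of points before plotting
real_subset = subsample_rows(real_rows, sample_size=250, seed=0)
synthetic_subset = subsample_rows(synthetic_rows, sample_size=250, seed=0)

print(len(real_subset), len(synthetic_subset))
```

Plotting fewer points keeps an interactive figure responsive, at the cost of possibly hiding rare values.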

Use fig.show() to see the plot in an IPython notebook. The plot is interactive, allowing you to zoom, scroll and take screenshots.
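As a rough illustration of the quantity a heatmap of two columns summarizes, the sketch below counts how often each pair of values occurs and converts the counts to fractions. It is a standard-library approximation, not SDV's code; the helper pair_frequencies and the toy columns (including has_rewards) are hypothetical.

```python
from collections import Counter


def pair_frequencies(column_a, column_b):
    """Return the fraction of rows holding each (value_a, value_b) pair."""
    pairs = list(zip(column_a, column_b))
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {pair: count / len(pairs) for pair, count in counts.items()}


# Hypothetical toy columns from a real table
room_type = ['BASIC', 'BASIC', 'DELUXE', 'SUITE']
has_rewards = [False, True, True, True]

frequencies = pair_frequencies(room_type, has_rewards)

# Each entry corresponds to one cell of the heatmap grid
for pair, share in sorted(frequencies.items()):
    print(pair, share)
```

Running the same computation on the synthetic table and comparing the two results cell by cell mirrors what the side-by-side heatmaps let you do visually.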

Shared parameters

Both get_column_plot and get_column_pair_plot also take the following parameters:

  • (required) real_data: A pandas DataFrame object containing the table of your real data. To skip plotting the real data, input None.

  • (required) synthetic_data: A pandas DataFrame object containing the synthetic data. To skip plotting the synthetic data, input None.

  • (required) metadata: A Metadata object that describes the columns

Output

  • get_column_plot: A plotly Figure object that plots the distribution. This will change based on the sdtype.

  • get_column_pair_plot: A plotly Figure object that plots the 2D distribution. This will change based on the sdtype.