Visualization

Use these functions to visualize your real data against your synthetic data in 1- or 2-dimensional space. This can help you see which patterns the synthetic data has captured and identify differences between the real and synthetic data.

get_column_plot

Use this function to visualize a real column against the same synthetic column. You can plot any column of type: boolean, categorical, datetime or numerical.

from sdv.evaluation.single_table import get_column_plot

fig = get_column_plot(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    column_name='amenities_fee'
)
    
fig.show()

Parameters

  • (required) real_data: A pandas DataFrame object containing the table of your real data

  • (required) synthetic_data: A pandas DataFrame object containing the synthetic data

  • (required) metadata: A SingleTableMetadata object that describes the columns

  • (required) column_name: The name of the column you want to plot

  • plot_type: The type of plot to create

    • (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.

    • 'bar': Plot the data as distinct bar graphs

    • 'distplot': Plot the data as a smooth, continuous curve
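
When plot_type is left as None, the plot is chosen from the column's sdtype. The sketch below illustrates that decision under an assumption: numerical and datetime columns default to a smooth distribution plot, and boolean and categorical columns default to bars. Both the mapping and the helper name default_plot_type are hypothetical and are not taken from SDV's source, so check the behavior of your installed version.

```python
# Assumed default mapping from sdtype to plot type (an illustration only,
# not copied from SDV).
ASSUMED_DEFAULTS = {
    'numerical': 'distplot',
    'datetime': 'distplot',
    'categorical': 'bar',
    'boolean': 'bar',
}


def default_plot_type(sdtype, plot_type=None):
    """Return the plot type to use for a column (hypothetical helper)."""
    if plot_type is not None:
        # An explicit choice always wins over the default.
        return plot_type
    if sdtype not in ASSUMED_DEFAULTS:
        raise ValueError(f"Columns of sdtype '{sdtype}' cannot be plotted.")
    return ASSUMED_DEFAULTS[sdtype]


print(default_plot_type('numerical'))         # distplot
print(default_plot_type('boolean'))           # bar
print(default_plot_type('numerical', 'bar'))  # bar
```

Passing plot_type explicitly is the way to override the default, for example to view a low-cardinality numerical column as bars.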

Output A plotly Figure object that plots the distribution. The kind of plot depends on the column's sdtype.

Use fig.show() to see the plot in an IPython notebook. The plot is interactive, allowing you to zoom, scroll and take screenshots.

get_column_pair_plot

Use this utility to visualize the trends between a pair of columns for real and synthetic data. You can plot any 2 columns of type: boolean, categorical, datetime or numerical. The columns do not have to be the same type.
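
A call to get_column_pair_plot might look like the sketch below. The wrapper show_column_pair is a hypothetical convenience function, not part of SDV; it only checks that exactly 2 column names were given and then forwards the documented parameters. The column names in the usage comment are illustrative assumptions about your table.

```python
def show_column_pair(real_data, synthetic_data, metadata, column_names,
                     plot_type=None, sample_size=None):
    """Plot 2 real columns against the same 2 synthetic columns.

    Hypothetical wrapper around SDV's get_column_pair_plot.
    """
    column_names = list(column_names)
    if len(column_names) != 2:
        raise ValueError('column_names must contain exactly 2 column names.')

    # Imported here so that the argument check above runs even where SDV
    # is not installed.
    from sdv.evaluation.single_table import get_column_pair_plot

    return get_column_pair_plot(
        real_data=real_data,
        synthetic_data=synthetic_data,
        metadata=metadata,
        column_names=column_names,
        plot_type=plot_type,
        sample_size=sample_size,
    )


# Usage, assuming real_data, synthetic_data and metadata already exist and
# the table has columns named 'room_rate' and 'room_type' (illustrative):
# fig = show_column_pair(real_data, synthetic_data, metadata,
#                        ['room_rate', 'room_type'], sample_size=1000)
# fig.show()
```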

Parameters

  • (required) real_data: A pandas DataFrame object containing the table of your real data

  • (required) synthetic_data: A pandas DataFrame object containing the synthetic data

  • (required) metadata: A SingleTableMetadata object that describes the columns

  • (required) column_names: A list with the names of the 2 columns you want to plot

  • plot_type: The type of plot to create

    • (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.

    • 'box': Create a box plot showing the quartiles, broken down by different attributes

    • 'heatmap': Create a side-by-side heatmap showing the frequency of each pair of values

    • 'scatter': Create a scatter plot that plots each point on a 2D axis

  • sample_size: The number of data points to plot

    • (default) None: Plot all the data points

    • <integer>: Subsample rows from both the real and synthetic data before plotting. Use this if you have a lot of data points.
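
To make the effect of sample_size concrete, here is a conceptual sketch of subsampling that uses plain Python lists in place of DataFrames. The helper name subsample, the seed argument, the 'room_rate' field, and the choice to return every row when the requested size exceeds the table are all assumptions for illustration, not SDV's implementation. With pandas, the analogous operation is DataFrame.sample(n=...).

```python
import random


def subsample(rows, sample_size=None, seed=None):
    """Return at most sample_size rows, drawn at random without replacement."""
    rows = list(rows)
    if sample_size is None or sample_size >= len(rows):
        # None means "plot everything"; a size larger than the table is
        # treated the same way in this sketch.
        return rows
    return random.Random(seed).sample(rows, sample_size)


# Stand-in tables with a single illustrative field.
real_rows = [{'room_rate': 100 + i} for i in range(10_000)]
synthetic_rows = [{'room_rate': 95 + i} for i in range(10_000)]

# Both tables are reduced to the same number of points before plotting.
real_subset = subsample(real_rows, sample_size=500, seed=0)
synthetic_subset = subsample(synthetic_rows, sample_size=500, seed=0)
print(len(real_subset), len(synthetic_subset))  # 500 500
```

Plotting fewer points keeps an interactive scatter plot responsive on large tables, at the cost of hiding rare values that the subsample happens to miss.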

Output A plotly Figure object that plots the 2D distribution. The kind of plot depends on the sdtypes of the 2 columns.

Use fig.show() to see the plot in an IPython notebook. The plot is interactive, allowing you to zoom, scroll and take screenshots.
