Visualization
Use these functions to visualize your real and synthetic data in 1- or 2-dimensional space. This can help you see what kinds of patterns the synthesizer has learned and identify differences between the real and synthetic data.
get_column_plot
Use this function to visualize a real column against the same synthetic column. You can plot any column of type: boolean, categorical, datetime or numerical.
from sdv.evaluation.multi_table import get_column_plot

fig = get_column_plot(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    table_name='guests',
    column_name='amenities_fee'
)
fig.show()

Parameters
(required) real_data: A pandas DataFrame object containing the table of your real data
(required) synthetic_data: A pandas DataFrame object containing the synthetic data
(required) metadata: A MultiTableMetadata object that describes the columns
(required) table_name: The name of the table
(required) column_name: The name of the column you want to plot
plot_type: The type of plot to create
    (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.
    'bar': Plot the data as distinct bars
    'distplot': Plot the data as a smooth, continuous curve
Output: A plotly Figure object that plots the distribution. The plot changes based on the sdtype.
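To make the real-versus-synthetic comparison concrete, here is a minimal, library-free sketch of the per-category shares that a bar-style plot lets you compare by eye. It is not SDV's implementation. The helper `category_frequencies` and the toy `room_type` values are hypothetical and exist only for illustration.

```python
from collections import Counter


def category_frequencies(values):
    """Return the share of rows taken by each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical toy columns, not taken from any real dataset.
real_room_type = ['BASIC', 'BASIC', 'DELUXE', 'SUITE']
synthetic_room_type = ['BASIC', 'DELUXE', 'DELUXE', 'DELUXE']

real_freq = category_frequencies(real_room_type)
synthetic_freq = category_frequencies(synthetic_room_type)

# Print the real and synthetic share for every category seen in either column.
for category in sorted(set(real_freq) | set(synthetic_freq)):
    print(category, real_freq.get(category, 0.0), synthetic_freq.get(category, 0.0))
```

A category that appears in the real column but never in the synthetic one (here 'SUITE') is the kind of difference such a plot makes easy to spot.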
get_column_pair_plot
Use this utility to visualize the trends between a pair of columns for real and synthetic data. You can plot any 2 columns of type: boolean, categorical, datetime or numerical. The columns do not have to be the same type.
from sdv.evaluation.multi_table import get_column_pair_plot

fig = get_column_pair_plot(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
    table_name='guests',
    column_names=['room_rate', 'room_type'],
)
fig.show()

Parameters
(required) real_data: A pandas DataFrame object containing the table of your real data
(required) synthetic_data: A pandas DataFrame object containing the synthetic data
(required) metadata: A MultiTableMetadata object that describes the columns
(required) table_name: The name of the table
(required) column_names: A list with the names of the 2 columns you want to plot
plot_type: The type of plot to create
    (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.
    'box': Create a box plot showing the quartiles, broken down by different attributes
    'heatmap': Create a side-by-side heatmap showing the frequency of each pair of values
    'scatter': Create a scatter plot that plots each point on a 2D axis
sample_size: The number of data points to plot
    (default) None: Plot all the data points
    <integer>: Subsample rows from both the real and synthetic data before plotting. Use this if you have a lot of data points.
Output: A plotly Figure object that plots the 2D distribution. The plot changes based on the sdtype.
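The sample_size option can be pictured with a small, library-free sketch of subsampling rows before plotting. This only illustrates the idea and is not how SDV implements it. The helper `subsample_rows`, its `seed` argument and the toy rows are hypothetical.

```python
import random


def subsample_rows(rows, sample_size=None, seed=0):
    """Return all rows when sample_size is None, else a random subset."""
    if sample_size is None or sample_size >= len(rows):
        return list(rows)
    # A seeded generator keeps the illustration reproducible.
    return random.Random(seed).sample(rows, sample_size)


# Hypothetical (room_rate, room_type) pairs standing in for table rows.
real_rows = [(100 + i, 'BASIC' if i % 2 else 'DELUXE') for i in range(10)]
synthetic_rows = [(95 + i, 'BASIC' if i % 3 else 'DELUXE') for i in range(10)]

# Subsample both datasets to the same size before plotting them.
real_subset = subsample_rows(real_rows, sample_size=3)
synthetic_subset = subsample_rows(synthetic_rows, sample_size=3)
print(len(real_subset), len(synthetic_subset))
```

Plotting fewer points keeps a scatter plot readable and quick to render, at the cost of showing only part of the data.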
get_cardinality_plot
Use this utility to visualize the cardinality of a multi-table relationship. The cardinality refers to the number of child rows that each parent row has. This could be 0 or more.
from sdv.evaluation.multi_table import get_cardinality_plot

fig = get_cardinality_plot(
    real_data=real_data,
    synthetic_data=synthetic_data,
    child_table_name='guests',
    parent_table_name='hotels',
    child_foreign_key='user_id',
    metadata=metadata
)
fig.show()

Parameters
(required) real_data: A dictionary mapping each table name to a pandas DataFrame object with the real data
(required) synthetic_data: A dictionary mapping each table name to a pandas DataFrame object with the synthetic data
(required) child_table_name: A string describing the name of the child table in the relationship
(required) parent_table_name: A string describing the name of the parent table in the relationship
(required) child_foreign_key: A string describing the name of the foreign key column of the child table that references the parent table
(required) metadata: A MultiTableMetadata object that describes the data
Output: A plotly Figure object that plots the cardinality of the real vs. the synthetic data for the provided relationship.
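Cardinality as defined above, the number of child rows per parent row (possibly 0), can be computed by hand in a few lines. This is a library-free sketch, not SDV's implementation. The helper `cardinality_per_parent` and the toy hotel and guest keys are hypothetical.

```python
from collections import Counter


def cardinality_per_parent(parent_keys, child_foreign_keys):
    """Count the child rows that reference each parent key, including zeros."""
    child_counts = Counter(child_foreign_keys)
    return {key: child_counts.get(key, 0) for key in parent_keys}


# Hypothetical primary keys of the parent table.
hotel_ids = ['H1', 'H2', 'H3']
# Hypothetical foreign key values, one per row of the child table.
guest_hotel_ids = ['H1', 'H1', 'H3']

cardinality = cardinality_per_parent(hotel_ids, guest_hotel_ids)
print(cardinality)

# How many parents have each cardinality value: the distribution you would
# compare between the real and the synthetic data.
distribution = Counter(cardinality.values())
print(sorted(distribution.items()))
```

Running the same computation on the real and the synthetic tables and comparing the two distributions mirrors, in spirit, what the cardinality plot shows.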