Visualization
Use these functions to visualize your actual data in 1- or 2-dimensional space. This can help you see what kinds of patterns the synthetic data has captured, and identify differences between the real and synthetic data.
get_column_plot
Use this function to visualize a real column against the same synthetic column. You can plot any column of type: boolean, categorical, datetime or numerical.
Parameters
(required) real_data: A pandas DataFrame object containing the table of your real data
(required) synthetic_data: A pandas DataFrame object containing the synthetic data
(required) metadata: A SingleTableMetadata object that describes the columns
(required) column_name: The name of the column you want to plot
plot_type: The type of plot to create
  (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.
  'bar': Plot the data as distinct bars
  'distplot': Plot the data as a smooth, continuous curve
Output: A plotly Figure object that plots the distribution. The plot will change based on the column's sdtype.
Use fig.show() to see the plot in an IPython notebook. The plot is interactive, allowing you to zoom, scroll and take screenshots.
get_column_pair_plot
Use this utility to visualize the trends between a pair of columns for real and synthetic data. You can plot any 2 columns of type: boolean, categorical, datetime or numerical. The columns do not have to be the same type.
Parameters
(required) real_data: A pandas DataFrame object containing the table of your real data
(required) synthetic_data: A pandas DataFrame object containing the synthetic data
(required) metadata: A SingleTableMetadata object that describes the columns
(required) column_names: A list with the names of the 2 columns you want to plot
plot_type: The type of plot to create
  (default) None: Determine an appropriate plot type based on your data type, as specified in the metadata.
  'box': Create a box plot showing the quartiles, broken down by different attributes
  'heatmap': Create a side-by-side heatmap showing the frequency of each pair of values
  'scatter': Create a scatter plot that plots each point on a 2D axis
sample_size: The number of data points to plot
  (default) None: Plot all the data points
  <integer>: Subsample rows from both the real and synthetic data before plotting. Use this if you have a lot of data points.
Output: A plotly Figure object that plots the 2D distribution. The plot will change based on the sdtypes of the two columns.
Use fig.show() to see the plot in an IPython notebook. The plot is interactive, allowing you to zoom, scroll and take screenshots.