Visualization Utilities

Use the utilities below to visualize the comparison between real and synthetic data. You can access these from the sdmetrics.visualization module.

Compare a synthetic column & real column (1D)

get_column_plot

Use this utility to visualize a real column against its synthetic counterpart. You can plot any column of type: boolean, categorical, datetime or numerical.

  • (required) real_data: A pandas.DataFrame containing the table of your real data. To skip plotting the real data, input None.

  • (required) synthetic_data: A pandas.DataFrame containing the synthetic data. To skip plotting the synthetic data, input None.

  • (required) column_name: The name of the column you want to plot.

  • plot_type: The type of plot to create

    • (default) None: Determine the type of plot to create based on the data.

    • 'distplot': Plot the data as a smooth, continuous distribution. Use this for continuous columns.

    • 'bar': Plot the data as discrete bars. Use this for discrete columns.

Returns: A plotly.Figure object

from sdmetrics.visualization import get_column_plot

fig = get_column_plot(
    real_data=real_table,
    synthetic_data=synthetic_table,
    column_name='high_perc',
    plot_type='distplot'
)

fig.show()
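When plot_type is left as None, the plot type is chosen from the data. The sketch below shows one plausible way to make that choice from a column's dtype. The helper guess_column_plot_type is hypothetical and not part of sdmetrics, the rule it applies (numerical and datetime columns are continuous, everything else is discrete) is an assumption, and the gender and placed columns are invented for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes


def guess_column_plot_type(column: pd.Series) -> str:
    """Hypothetical helper (not an sdmetrics function): pick a plot type from the dtype."""
    values = column.dropna()
    # pandas treats booleans as numeric, so rule them out first.
    if ptypes.is_bool_dtype(values):
        return 'bar'
    # Assumption: numerical and datetime columns are continuous.
    if ptypes.is_numeric_dtype(values) or ptypes.is_datetime64_any_dtype(values):
        return 'distplot'
    # Everything else (for example, categorical strings) is treated as discrete.
    return 'bar'


# Small illustrative table; 'gender' and 'placed' are made-up columns.
real_table = pd.DataFrame({
    'high_perc': [67.0, 79.3, 65.0, 56.0],
    'gender': ['M', 'F', 'M', 'F'],
    'placed': [True, False, True, True],
})

plot_types = {name: guess_column_plot_type(real_table[name]) for name in real_table.columns}
print(plot_types)
```

If the automatic choice does not suit your column, pass plot_type explicitly as described above.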

Compare a pair of synthetic columns & real columns (2D)

get_column_pair_plot

Use this utility to visualize the trends between a pair of columns for real and synthetic data. You can plot any 2 columns of type: boolean, categorical, datetime or numerical. The columns do not have to be the same type.

  • (required) real_data: A pandas.DataFrame containing the table of your real data. To skip plotting the real data, input None.

  • (required) synthetic_data: A pandas.DataFrame containing the synthetic data. To skip plotting the synthetic data, input None.

  • (required) column_names: A list containing the names of the 2 columns you want to plot.

  • plot_type: The type of plot to create

    • (default) None: Determine the type of plot to create based on the data.

    • 'scatter': Plot each data point in 2D space as a scatter plot. Use this to compare a pair of continuous columns.

    • 'box': Plot the data as one or more box plots. Use this to compare a continuous column with a discrete column.

    • 'heatmap': Plot a side-by-side heatmap of the data's categories. Use this to compare a pair of discrete columns.

Returns: A plotly.Figure object

from sdmetrics.visualization import get_column_pair_plot

fig = get_column_pair_plot(
    real_data=real_table,
    synthetic_data=synthetic_table,
    column_names=['mba_perc', 'degree_perc'],
    plot_type='scatter'
)

fig.show()

Various types of plots are possible based on the types of data you provide
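The pairing of plot types with data types described above can be sketched as a small decision rule: two continuous columns map to 'scatter', one continuous and one discrete column to 'box', and two discrete columns to 'heatmap'. The helpers is_continuous and guess_pair_plot_type are hypothetical and not part of sdmetrics, the dtype-based rule is an assumption, and the work_experience and gender columns are invented for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes


def is_continuous(column: pd.Series) -> bool:
    """Hypothetical rule: numerical and datetime columns are continuous, booleans are not."""
    if ptypes.is_bool_dtype(column):
        return False
    return ptypes.is_numeric_dtype(column) or ptypes.is_datetime64_any_dtype(column)


def guess_pair_plot_type(data: pd.DataFrame, column_names: list) -> str:
    """Hypothetical helper (not an sdmetrics function): map a column pair to a plot type."""
    first, second = (is_continuous(data[name]) for name in column_names)
    if first and second:
        return 'scatter'   # continuous vs. continuous
    if first or second:
        return 'box'       # continuous vs. discrete
    return 'heatmap'       # discrete vs. discrete


# Small illustrative table; 'work_experience' and 'gender' are made-up columns.
real_table = pd.DataFrame({
    'mba_perc': [58.8, 66.3, 57.8, 59.4],
    'degree_perc': [58.0, 77.5, 64.0, 52.0],
    'work_experience': [False, True, False, False],
    'gender': ['M', 'F', 'M', 'F'],
})

print(guess_pair_plot_type(real_table, ['mba_perc', 'degree_perc']))
print(guess_pair_plot_type(real_table, ['mba_perc', 'gender']))
print(guess_pair_plot_type(real_table, ['work_experience', 'gender']))
```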

Visualize the cardinality of a relationship

get_cardinality_plot

Use this utility to visualize the cardinality of a parent-child relationship. The cardinality is the number of children that each parent row has. Your cardinality may be fixed (e.g. every parent has exactly 2 children) or variable (e.g. every parent has 1-3 children).

  • (required) real_data: A dictionary mapping the name of each table to a pandas.DataFrame containing the real data for that table. To skip plotting the real data, input None.

  • (required) synthetic_data: A dictionary mapping the name of each table to a pandas.DataFrame containing the synthetic data for that table. To skip plotting the synthetic data, input None.

  • (required) parent_table_name: The string name of the parent table in the relationship

  • (required) child_table_name: The string name of the child table in the relationship

  • (required) parent_primary_key: The string name of the parent table's primary key

  • (required) child_foreign_key: The string name of the column in the child table that refers to the parent's primary key

  • plot_type: The type of plot to create

    • (default) None: Determine the type of plot to create based on the data.

    • 'distplot': Plot the data as a smooth, continuous distribution

    • 'bar': Plot the data as discrete bars

Returns: A plotly.Figure object

from sdmetrics.visualization import get_cardinality_plot

fig = get_cardinality_plot(
    real_data=real_tables,
    synthetic_data=synthetic_tables,
    parent_table_name='users',
    child_table_name='sessions',
    parent_primary_key='user_id',
    child_foreign_key='user_id',
    plot_type='bar'
)

fig.show()
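To make the quantity being plotted concrete, the sketch below computes the cardinality (the number of children per parent row) directly with pandas. The helper compute_cardinality is hypothetical and not part of sdmetrics, and the tiny users and sessions tables, including the session_id column, are invented for illustration.

```python
import pandas as pd


def compute_cardinality(tables, parent_table_name, child_table_name,
                        parent_primary_key, child_foreign_key):
    """Hypothetical helper (not an sdmetrics function): count children per parent row."""
    parent_keys = tables[parent_table_name][parent_primary_key]
    # Count how many child rows refer to each parent key.
    child_counts = tables[child_table_name][child_foreign_key].value_counts()
    # Parents without children are absent from value_counts, so fill them with 0.
    return child_counts.reindex(pd.Index(parent_keys), fill_value=0)


# Made-up multi-table data: user 1 has two sessions, user 2 has none, user 3 has one.
real_tables = {
    'users': pd.DataFrame({'user_id': [1, 2, 3]}),
    'sessions': pd.DataFrame({'session_id': [10, 11, 12], 'user_id': [1, 1, 3]}),
}

cardinality = compute_cardinality(
    real_tables,
    parent_table_name='users',
    child_table_name='sessions',
    parent_primary_key='user_id',
    child_foreign_key='user_id',
)
print(cardinality.to_dict())
```

Here the cardinality is variable: it ranges from 0 to 2 children per parent.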
