
Visualization Utilities



Use the utilities below to visualize the comparison between real and synthetic data. You can access these from the sdmetrics.visualization module.

Tip! All visualizations are interactive. If you're using an IPython notebook, you can zoom, pan, toggle legends and take screenshots.

Compare a synthetic column & real column (1D)

get_column_plot

Use this utility to visualize a real column against its synthetic counterpart. You can plot any column of type: boolean, categorical, datetime or numerical.

  • (required) real_data: A pandas.DataFrame containing the table of your real data. To skip plotting the real data, input None.

  • (required) synthetic_data: A pandas.DataFrame containing the synthetic data. To skip plotting the synthetic data, input None.

  • (required) column_name: The name of the column you want to plot.

  • plot_type: The type of plot to create

    • (default) None: Determine the type of plot to create based on the data.

    • 'distplot': Plot the data as a smooth, continuous distribution. Use this for continuous columns.

    • 'bar': Plot the data as discrete bars. Use this for discrete columns.

Returns: A plotly.Figure object

from sdmetrics.visualization import get_column_plot

fig = get_column_plot(
    real_data=real_table,
    synthetic_data=synthetic_table,
    column_name='high_perc',
    plot_type='distplot'
)

fig.show()
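
The plot_type guidance above (distplot for continuous columns, bar for discrete ones) can be sketched as a small dtype lookup. This is not part of SDMetrics: suggest_plot_type is a hypothetical helper, the column names other than high_perc are made up for illustration, and treating datetime columns as continuous is an assumption.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
)


def suggest_plot_type(column: pd.Series) -> str:
    """Hypothetical helper: map a column's dtype to a plot_type value."""
    # pandas treats booleans as numeric, so rule them out first.
    if is_bool_dtype(column):
        return 'bar'
    # Assumption: numerical and datetime columns are continuous.
    if is_numeric_dtype(column) or is_datetime64_any_dtype(column):
        return 'distplot'
    # Everything else (for example, strings) is treated as discrete.
    return 'bar'


# Illustrative table; the values are invented.
real_table = pd.DataFrame({
    'high_perc': [67.0, 79.3, 65.0, 56.0],
    'gender': ['M', 'F', 'M', 'F'],
    'placed': [True, True, False, True],
    'start_date': pd.to_datetime(
        ['2020-01-11', '2020-02-03', '2020-03-01', '2020-04-19']
    ),
})

plot_types = {
    name: suggest_plot_type(real_table[name]) for name in real_table.columns
}
print(plot_types)
```

The resulting value for a column can then be passed as plot_type to get_column_plot, or you can leave plot_type as None and let the utility decide.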

Compare a pair of synthetic columns & real columns (2D)

get_column_pair_plot

Use this utility to visualize the trends between a pair of columns for real and synthetic data. You can plot any 2 columns of type: boolean, categorical, datetime or numerical. The columns do not have to be the same type.

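  • (required) real_data: A pandas.DataFrame containing the table of your real data. To skip plotting the real data, input None.

  • (required) synthetic_data: A pandas.DataFrame containing the synthetic data. To skip plotting the synthetic data, input None.

  • (required) column_names: A list containing the names of the 2 columns you want to plot.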

  • plot_type: The type of plot to create

    • (default) None: Determine the type of plot to create based on the data.

    • 'scatter': Plot each data point in 2D space as a scatter plot. Use this to compare a pair of continuous columns.

    • 'box': Plot the data as one or more box plots. Use this to compare a continuous column with a discrete column.

    • 'heatmap': Plot a side-by-side heatmap of the data's categories. Use this to compare a pair of discrete columns.

Returns: A plotly.Figure object

from sdmetrics.visualization import get_column_pair_plot

fig = get_column_pair_plot(
    real_data=real_table,
    synthetic_data=synthetic_table,
    column_names=['mba_perc', 'degree_perc'],
    plot_type='scatter'
)

fig.show()

Various types of plots are possible based on the types of data you provide
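
As a rough illustration of how the column types relate to the three plot types, the sketch below picks scatter, box or heatmap from the number of continuous columns in the pair. It does not reproduce SDMetrics' internal logic: suggest_pair_plot_type and is_continuous are hypothetical helpers, the gender and work_experience columns are invented, and treating datetime columns as continuous is an assumption.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_numeric_dtype,
)


def is_continuous(column: pd.Series) -> bool:
    """Hypothetical helper: True for numerical or datetime columns."""
    # Booleans are numeric in pandas but discrete for plotting purposes.
    if is_bool_dtype(column):
        return False
    return is_numeric_dtype(column) or is_datetime64_any_dtype(column)


def suggest_pair_plot_type(table: pd.DataFrame, column_names: list) -> str:
    """Hypothetical helper: choose a plot_type for a pair of columns."""
    first, second = column_names
    n_continuous = sum(is_continuous(table[name]) for name in (first, second))
    # 2 continuous -> scatter, 1 continuous -> box, 0 continuous -> heatmap
    return {2: 'scatter', 1: 'box', 0: 'heatmap'}[n_continuous]


# Illustrative table; the values are invented.
real_table = pd.DataFrame({
    'mba_perc': [58.8, 66.3, 57.8, 59.4],
    'degree_perc': [58.0, 77.5, 64.0, 52.0],
    'gender': ['M', 'M', 'F', 'F'],
    'work_experience': [False, True, False, False],
})

print(suggest_pair_plot_type(real_table, ['mba_perc', 'degree_perc']))
print(suggest_pair_plot_type(real_table, ['mba_perc', 'gender']))
print(suggest_pair_plot_type(real_table, ['gender', 'work_experience']))
```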

Visualize the cardinality of a relationship

get_cardinality_plot

Use this utility to visualize the cardinality of a parent-child relationship. The cardinality is the number of children that each parent row has. Your cardinality may be fixed (e.g. every parent has exactly 2 children) or variable (e.g. every parent has 1-3 children).

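  • (required) real_data: A dictionary mapping the name of each table to a pandas.DataFrame containing the real data for that table. To skip plotting the real data, input None.

  • (required) synthetic_data: A dictionary mapping the name of each table to a pandas.DataFrame containing the synthetic data for that table. To skip plotting the synthetic data, input None.

  • (required) parent_table_name: The string name of the parent table in the relationship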

  • (required) child_table_name: The string name of the child table in the relationship

  • (required) parent_primary_key: The string name of the parent table's primary key

  • (required) child_foreign_key: The string name of the column in the child table that refers to the parent's primary key

  • plot_type: The type of plot to create

    • (default) None: Determine the type of plot to create based on the data.

    • 'distplot': Plot the data as a smooth, continuous distribution

    • 'bar': Plot the data as discrete bars

Returns: A plotly.Figure object

from sdmetrics.visualization import get_cardinality_plot

fig = get_cardinality_plot(
    real_data=real_tables,
    synthetic_data=synthetic_tables,
    parent_table_name='users',
    child_table_name='sessions',
    parent_primary_key='user_id',
    child_foreign_key='user_id',
    plot_type='bar'
)

fig.show()
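
To make the notion of cardinality concrete, here is a minimal pandas sketch that counts the children of each parent row by hand. It is an illustration of the definition above, not SDMetrics' own implementation: the users and sessions tables reuse the names from the example, their contents are invented, and counting parents without any children as 0 is an assumption.

```python
import pandas as pd

# Illustrative parent and child tables; the values are invented.
users = pd.DataFrame({'user_id': [1, 2, 3, 4]})
sessions = pd.DataFrame({
    'session_id': [10, 11, 12, 13, 14],
    'user_id': [1, 1, 2, 1, 4],
})

# Count the child rows that reference each parent key.
children_per_parent = sessions['user_id'].value_counts()

# Align the counts with every parent key. Parents that never appear
# in the child table get a cardinality of 0 (an assumption here).
cardinality = children_per_parent.reindex(users['user_id'], fill_value=0)

# How many parents have each cardinality value. This distribution is
# the kind of quantity a cardinality plot compares between datasets.
distribution = cardinality.value_counts().sort_index()

print(cardinality.tolist())
print(distribution.to_dict())
```

In this toy data the cardinality is variable: one user has 3 sessions, two users have 1, and one user has none.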

