# Visualization Utilities

Use the utilities below to visualize the comparison between real and synthetic data. You can access these from the `sdmetrics.visualization` module.

{% hint style="success" %}
**Tip!** All visualizations are interactive. If you're using an IPython notebook, you can zoom, pan, toggle legends, and take screenshots.
{% endhint %}

### Compare a synthetic column & real column (1D)

**get\_column\_plot**

Use this utility to visualize a real column against the same synthetic column. You can plot any column of type: `boolean`, `categorical`, `datetime` or `numerical`.

* (required) `real_data`: A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the table of your real data. *To skip plotting the real data, input `None`.*
* (required) `synthetic_data`: A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the synthetic data. *To skip plotting the synthetic data, input `None`.*
* (required) `column_name`: The name of the column you want to plot.
* `plot_type`: The type of plot to create
  * (default) `None`: Determine the type of plot to create based on the data.
  * `'distplot'`: Plot the data as a smooth, continuous distribution. Use this for continuous columns.
  * `'bar'`: Plot the data as discrete bars. Use this for discrete columns.

Returns: A [plotly.Figure](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html) object

```python
from sdmetrics.visualization import get_column_plot

fig = get_column_plot(
    real_data=real_table,
    synthetic_data=synthetic_table,
    column_name='high_perc',
    plot_type='distplot'
)

fig.show()
```

<figure><img src="/files/nmSB7DWegcjfd7upqTvJ" alt=""><figcaption></figcaption></figure>
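If you want numbers to read alongside a `'bar'` plot, a minimal pandas sketch like the one below can tabulate how often each category appears in the real and synthetic column. This is not how SDMetrics builds its figure; the helper `compare_frequencies`, the `gender` column and the sample values are all hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy tables; replace these with your own real and synthetic data.
real_table = pd.DataFrame({'gender': ['M', 'F', 'M', 'M']})
synthetic_table = pd.DataFrame({'gender': ['M', 'F', 'F', 'M']})


def compare_frequencies(real_column, synthetic_column):
    """Return the relative frequency of each category, side by side."""
    frequencies = pd.DataFrame({
        'real': real_column.value_counts(normalize=True),
        'synthetic': synthetic_column.value_counts(normalize=True),
    })
    # A category missing from one dataset shows up as NaN; treat it as 0.
    return frequencies.fillna(0.0).sort_index()


frequencies = compare_frequencies(real_table['gender'], synthetic_table['gender'])
print(frequencies)
```

Each row of the resulting table corresponds to one category, so large gaps between the `real` and `synthetic` values point to the categories worth inspecting in the plot.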

### Compare a pair of synthetic columns & real columns (2D)

**get\_column\_pair\_plot**

Use this utility to visualize the trends between a pair of columns for real and synthetic data. You can plot any 2 columns of type: `boolean`, `categorical`, `datetime` or `numerical`. The columns do not have to be the same type.

* (required) `real_data`: A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the table of your real data. *To skip plotting the real data, input `None`.*
* (required) `synthetic_data`: A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the synthetic data. *To skip plotting the synthetic data, input `None`.*
* (required) `column_names`: A list containing the names of the 2 columns you want to plot.
* `plot_type`: The type of plot to create
  * (default) `None`: Determine the type of plot to create based on the data.
  * `'scatter'`: Plot each data point in 2D space as a scatter plot. Use this to compare a pair of continuous columns.
  * `'box'`: Plot the data as one or more box plots. Use this to compare a continuous column with a discrete column.
  * `'violin'`: Create a violin plot to show the distribution of a continuous column broken down by a discrete column. This is an alternative to using `'box'`.
  * `'heatmap'`: Plot a side-by-side heatmap of the data's categories. Use this to compare a pair of discrete columns.

Returns: A [plotly.Figure](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html) object

```python
from sdmetrics.visualization import get_column_pair_plot

fig = get_column_pair_plot(
    real_data=real_table,
    synthetic_data=synthetic_table,
    column_names=['mba_perc', 'degree_perc'],
    plot_type='scatter'
)

fig.show()
```

Various types of plots are possible, depending on the types of the columns you provide.

<figure><img src="/files/ZsxcJVCUnQAJ5hBgsO1s" alt=""><figcaption></figcaption></figure>
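The guidance in the `plot_type` list (two continuous columns, one continuous and one discrete, or two discrete) can be sketched as a small helper. This is only an illustration of that guidance, not the logic SDMetrics uses when `plot_type` is `None`. The helpers `is_continuous` and `choose_pair_plot_type`, the `gender` and `placed` columns, and the sample values are hypothetical, and treating numerical and datetime columns as continuous is an assumption.

```python
import pandas as pd
from pandas.api import types as ptypes


def is_continuous(column):
    """Assumption: numerical and datetime columns count as continuous."""
    # pandas treats booleans as numeric, so rule them out first.
    if ptypes.is_bool_dtype(column):
        return False
    return ptypes.is_numeric_dtype(column) or ptypes.is_datetime64_any_dtype(column)


def choose_pair_plot_type(table, column_names):
    """Pick a plot type name following the guidance in the list above."""
    first, second = column_names
    continuous_count = sum(is_continuous(table[name]) for name in (first, second))
    if continuous_count == 2:
        return 'scatter'   # a pair of continuous columns
    if continuous_count == 1:
        return 'box'       # continuous vs. discrete ('violin' is the alternative)
    return 'heatmap'       # a pair of discrete columns


# Hypothetical toy table; replace it with your own data.
real_table = pd.DataFrame({
    'mba_perc': [58.8, 66.3, 57.8, 59.4],
    'degree_perc': [58.0, 77.5, 64.0, 52.0],
    'gender': ['M', 'F', 'M', 'F'],
    'placed': [True, False, True, True],
})

print(choose_pair_plot_type(real_table, ['mba_perc', 'degree_perc']))
print(choose_pair_plot_type(real_table, ['mba_perc', 'gender']))
print(choose_pair_plot_type(real_table, ['gender', 'placed']))
```

You could pass the returned string as the `plot_type` argument when you prefer to set the plot type explicitly rather than rely on the default.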

### Visualize the cardinality of a relationship

**get\_cardinality\_plot**

Use this utility to visualize the cardinality of a parent-child relationship. The cardinality is the number of children that each parent row has. Your cardinality may be fixed (e.g. every parent has exactly 2 children) or variable (e.g. every parent has 1-3 children).

* (required) `real_data`: A dictionary mapping the name of each table to a [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the real data for that table. *To skip plotting the real data, input `None`.*
* (required) `synthetic_data`: A dictionary mapping the name of each table to a [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the synthetic data for that table. *To skip plotting the synthetic data, input `None`.*
* (required) `parent_table_name`: The string name of the parent table in the relationship
* (required) `child_table_name`: The string name of the child table in the relationship
* (required) `parent_primary_key`: The string name of the parent table's primary key
* (required) `child_foreign_key`: The string name of the column in the child table that refers to the parent's primary key
* `plot_type`: The type of plot to create
  * (default) `None`: Determine the type of plot to create based on the data.
  * `'distplot'`: Plot the data as a smooth, continuous distribution
  * `'bar'`: Plot the data as discrete bars

Returns: A [plotly.Figure](https://plotly.com/python-api-reference/generated/plotly.graph_objects.Figure.html) object

```python
from sdmetrics.visualization import get_cardinality_plot

fig = get_cardinality_plot(
    real_data=real_tables,
    synthetic_data=synthetic_tables,
    parent_table_name='users',
    child_table_name='sessions',
    parent_primary_key='user_id',
    child_foreign_key='user_id',
    plot_type='bar'
)

fig.show()
```

<figure><img src="/files/KkD2RJNgIDfvPgq4wZ7J" alt=""><figcaption></figcaption></figure>
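To make the definition of cardinality concrete, the sketch below counts the children of each parent row with pandas. It is a rough illustration only, not the computation SDMetrics performs internally. The helper `compute_cardinality` and the sample rows are hypothetical, and counting parents without children as 0 is an assumption.

```python
import pandas as pd

# Hypothetical toy tables; replace these with your own multi-table data.
real_tables = {
    'users': pd.DataFrame({'user_id': [1, 2, 3]}),
    'sessions': pd.DataFrame({
        'session_id': [10, 11, 12, 13],
        'user_id': [1, 1, 2, 1],
    }),
}


def compute_cardinality(tables, parent_table_name, child_table_name,
                        parent_primary_key, child_foreign_key):
    """Return the number of child rows for every parent row."""
    parent_keys = pd.Index(tables[parent_table_name][parent_primary_key])
    # Count how many child rows reference each parent key.
    child_counts = tables[child_table_name][child_foreign_key].value_counts()
    # Parents that never appear in the child table get a count of 0 (assumption).
    return child_counts.reindex(parent_keys, fill_value=0)


cardinality = compute_cardinality(
    real_tables,
    parent_table_name='users',
    child_table_name='sessions',
    parent_primary_key='user_id',
    child_foreign_key='user_id',
)
print(cardinality)
```

Running the same helper on the synthetic tables gives a second series of counts, and the two sets of counts are the kind of values a cardinality plot compares.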


