Demo Data

The SDV library includes many demo datasets that you can use to get started. Use the sdv.datasets.demo module to access them.

get_available_demos

Use this method to get information about all the available demos in the SDV's public dataset repository.

Parameters

  • modality: Set this to the string 'multi_table' to see all the multi table demo datasets

Returns: A pandas DataFrame containing the name of each dataset, its size (in MB), and the number of tables it contains.

from sdv.datasets.demo import get_available_demos

get_available_demos(modality='multi_table')
dataset_name            size_MB        num_tables
Accidents_v1            172.3          3
airbnb-simplified       371.5          2
Atherosclerosis_v1      2.9            4                     
...                     ...            ...
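
Because the result is a regular pandas DataFrame, you can filter and sort it before choosing a dataset. The following is a minimal sketch, not part of the SDV API: the helper name demos_under_size is hypothetical, and the sketch assumes the column names shown in the output above (dataset_name, size_MB, num_tables). A stand-in DataFrame built from the three rows above is used so that the sketch runs without SDV installed or network access.

```python
import pandas as pd


def demos_under_size(demos, max_size_mb):
    """Return the demos no larger than max_size_mb, smallest first.

    Hypothetical helper for illustration; it is not provided by SDV.
    """
    # Coerce defensively: we assume, but have not verified, that size_MB is numeric.
    sizes = pd.to_numeric(demos['size_MB'], errors='coerce')
    small = demos.assign(size_MB=sizes)
    small = small[small['size_MB'] <= max_size_mb]
    return small.sort_values('size_MB').reset_index(drop=True)


# Stand-in for get_available_demos(modality='multi_table'), built from the
# three rows shown above.
demos = pd.DataFrame({
    'dataset_name': ['Accidents_v1', 'airbnb-simplified', 'Atherosclerosis_v1'],
    'size_MB': [172.3, 371.5, 2.9],
    'num_tables': [3, 2, 4],
})

small = demos_under_size(demos, max_size_mb=200)
print(small['dataset_name'].tolist())
```

In practice you would pass the DataFrame returned by get_available_demos(modality='multi_table') instead of the stand-in.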

download_demo

Use this method to download a demo dataset from the SDV's public dataset repository.

Parameters

  • (required) modality: Set this to the string 'multi_table' to access multi table demo data

  • (required) dataset_name: A string with the name of the demo dataset. You can use any of the dataset names from the get_available_demos method.

  • output_folder_name: A string with the name of a folder. If provided, this method will download the data and metadata into the folder, in addition to returning the data.

    • (default) None: Do not save the data into a folder. The data will still be returned so that you can use it in your Python script.

Output: A tuple (data, metadata).

The data is a dictionary that maps each table name to a pandas DataFrame containing the demo data for that table. The metadata is a Metadata object that describes the data.

from sdv.datasets.demo import download_demo

data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

guests_table = data['guests']
hotels_table = data['hotels']
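
Since the data is a plain dictionary of DataFrames, a quick way to get oriented is to list each table with its row and column counts. The following is a rough sketch under stated assumptions: summarize_tables and download_and_summarize are hypothetical helpers, not SDV functions, and the stand-in tables (including their column names and values) are invented so that the sketch runs without downloading anything. The wrapper passes output_folder_name through to download_demo as described in the parameter list above.

```python
import pandas as pd


def summarize_tables(data):
    """Return one row per table with its row and column counts.

    Hypothetical helper; `data` is a dict mapping table names to DataFrames.
    """
    rows = [
        {'table_name': name, 'num_rows': len(table), 'num_columns': table.shape[1]}
        for name, table in data.items()
    ]
    return pd.DataFrame(rows, columns=['table_name', 'num_rows', 'num_columns'])


def download_and_summarize(dataset_name, output_folder_name=None):
    """Download a multi table demo and summarize its tables (hypothetical wrapper)."""
    # Imported here so that summarize_tables can be used without SDV installed.
    from sdv.datasets.demo import download_demo

    data, metadata = download_demo(
        modality='multi_table',
        dataset_name=dataset_name,
        # None (the default) returns the data without saving it to a folder.
        output_folder_name=output_folder_name,
    )
    return summarize_tables(data), metadata


# Stand-in for the dictionary returned by download_demo. The columns and
# values below are invented for illustration only.
example_data = {
    'hotels': pd.DataFrame({'hotel_id': [1, 2], 'city': ['Boston', 'Austin']}),
    'guests': pd.DataFrame({'guest_id': [10, 11, 12], 'hotel_id': [1, 1, 2]}),
}

summary = summarize_tables(example_data)
print(summary)
```

With SDV installed, calling download_and_summarize('fake_hotels', output_folder_name='my_demo_folder') should also save the data and metadata into that folder; the folder name here is only an example.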
