Demo Data

The SDV library contains many different demo datasets that you can use to get started. Use the demo module to access these datasets.

The demo module accesses the SDV's public dataset repository, so its methods require an internet connection.

get_available_demos

Use this method to get information about all the available demos in the SDV's public dataset repository.

Parameters

  • modality: Set this to the string 'multi_table' to see all the multi table demo datasets

Returns: A pandas DataFrame object containing the name of each dataset, its size (in MB) and the number of tables it contains.

from sdv.datasets.demo import get_available_demos

get_available_demos(modality='multi_table')
dataset_name            size_MB        num_tables
Accidents_v1            172.3          3
airbnb-simplified       371.5          2
Atherosclerosis_v1      2.9            4                     
...                     ...            ...
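Because the result is a regular pandas DataFrame, you can filter and sort it before choosing a dataset. The sketch below is one possible way to do that, assuming the column names shown in the sample output above (dataset_name, size_MB, num_tables). The helper names pick_small_demos and list_small_multi_table_demos are hypothetical and are not part of the SDV.

```python
import pandas as pd


def pick_small_demos(demos: pd.DataFrame, max_size_mb: float) -> pd.DataFrame:
    """Return the demos that are at most max_size_mb in size, smallest first."""
    # Coerce sizes to numbers in case they arrive as strings (an assumption);
    # values that cannot be parsed become NaN and are filtered out below.
    sizes = pd.to_numeric(demos['size_MB'], errors='coerce')
    small = demos.assign(size_MB=sizes)
    small = small[small['size_MB'] <= max_size_mb]
    return small.sort_values('size_MB').reset_index(drop=True)


def list_small_multi_table_demos(max_size_mb: float = 10.0) -> pd.DataFrame:
    """Fetch the multi table demo list and keep only the small datasets."""
    # Imported here so that pick_small_demos can be used without the SDV installed.
    from sdv.datasets.demo import get_available_demos

    demos = get_available_demos(modality='multi_table')  # requires internet
    return pick_small_demos(demos, max_size_mb)
```

Calling list_small_multi_table_demos(max_size_mb=10) would then return only the datasets that are quick to download, which can be convenient for a first experiment.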

download_demo

Use this method to download a demo dataset from the SDV's public dataset repository.

Parameters

  • (required) modality: Set this to the string 'multi_table' to access multi table demo data

  • (required) dataset_name: A string with the name of the demo dataset. You can use any of the dataset names returned by the get_available_demos method.

  • output_folder_name: A string with the name of a folder. If provided, this method will download the data and metadata into the folder, in addition to returning the data.

    • (default) None: Do not save the data into a folder. The data will still be returned so that you can use it in your Python script.

Output: A tuple (data, metadata).

The data is a dictionary that maps each table name to a pandas DataFrame containing the demo data for that table. The metadata is a MultiTableMetadata object that describes the data.

from sdv.datasets.demo import download_demo

data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

guests_table = data['guests']
hotels_table = data['hotels']
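Since the data is a plain dictionary of DataFrames, you can loop over it to inspect every table at once. The following is a minimal sketch, not an SDV feature: summarize_tables and download_and_summarize are hypothetical helper names, and the second one simply passes the documented parameters through to download_demo.

```python
import pandas as pd


def summarize_tables(data: dict) -> pd.DataFrame:
    """Build a one-row-per-table summary of a multi table dataset."""
    rows = [
        {
            'table_name': name,
            'num_rows': len(table),
            'num_columns': table.shape[1],
        }
        for name, table in data.items()
    ]
    return pd.DataFrame(rows, columns=['table_name', 'num_rows', 'num_columns'])


def download_and_summarize(dataset_name: str, output_folder_name=None) -> pd.DataFrame:
    """Download a multi table demo and summarize its tables."""
    # Imported here so that summarize_tables can be used without the SDV installed.
    from sdv.datasets.demo import download_demo

    data, metadata = download_demo(  # requires internet
        modality='multi_table',
        dataset_name=dataset_name,
        # Leave as None to skip saving the data and metadata to a folder.
        output_folder_name=output_folder_name,
    )
    return summarize_tables(data)
```

For example, download_and_summarize('fake_hotels') would return one row each for the guests and hotels tables.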

Copyright (c) 2023, DataCebo, Inc.