Demo Data
The SDV library contains many different demo datasets that you can use to get started. Use the demo module to access these datasets.
The demo module accesses the SDV's public dataset repository. These methods require an internet connection.
get_available_demos
Use this method to get information about all the available demos in the SDV's public dataset repository.
Parameters
modality: Set this to the string 'multi_table' to see all the multi-table demo datasets.
Returns
A pandas DataFrame containing the name of each dataset, its size (in MB), and the number of tables it contains.
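Since the return value is an ordinary pandas DataFrame, it can be narrowed down with regular pandas operations before choosing a dataset to download. The sketch below is only an illustration: demos_under_size is a hypothetical helper, not part of the SDV, and sample_demos is a hand-built stand-in that copies three rows from the example output in this section so the snippet runs without an internet connection. It assumes the size_MB column can be converted to numbers.

```python
import pandas as pd


def demos_under_size(demos, max_size_mb):
    """Return the demos whose size_MB is at most max_size_mb, smallest first."""
    demos = demos.copy()
    # Assumption: size_MB holds numbers or numeric strings; anything else becomes NaN
    demos['size_MB'] = pd.to_numeric(demos['size_MB'], errors='coerce')
    small = demos[demos['size_MB'] <= max_size_mb]
    return small.sort_values('size_MB').reset_index(drop=True)


# Stand-in for get_available_demos(modality='multi_table'), using rows shown in this section
sample_demos = pd.DataFrame({
    'dataset_name': ['Accidents_v1', 'airbnb-simplified', 'Atherosclerosis_v1'],
    'size_MB': [172.3, 371.5, 2.9],
    'num_tables': [3, 2, 4],
})

small_demos = demos_under_size(sample_demos, max_size_mb=200)
print(small_demos['dataset_name'].tolist())
```

With a live connection, the same helper could be applied directly to the DataFrame returned by get_available_demos.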
from sdv.datasets.demo import get_available_demos
get_available_demos(modality='multi_table')

dataset_name         size_MB   num_tables
Accidents_v1         172.3     3
airbnb-simplified    371.5     2
Atherosclerosis_v1   2.9       4
...                  ...       ...

download_demo
Use this method to download a demo dataset from the SDV's public dataset repository.
Parameters
(required) modality: Set this to the string 'multi_table' to access multi-table demo data.
(required) dataset_name: A string with the name of the demo dataset. You can use any of the dataset names from the get_available_demos method.
output_folder_name: A string with the name of a folder. If provided, this method will download the data and metadata into the folder, in addition to returning the data.
(default) None: Do not save the data into a folder. The data will still be returned so that you can use it in your Python script.
Output
A tuple (data, metadata). The data is a dictionary that maps each table name to a pandas DataFrame containing the demo data for that table. The metadata is a Metadata object that describes the data.
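Because the data is a plain dictionary of DataFrames, a quick loop is enough to check what was downloaded. The following is a minimal sketch: table_shapes is a hypothetical helper, and sample_data is a small stand-in for the returned dictionary. Its table names follow the fake_hotels example in this section, but the column names and values are made up for illustration.

```python
import pandas as pd


def table_shapes(data):
    """Map each table name to its (row count, column count)."""
    return {name: table.shape for name, table in data.items()}


# Stand-in for the data returned by download_demo; columns and rows are hypothetical
sample_data = {
    'hotels': pd.DataFrame({'hotel_id': ['H1', 'H2'], 'city': ['Boston', 'Austin']}),
    'guests': pd.DataFrame({'guest_id': [1, 2, 3], 'hotel_id': ['H1', 'H1', 'H2']}),
}

shapes = table_shapes(sample_data)
for name, (n_rows, n_cols) in shapes.items():
    print(f'{name}: {n_rows} rows, {n_cols} columns')
```

The same helper should work unchanged on the dictionary returned by download_demo, since it relies only on the documented table-name-to-DataFrame mapping.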
from sdv.datasets.demo import download_demo
data, metadata = download_demo(
modality='multi_table',
dataset_name='fake_hotels'
)
guests_table = data['guests']
hotels_table = data['hotels']

get_source
Some datasets have a source file that describes where the dataset comes from. This can include information like a URL, citations for the original publication, and other information that tracks the dataset's provenance. Use this function to get all this source information.
Parameters
(required) modality: Set this to the string 'multi_table' to access multi-table demo data.
(required) dataset_name: A string with the name of the demo dataset. You can use any of the dataset names from the get_available_demos method.
output_filepath: A string with the name of a file path. If provided, this method will create the file and write the source information to the file, in addition to returning it.
(default) None: Do not save the source information into a file. The contents will still be returned so that you can print it out and read it.
Output
A string containing the contents of the source information. You can print it out to read it. (If no source information is available for a dataset, the function returns None and no file will be written.)
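If you want individual fields rather than the whole text, the returned string can be split line by line. The sketch below assumes the single-line 'Key: value' layout seen in the Bupa example in this section; other datasets may be formatted differently, so treat it as a starting point. parse_source_fields is a hypothetical helper, not an SDV function, and sample_source copies the Bupa text shown in this section.

```python
def parse_source_fields(source_text):
    """Collect single-line 'Key: value' entries; returns an empty dict for None."""
    fields = {}
    if source_text is None:
        # get_source returns None when a dataset has no source information
        return fields
    for line in source_text.splitlines():
        if line.lstrip().startswith('['):
            # Skip numbered citation entries such as '[1] ...'
            continue
        key, separator, value = line.partition(': ')
        if separator and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Text copied from the Bupa example output in this section
sample_source = (
    'Source URL: https://archive.ics.uci.edu/dataset/60/liver+disorders\n'
    'License name: Creative Commons Attribution 4.0 International\n'
    'Citations:\n'
    '[1] Liver Disorders [Dataset]. (2016). UCI Machine Learning Repository. '
    'https://doi.org/10.24432/C54G67.\n'
)

source_fields = parse_source_fields(sample_source)
print(source_fields.get('License name'))
```

Headings without a value on the same line, such as 'Citations:', are deliberately left out of the result.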
from sdv.datasets.demo import get_source
source_text = get_source(
modality='multi_table',
dataset_name='Bupa',
output_filepath='Bupa_source.txt'
)
print(source_text)

Source URL: https://archive.ics.uci.edu/dataset/60/liver+disorders
License name: Creative Commons Attribution 4.0 International
Citations:
[1] Liver Disorders [Dataset]. (2016). UCI Machine Learning Repository. https://doi.org/10.24432/C54G67.

get_readme
Some datasets have a README file that explains more about what the dataset means. This could include explanations for naming conventions used in the dataset, mappings for ID codes, or business logic. Use this function to get the README (if it exists).
Parameters
(required) modality: Set this to the string 'multi_table' to access multi-table demo data.
(required) dataset_name: A string with the name of the demo dataset. You can use any of the dataset names from the get_available_demos method.
output_filepath: A string with the name of a file path. If provided, this method will create the file and write the README information to the file, in addition to returning it.
(default) None: Do not save the README information into a file. The contents will still be returned so that you can print it out and read it.
Output
A string containing the contents of the README information. You can print it out to read it. (If no README information is available for a dataset, the function returns None and no file will be written.)
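A call to get_readme might look like the sketch below, which mirrors the get_source example and handles the None case described above. It assumes get_readme is importable from sdv.datasets.demo like the other functions on this page (the import path is not shown here, so verify it against your installed version). readme_or_notice and show_readme are hypothetical helper names. The import is placed inside show_readme so that the first helper can be used without the SDV installed.

```python
def readme_or_notice(readme_text, dataset_name):
    """Return the README text, or a short notice when the dataset has none."""
    if readme_text is None:
        return f'No README is available for {dataset_name}.'
    return readme_text


def show_readme(dataset_name, output_filepath=None):
    """Fetch and print a demo README. Requires the SDV and an internet connection."""
    # Assumption: get_readme lives in the same module as get_source
    from sdv.datasets.demo import get_readme

    readme_text = get_readme(
        modality='multi_table',
        dataset_name=dataset_name,
        output_filepath=output_filepath,
    )
    print(readme_or_notice(readme_text, dataset_name))
```

For example, show_readme('fake_hotels') would print the README for that dataset if one exists, and the notice otherwise.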