Public SDV Datasets
The SDGym library includes a variety of public demo datasets that you can use for benchmarking. These come from our main SDV library.
Available Datasets
View all the available datasets with the get_available_datasets function.
get_available_datasets
Lists all the publicly available demo datasets.
Parameters
modality: A string describing the type of data. At this time, the only supported modality is 'single_table'.
Returns: A pandas DataFrame object that describes the dataset name, dataset size and number of tables.
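As a minimal sketch of how this function might be called, the wrapper below checks the modality before delegating to SDGym. The wrapper name list_public_datasets and the SUPPORTED_MODALITIES constant are hypothetical, and importing get_available_datasets from the top-level sdgym package is an assumption; check it against your installed version.

```python
# Modalities supported at this time, per the parameter description above.
SUPPORTED_MODALITIES = ('single_table',)


def list_public_datasets(modality='single_table'):
    """Return the DataFrame of public demo datasets for one modality.

    Hypothetical convenience wrapper, not part of SDGym itself.
    """
    if modality not in SUPPORTED_MODALITIES:
        raise ValueError(
            f'Unsupported modality {modality!r}; '
            f'expected one of {SUPPORTED_MODALITIES}'
        )
    # Imported lazily so the check above works without SDGym installed.
    # Assumption: the function is exposed at the top level of the package.
    from sdgym import get_available_datasets

    return get_available_datasets(modality=modality)


# Example call (requires SDGym to be installed and may need network access):
# datasets = list_public_datasets()
# print(datasets.head())
```

Validating the modality up front gives a clear error message locally instead of relying on whatever the library raises for an unsupported value.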
Recommended Datasets
By default, the benchmark includes 9 of the available datasets. These datasets were chosen as examples of rich data that you may find in real-world settings. They are of substantial size, contain a variety of columns, and meet the SDGym standards for single table data.
adult
alarm
census
child: Health properties corresponding to different patients
covtype
expedia_hotel_logs
insurance
intrusion
news: Attributes about published news articles
Benchmarking the datasets
You can benchmark any of the publicly available datasets by passing their string names to the sdv_datasets parameter.
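The following sketch illustrates one way to pass dataset names to the benchmark. It assumes the single-table entry point is benchmark_single_table, importable from the top-level sdgym package. The helpers normalize_names and benchmark_public_datasets, and the RECOMMENDED_DATASETS constant, are hypothetical names introduced only for this example.

```python
# The nine recommended dataset names listed in this section.
RECOMMENDED_DATASETS = [
    'adult', 'alarm', 'census', 'child', 'covtype',
    'expedia_hotel_logs', 'insurance', 'intrusion', 'news',
]


def normalize_names(dataset_names):
    """Turn a single name or an iterable of names into a list of strings."""
    if isinstance(dataset_names, str):
        dataset_names = [dataset_names]
    names = list(dataset_names)
    if not names or not all(isinstance(name, str) for name in names):
        raise ValueError('dataset_names must contain at least one string name')
    return names


def benchmark_public_datasets(dataset_names):
    """Benchmark the given public datasets by their string names."""
    names = normalize_names(dataset_names)
    # Imported lazily so the helpers above work without SDGym installed.
    # Assumption: benchmark_single_table is the single-table entry point.
    from sdgym import benchmark_single_table

    return benchmark_single_table(sdv_datasets=names)


# Example call (requires SDGym; a full benchmark can take a long time):
# results = benchmark_public_datasets(['adult', 'census'])
```

Starting with one or two small datasets is a reasonable way to confirm the setup works before running the full recommended set.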