Custom Datasets
This guide explains how to include your own custom datasets in the SDGym benchmarking framework.
You can add any number of custom datasets, as long as each one represents single-table data.
Each dataset must have:
Data, stored as a single CSV file with a name ending in .csv
Metadata, stored as a JSON file named metadata.json
SDGym is optimized for benchmarking multiple custom datasets at once. Please convert all custom datasets into the following file structure:
Compress the CSV and JSON files for each dataset into a single zip file
Put all the zip files into a single folder for all your custom datasets
The overall structure is illustrated below.
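The packaging steps above can be sketched with Python's standard library. This is a minimal sketch, not part of SDGym: the helper name package_dataset, the example file names, and the choice to name each zip after its CSV are all hypothetical, and placing both files at the top level of the archive is an assumption.

```python
import zipfile
from pathlib import Path


def package_dataset(csv_path, metadata_path, output_folder):
    """Zip one dataset's CSV and metadata into output_folder/<csv stem>.zip.

    Hypothetical helper for illustration; SDGym does not provide it.
    """
    csv_path = Path(csv_path)
    metadata_path = Path(metadata_path)
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)

    # The zip file name is what identifies the dataset in the results.
    zip_path = output_folder / f"{csv_path.stem}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Assumption: both files sit at the top level of the archive.
        archive.write(csv_path, arcname=csv_path.name)
        # The metadata must be named metadata.json inside the zip.
        archive.write(metadata_path, arcname="metadata.json")
    return zip_path


# Resulting layout for two hypothetical datasets:
#
# my_custom_datasets/
#     customers.zip   (customers.csv + metadata.json)
#     orders.zip      (orders.csv + metadata.json)
```

Calling package_dataset once per dataset with the same output_folder yields a single folder containing one zip file per dataset.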
To use your custom datasets, supply the path to the overall folder using the additional_datasets_folder parameter.
If you have the datasets folder stored on your machine, provide the folder's path as a string.
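As a rough sketch of what that call might look like: the additional_datasets_folder parameter comes from this guide, while the use of sdgym.benchmark_single_table as the entry point, the wrapper name run_custom_benchmark, and the folder path are assumptions made for illustration.

```python
def run_custom_benchmark(datasets_folder):
    """Run the benchmark on a folder of zipped custom datasets.

    Hypothetical wrapper; assumes sdgym.benchmark_single_table is the
    benchmarking entry point that accepts additional_datasets_folder.
    """
    # Imported lazily so this module loads even where sdgym is not installed.
    import sdgym

    # All other arguments are left at their defaults, so any default
    # synthesizers and datasets are assumed to run as well.
    return sdgym.benchmark_single_table(
        additional_datasets_folder=datasets_folder,
    )
```

For example, run_custom_benchmark('my_custom_datasets/') would pass a hypothetical local folder path as a string.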
A result for each dataset and synthesizer will be available after the benchmarking finishes. You can identify each dataset by the name of its zip file.
Your folder can be stored locally on your computer, or it can be an Amazon S3 bucket.
If your datasets folder is an Amazon S3 bucket, provide the name of the bucket prefixed with 's3://' instead. For more information, see the docs for .
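To make the expected string format concrete, here is a small sketch that builds the 's3://'-prefixed value from a bucket name. The helper s3_datasets_location and the bucket name are hypothetical; the sketch only formats the string and does not handle AWS credentials or access.

```python
def s3_datasets_location(bucket_name):
    """Return the 's3://'-prefixed string for a bucket holding the dataset zips.

    Hypothetical helper for illustration; it only formats the string.
    """
    prefix = "s3://"
    # Avoid doubling the prefix if the caller already included it.
    if bucket_name.startswith(prefix):
        return bucket_name
    return prefix + bucket_name


location = s3_datasets_location("my-custom-datasets")  # hypothetical bucket name
```

The resulting string is what would be supplied to the additional_datasets_folder parameter in place of a local path.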