Metadata
This guide describes SDGym's metadata specification for a single table of data.
Metadata is a basic, factual description of a dataset that includes:
The type of data that each column represents
The primary keys and other identifiers of the table
The SDGym library expects that every dataset will have corresponding metadata provided as a JSON file. During benchmarking, the SDGym reads the file as a Python dictionary.
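As a rough illustration of what "reads the file as a Python dictionary" means, the sketch below writes a small metadata file and loads it back with the standard `json` module. The file name `metadata.json`, the table name `my_table` and the column are made up for this example, and this is not necessarily how SDGym loads the file internally.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata for a one-table dataset (all names are illustrative).
example_metadata = {
    "METADATA_SPEC_VERSION": "V1",
    "tables": {
        "my_table": {
            "columns": {"active": {"sdtype": "boolean"}},
        }
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the metadata to a JSON file, as it would be stored next to the CSV data.
    metadata_path = Path(tmp_dir) / "metadata.json"
    metadata_path.write_text(json.dumps(example_metadata, indent=4))

    # Read the JSON file back; json.load returns a plain Python dictionary.
    with open(metadata_path) as metadata_file:
        loaded_metadata = json.load(metadata_file)

print(type(loaded_metadata).__name__)
print(sorted(loaded_metadata))
```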
We assume that the data is present in a CSV format that describes rows and columns of a single table.
The metadata for a single table contains the following elements:
(required) "METADATA_SPEC_VERSION": The version of the metadata. If you are using this specification, the metadata version will be "V1", indicating that it is a multi-table dataset that is compatible with SDV version 1.
(required) "tables": A dictionary that maps the table names to the table-specific metadata such as primary keys, column names and data types. Note that SDGym only works with single-table schemas.
The tables dictionary maps each table name to the table-specific metadata. Because SDGym only works with single-table schemas, the table name does not matter. But please be sure that the table-specific metadata matches your data.
(required) "columns": A dictionary that maps the column names to the data types they represent and any other attributes.
"primary_key": The column name that is the primary key in the table.
"alternate_keys": A list of column names that can act as alternate keys in the table.
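Putting the elements above together, a complete metadata dictionary might look like the one in this sketch. The table and column names are invented for illustration, and `find_metadata_problems` is a hypothetical helper written for this example (not part of SDGym) that only checks the structure described in this guide.

```python
def find_metadata_problems(metadata):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if metadata.get("METADATA_SPEC_VERSION") != "V1":
        problems.append('"METADATA_SPEC_VERSION" should be "V1"')

    # SDGym only works with single-table schemas, so expect exactly one table.
    tables = metadata.get("tables")
    if not isinstance(tables, dict) or len(tables) != 1:
        problems.append('"tables" should contain exactly one table')
        return problems

    table_metadata = next(iter(tables.values()))
    columns = table_metadata.get("columns")
    if not isinstance(columns, dict):
        problems.append('"columns" is required and should be a dictionary')
        return problems

    # Primary and alternate keys are optional, but must name described columns.
    key_columns = list(table_metadata.get("alternate_keys", []))
    if "primary_key" in table_metadata:
        key_columns.append(table_metadata["primary_key"])
    for name in key_columns:
        if name not in columns:
            problems.append(f'key column "{name}" is not listed in "columns"')
    return problems


# Illustrative metadata; the table name does not matter for SDGym.
metadata = {
    "METADATA_SPEC_VERSION": "V1",
    "tables": {
        "users": {
            "primary_key": "user_id",
            "alternate_keys": ["email"],
            "columns": {
                "user_id": {"sdtype": "id"},
                "email": {"sdtype": "id"},
                "active": {"sdtype": "boolean"},
            },
        }
    },
}

print(find_metadata_problems(metadata))
```

A check like this is optional; it is only meant to catch structural slips, such as a key that points at a column missing from "columns", before running a benchmark.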
When describing a column, you will provide the column name and the data type, known as the sdtype.
The 5 common sdtypes are: "numerical", "datetime", "categorical", "boolean" and "id". Each type is described below, along with how to specify it in the metadata.
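For a quick overview, the sketch below shows a "columns" dictionary with one column of each common sdtype and groups the column names by type. The column names are invented, only the "sdtype" attribute is shown (any type-specific properties are left out), and `columns_by_sdtype` is a hypothetical helper rather than an SDGym function.

```python
# The five common sdtypes named in this guide.
COMMON_SDTYPES = {"numerical", "datetime", "categorical", "boolean", "id"}

# Illustrative column descriptions: column name -> attributes.
columns = {
    "user_id": {"sdtype": "id"},
    "age": {"sdtype": "numerical"},
    "signup_date": {"sdtype": "datetime"},
    "plan": {"sdtype": "categorical"},
    "active": {"sdtype": "boolean"},
}


def columns_by_sdtype(columns):
    """Group column names by their declared sdtype."""
    grouped = {}
    for name, description in columns.items():
        grouped.setdefault(description.get("sdtype"), []).append(name)
    return grouped


grouped = columns_by_sdtype(columns)

# Anything outside the common five is worth a second look (it may be a typo).
uncommon = sorted(str(sdtype) for sdtype in grouped if sdtype not in COMMON_SDTYPES)

print(grouped)
print(uncommon)
```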
Each table in the metadata has two keys:
"primary_key": the column name used to identify a row in the table
(required) "columns": a dictionary description of each column
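Since the table-specific metadata needs to match the data, one simple sanity check is to compare the CSV header against the keys of "columns". The sketch below does this with the standard `csv` module on an in-memory CSV; the data, the column names and the `compare_columns` helper are all assumptions made for illustration.

```python
import csv
import io


def compare_columns(csv_file, table_metadata):
    """Compare the CSV header with the columns described in the metadata."""
    header = next(csv.reader(csv_file), [])
    described = set(table_metadata["columns"])
    # Columns present in the data but not described in the metadata.
    missing_from_metadata = [name for name in header if name not in described]
    # Columns described in the metadata but absent from the data.
    missing_from_data = sorted(described - set(header))
    return missing_from_metadata, missing_from_data


# Illustrative table-specific metadata that forgets to describe the "plan" column.
table_metadata = {
    "primary_key": "user_id",
    "columns": {
        "user_id": {"sdtype": "id"},
        "active": {"sdtype": "boolean"},
    },
}

csv_text = "user_id,active,plan\n1,True,basic\n2,False,pro\n"
missing_from_metadata, missing_from_data = compare_columns(
    io.StringIO(csv_text), table_metadata
)

print(missing_from_metadata)  # ['plan']
print(missing_from_data)      # []
```

With a real dataset, the same function could be given an open file handle, for example `open("data.csv", newline="")`, in place of the `io.StringIO` object.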
Boolean columns represent True or False values.
"active": {
    "sdtype": "boolean"
}
Properties (None)