Single Table Metadata JSON

This guide describes the single table metadata JSON spec.
An example of single table data
This is an example of a JSON file describing a single table.
{
    "primary_key": "guest_email",
    "alternate_keys": [ "credit_card_number" ],
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "columns": {
        "guest_email": { "sdtype": "email", "pii": true },
        "has_rewards": { "sdtype": "boolean" },
        "room_type": { "sdtype": "categorical" },
        "amenities_fee": { "sdtype": "numerical" },
        "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "checkout_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "room_rate": { "sdtype": "numerical" },
        "billing_address": { "sdtype": "address", "pii": true },
        "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
    }
}
Create your metadata programmatically. Use the Python API to automatically detect the metadata based on your data.
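As a rough illustration of building the metadata in code, the sketch below writes a trimmed-down version of the example above as a plain Python dictionary and round-trips it through the standard json module. It does not use the SDV Python API, and the file name hotel_metadata.json is a made-up placeholder.

```python
import json
import os
import tempfile

# A trimmed-down version of the example metadata, written as a Python dict.
metadata = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "primary_key": "guest_email",
    "columns": {
        "guest_email": {"sdtype": "email", "pii": True},
        "has_rewards": {"sdtype": "boolean"},
        "checkin_date": {"sdtype": "datetime", "datetime_format": "%d %b %Y"},
    },
}

# Hypothetical file name, placed in a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), "hotel_metadata.json")

# Write the dictionary out as a JSON file.
with open(path, "w") as f:
    json.dump(metadata, f, indent=4)

# Read it back to confirm the file holds the same structure.
with open(path) as f:
    loaded = json.load(f)
```

Note that Python's True is written to the file as the JSON literal true, matching the "pii": true entries in the example.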

Overview

The metadata for a single table contains the following elements:
  • (required) "METADATA_SPEC_VERSION": The version of the metadata spec. For the spec described here, the value is "SINGLE_TABLE_V1", indicating single table metadata that is compatible with SDV version 1.
  • (required) "columns": A dictionary that maps the column names to the data types they represent and any other attributes.
  • "primary_key": The name of the column that is the primary key of the table.
  • "alternate_keys": A list of column names that can act as alternate keys in the table.
If your table includes sequential data, other keys are available to describe the sequences. See Sequential Metadata for more details.
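The elements listed above can be checked with a few lines of Python. The following is a minimal sketch, not SDV's own validation: the function name check_single_table_metadata is hypothetical, and it assumes that every column entry carries an "sdtype" and that key names must refer to entries under "columns".

```python
def check_single_table_metadata(metadata):
    """Return a list of problems found in a single table metadata dict."""
    problems = []

    # The two elements marked as required in the list above.
    for key in ("METADATA_SPEC_VERSION", "columns"):
        if key not in metadata:
            problems.append(f"missing required element: {key}")

    columns = metadata.get("columns", {})

    # Assumption: each column description includes an sdtype.
    for name, spec in columns.items():
        if "sdtype" not in spec:
            problems.append(f"column {name} has no sdtype")

    # Assumption: keys must name columns that are actually described.
    primary_key = metadata.get("primary_key")
    if primary_key is not None and primary_key not in columns:
        problems.append(f"primary_key {primary_key} is not a column")
    for key in metadata.get("alternate_keys", []):
        if key not in columns:
            problems.append(f"alternate key {key} is not a column")

    return problems


example = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "primary_key": "guest_email",
    "alternate_keys": ["credit_card_number"],
    "columns": {
        "guest_email": {"sdtype": "email", "pii": True},
        "credit_card_number": {"sdtype": "credit_card_number", "pii": True},
    },
}
problems = check_single_table_metadata(example)
```

Returning a list of messages rather than raising on the first problem makes it possible to report everything that is wrong with a file in one pass.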

Columns

When describing a column, you will provide the column name and the data type, known as the sdtype.
The 5 common sdtypes are "numerical", "datetime", "categorical", "boolean" and "id". Other sdtypes, such as "address", are also supported. Each type, and how to specify it in the metadata, is described below.
Boolean columns represent True or False values.
"has_rewards": {
    "sdtype": "boolean"
}
Properties (None)
Categorical columns represent discrete data.
"room_type": {
    "sdtype": "categorical"
}
Properties (None)
Datetime columns represent a point in time.
"checkin_date": {
    "sdtype": "datetime",
    "datetime_format": "%d %b %Y"
}
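Properties
  • datetime_format: A string describing the format of the datetime values, written with strftime-style directives as in "%d %b %Y" above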
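The "%d %b %Y" string in the datetime example uses the same directives as Python's datetime module, so a quick way to confirm that a format matches your data is to parse a sample value with it. In this sketch the value "27 Dec 2020" is made up for illustration.

```python
from datetime import datetime

# The format string from the checkin_date example.
datetime_format = "%d %b %Y"

# A made-up sample value: day of month, abbreviated month name, 4-digit year.
sample = "27 Dec 2020"

# strptime raises ValueError if the value does not match the format.
parsed = datetime.strptime(sample, datetime_format)

# Formatting with the same string reproduces the original text.
formatted = parsed.strftime(datetime_format)
```

Note that %b is locale-dependent; the abbreviation "Dec" assumes an English or default C locale.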
Numerical columns represent discrete or continuous numerical values.
"room_rate": {
    "sdtype": "numerical",
    "computer_representation": "Float"
}
Properties
  • computer_representation: A string that represents how you'll ultimately store the data. This determines the min and max values allowed. Available options are: 'Float', 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'.
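To make the link between the representation and the allowed range concrete, the sketch below computes the min and max for the integer options. It assumes these names follow the usual fixed-width integer conventions (signed two's complement for 'IntN', unsigned for 'UIntN'); the helper integer_bounds is hypothetical, and how SDV itself applies the limits is not covered here.

```python
def integer_bounds(computer_representation):
    """Return (min, max) for an 'IntN' or 'UIntN' representation string."""
    signed = not computer_representation.startswith("UInt")
    prefix = "Int" if signed else "UInt"
    if not computer_representation.startswith(prefix):
        # 'Float' and unknown strings are not handled by this sketch.
        raise ValueError(f"not an integer representation: {computer_representation}")

    # The digits after the prefix give the width in bits, e.g. 'Int16' -> 16.
    bits = int(computer_representation[len(prefix):])

    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


int8_bounds = integer_bounds("Int8")
uint16_bounds = integer_bounds("UInt16")
```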
ID columns represent identifiers that do not have any special mathematical or semantic meaning.
"user_id": {
    "sdtype": "id",
    "regex_format": "U_[0-9]{3}"
}
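Properties
  • regex_format: A string describing the format of the ID as a regular expression, as in "U_[0-9]{3}" above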
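The "U_[0-9]{3}" pattern in the ID example reads as the literal prefix U_ followed by exactly three digits. A short sketch with Python's re module shows which values it accepts; the sample IDs and the helper matches_format are made up for illustration.

```python
import re

# The pattern from the user_id example.
regex_format = "U_[0-9]{3}"
pattern = re.compile(regex_format)


def matches_format(value):
    """Return True if the whole value matches the regex format."""
    # fullmatch requires the entire string to match, not just a prefix.
    return pattern.fullmatch(value) is not None


ok = matches_format("U_042")        # prefix plus three digits
too_short = matches_format("U_42")  # only two digits
too_long = matches_format("U_0421") # four digits
```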
You can input any other data type such as 'phone_number', 'ssn' or 'email'. See the Sdtypes Reference for a full list.
"billing_address": {
    "sdtype": "address",
    "pii": true
}
Properties
  • pii: A boolean denoting whether the data is sensitive
    • (default) true: The column is sensitive, meaning the values should be anonymized
    • false: The column is not sensitive, meaning the exact set of values can be reused in the synthetic data
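The default described above can be sketched as a small helper that lists the columns to be treated as sensitive. This is an illustration only: the function name sensitive_columns is hypothetical, and it assumes that pii applies only to sdtypes outside the 5 common ones, with a missing pii value read as true.

```python
# The 5 common sdtypes; in this sketch pii is not considered for them.
COMMON_SDTYPES = {"boolean", "categorical", "datetime", "numerical", "id"}


def sensitive_columns(metadata):
    """Return the names of columns that should be anonymized."""
    result = []
    for name, spec in metadata["columns"].items():
        if spec["sdtype"] in COMMON_SDTYPES:
            continue
        # pii defaults to true when it is not given.
        if spec.get("pii", True):
            result.append(name)
    return result


metadata = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "columns": {
        "room_type": {"sdtype": "categorical"},
        "billing_address": {"sdtype": "address", "pii": True},
        "guest_email": {"sdtype": "email"},
        "hotel_phone": {"sdtype": "phone_number", "pii": False},
    },
}
names = sensitive_columns(metadata)
```

Here guest_email is included because its pii value is omitted, while the made-up hotel_phone column is excluded because it is explicitly marked as not sensitive.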
Copyright (c) 2023, DataCebo, Inc.