Single Table Metadata JSON

This guide describes the single table metadata JSON spec.

This is an example of a JSON file describing a single table.

{
    "primary_key": "guest_email",
    "alternate_keys": [ "credit_card_number" ],
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "columns": {
        "guest_email": { "sdtype": "email", "pii": true },
        "has_rewards": { "sdtype": "boolean" },
        "room_type": { "sdtype": "categorical" },
        "amenities_fee": { "sdtype": "numerical" },
        "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "checkout_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "room_rate": { "sdtype": "numerical" },
        "billing_address": { "sdtype": "address", "pii": true },
        "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
    },
    "column_relationships": []
}
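
As a rough illustration of how a file like this might be consumed, the sketch below parses a trimmed copy of the example with Python's standard json module, lists the columns marked as PII, and parses a date string using the "datetime_format" value. It assumes the format codes follow Python's strftime conventions. The sample date "27 Mar 2023" and the variable names are made up for illustration.

```python
import json
from datetime import datetime

# Trimmed, inline copy of the example above. In practice the text would
# be read from a JSON file on disk.
metadata_text = """
{
    "primary_key": "guest_email",
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "columns": {
        "guest_email": { "sdtype": "email", "pii": true },
        "room_type": { "sdtype": "categorical" },
        "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
        "credit_card_number": { "sdtype": "credit_card_number", "pii": true }
    }
}
"""

metadata = json.loads(metadata_text)

# Names of the columns flagged as personally identifiable information
pii_columns = sorted(
    name for name, spec in metadata["columns"].items() if spec.get("pii", False)
)

# Assumption: "%d %b %Y" uses strftime codes (day, abbreviated month, year)
checkin_format = metadata["columns"]["checkin_date"]["datetime_format"]
parsed = datetime.strptime("27 Mar 2023", checkin_format)

print(pii_columns)
print(parsed.date())
```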

Create your metadata programmatically. Use the Python API to automatically detect the metadata based on your data.

Overview

The metadata for a single table contains the following elements:

  • (required) "METADATA_SPEC_VERSION": The version of the metadata specification. For the format described in this guide, the value is "SINGLE_TABLE_V1", indicating that the metadata describes a single table and is compatible with SDV version 1.

  • (required) "columns": A dictionary that maps the column names to the data types they represent and any other attributes.

  • "primary_key": The name of the column that is the primary key of the table.

  • "alternate_keys": A list of column names that can act as alternate keys in the table.
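
The elements listed above can be checked with a few lines of Python. This is a minimal sketch, not SDV's own validation: the function name "check_single_table_metadata" is hypothetical, and the rule that key columns must appear under "columns" is an assumption drawn from the descriptions above.

```python
def check_single_table_metadata(metadata):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # The two elements marked as required
    for key in ("METADATA_SPEC_VERSION", "columns"):
        if key not in metadata:
            problems.append(f"missing required key: {key}")

    columns = metadata.get("columns", {})

    # Assumption: key columns should be described under "columns"
    primary_key = metadata.get("primary_key")
    if primary_key is not None and primary_key not in columns:
        problems.append(f"primary_key is not a column: {primary_key}")
    for name in metadata.get("alternate_keys", []):
        if name not in columns:
            problems.append(f"alternate key is not a column: {name}")

    return problems


complete = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "columns": {"guest_email": {"sdtype": "email", "pii": True}},
    "primary_key": "guest_email",
}
incomplete = {
    "columns": {"guest_email": {"sdtype": "email", "pii": True}},
    "primary_key": "booking_id",  # hypothetical name, not in "columns"
}

print(check_single_table_metadata(complete))
print(check_single_table_metadata(incomplete))
```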

If your table includes sequential data, other keys are available to describe the sequences. See Sequential Metadata for more details.

Columns

When describing a column, you will provide the column name and the data type, known as the sdtype.

The 5 common sdtypes are: "numerical", "datetime", "categorical", "boolean" and "text". The example below shows how to specify a boolean column in the metadata.

Boolean columns represent True or False values.

"has_rewards" : {
    "sdtype": "boolean"
}

Properties: None
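
Column entries like the one above can also be assembled in Python and serialized to JSON. The sketch below assumes nothing beyond the standard json module. The helper "column_entry" is a hypothetical convenience function, and the property names ("datetime_format", "pii") are the ones used in the example file at the top of this guide.

```python
import json


def column_entry(sdtype, **properties):
    """Build one column description: the sdtype plus any extra properties."""
    entry = {"sdtype": sdtype}
    entry.update(properties)
    return entry


columns = {
    "has_rewards": column_entry("boolean"),  # boolean takes no properties
    "checkin_date": column_entry("datetime", datetime_format="%d %b %Y"),
    "guest_email": column_entry("email", pii=True),
}

# json.dumps writes Python's True as JSON true
print(json.dumps(columns, indent=4))
```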

Column Relationships

Annotate groups of columns that represent higher-level concepts. Denote the concept using the "type" keyword, followed by "column_names" with the list of columns involved. The column names can be listed in any order.

Each relationship type supports different types of columns. The address relationship is described below.

An address is defined by 2 or more columns that have the following sdtypes: country_code, administrative_unit, state, state_abbr, city, postcode, street_address and secondary_address.

{
    "type": "address",
    "column_names": ["addr_line1", "addr_line2", "city", "state", "zipcode"]
}
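
The address rule above (2 or more columns, each with one of the listed sdtypes) can be sketched as a small check. This is an illustration and not SDV's validation logic: the function name "check_address_relationship" is hypothetical, and the sdtypes assigned to the five example columns are assumptions chosen to match the column names.

```python
# sdtypes listed above as valid parts of an address
ADDRESS_SDTYPES = {
    "country_code", "administrative_unit", "state", "state_abbr",
    "city", "postcode", "street_address", "secondary_address",
}


def check_address_relationship(relationship, columns):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if relationship.get("type") != "address":
        problems.append("type is not 'address'")
    names = relationship.get("column_names", [])
    if len(names) < 2:
        problems.append("an address needs 2 or more columns")
    for name in names:
        sdtype = columns.get(name, {}).get("sdtype")
        if sdtype is None:
            problems.append(f"unknown column: {name}")
        elif sdtype not in ADDRESS_SDTYPES:
            problems.append(f"{name} has unsupported sdtype: {sdtype}")
    return problems


# Assumed sdtypes for the columns named in the example relationship
columns = {
    "addr_line1": {"sdtype": "street_address"},
    "addr_line2": {"sdtype": "secondary_address"},
    "city": {"sdtype": "city"},
    "state": {"sdtype": "state"},
    "zipcode": {"sdtype": "postcode"},
    "room_type": {"sdtype": "categorical"},
}
relationship = {
    "type": "address",
    "column_names": ["addr_line1", "addr_line2", "city", "state", "zipcode"],
}

print(check_address_relationship(relationship, columns))
```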

* While anyone can add column relationships to their metadata, SDV Enterprise users will see the highest quality synthetic data for these relationships. To learn more about SDV Enterprise and its extra features, visit our website.

Copyright (c) 2023, DataCebo, Inc.