Multi Table Metadata

Use this guide to write a description for multi table data. You have multi table data if your data is present in multiple tables that have rows and columns. Usually the tables are connected to each other through primary and foreign key references.

This example of a Multi Table dataset has a table for users and a table for their sessions. Each user can have multiple sessions recorded.

Your data description is called metadata. SDMetrics expects metadata as a Python dictionary object.

chevron-rightClick to see the metadatahashtag

This is the metadata dictionary for the illustrated table

{
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {
                "user_id": {
                    "sdtype": "id",
                    "regex_format": "U_[0-9]{3}"
                },
                "age": {
                    "sdtype": "numerical"
                },
                "address": {
                    "sdtype": "address",
                    "pii": True
                }
            }
        },
        "sessions": {
            "primary_key": "session_id",
            "columns": {
                "session_id": {
                    "sdtype": "id"
                },
                "user": {
                    "sdtype": "id",
                    "regex_format": "U_[0-9]{3}"
                },
                "date": {
                    "sdtype": "datetime",
                    "datetime_format": "%Y-%m-%d"
                },
                "browser": {
                    "sdtype": "categorical"
                },
                "bounced": {
                    "sdtype": "boolean"
                }
            }
        }
    },
    "relationships": [{
        "parent_table_name": "users",
        "parent_primary_key": "user_id",
        "child_table_name": "sessions",
        "child_foreign_key": "user_id"
    ]}
}

Metadata Specification

The file is an object that includes a dictionary named "tables".

Tables

The "tables" dictionary contains the information about each individual table of your application. Its keys are the table names and the values are dictionaries that describe each single table. This includes:

  • "primary_key": the column name used to identify a row in your table. For a composite key, provide a list of column names.

  • (required) "columns": a dictionary description of each column

Column Information

Inside "columns", you will describe each column. You'll start with the name of the column. Then you'll specify the type of data and any other information about it.

There are specific data types to choose from. Expand the options below to learn about the data types.

Boolean columns represent True or False values.

Properties (None)

Relationships

Inside the "relationships" section of the metadata, provide a list of relationships. Each relationship is represented as a dictionary with the following keys:

  • "parent_table_name": The name of the parent table

  • "parent_primary_key": The primary key column in the parent table. This column uniquely identifies each row in the parent table. For a composite key, provide a list of column names.

  • "child_table_name": The name of the child table that refers to the parent

  • "child_foreign_key": The foreign key column in the child table. The values in this column contain a reference to a row in the parent table. For a composite key, provide a list of column names.

Use multiple dictionaries to represent multiple tables.

Saving & Loading Metadata

After creating your dictionary, you can save it as a JSON file. For example, my_metadata_file.json.

In the future, you can load the Python dictionary by reading from the file.

Last updated