Multi Table Metadata
Use this guide to write a description for multi table data. You have multi table data if your data is present in multiple tables that have rows and columns. Usually the tables are connected to each other through primary and foreign key references.

This example of a Multi Table dataset has a table for users and a table for their sessions. Each user can have multiple sessions recorded.
Your data description is called metadata. SDMetrics expects metadata as a Python dictionary object.
This is the metadata dictionary for the illustrated tables:
{
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {
                "user_id": {
                    "sdtype": "id",
                    "regex_format": "U_[0-9]{3}"
                },
                "age": {
                    "sdtype": "numerical"
                },
                "address": {
                    "sdtype": "address",
                    "pii": True
                }
            }
        },
        "sessions": {
            "primary_key": "session_id",
            "columns": {
                "session_id": {
                    "sdtype": "id"
                },
                "user": {
                    "sdtype": "id",
                    "regex_format": "U_[0-9]{3}"
                },
                "date": {
                    "sdtype": "datetime",
                    "datetime_format": "%Y-%m-%d"
                },
                "browser": {
                    "sdtype": "categorical"
                },
                "bounced": {
                    "sdtype": "boolean"
                }
            }
        }
    },
    "relationships": [{
        "parent_table_name": "users",
        "parent_primary_key": "user_id",
        "child_table_name": "sessions",
        "child_foreign_key": "user"
    }]
}
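Each entry in "relationships" pairs a parent table's primary key with the column in the child table that refers to it. The following is a minimal sketch of a consistency check for those entries. It is not part of SDMetrics: the helper name check_relationships and the trimmed-down metadata dictionary are hypothetical and exist only for illustration.

```python
def check_relationships(metadata):
    """Return a list of problems found in metadata["relationships"]."""
    problems = []
    tables = metadata.get("tables", {})
    for rel in metadata.get("relationships", []):
        parent = tables.get(rel["parent_table_name"])
        child = tables.get(rel["child_table_name"])
        if parent is None or child is None:
            problems.append("unknown table in relationship")
            continue
        # The parent key should be the parent's declared primary key.
        if parent.get("primary_key") != rel["parent_primary_key"]:
            problems.append("parent key is not the primary key")
        # The foreign key must be a column of the child table.
        if rel["child_foreign_key"] not in child.get("columns", {}):
            problems.append("foreign key is not a child column")
    return problems


# Trimmed-down, hypothetical metadata used only for this example.
metadata = {
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {"user_id": {"sdtype": "id"}},
        },
        "sessions": {
            "primary_key": "session_id",
            "columns": {
                "session_id": {"sdtype": "id"},
                "user": {"sdtype": "id"},
            },
        },
    },
    "relationships": [{
        "parent_table_name": "users",
        "parent_primary_key": "user_id",
        "child_table_name": "sessions",
        "child_foreign_key": "user",
    }],
}

print(check_relationships(metadata))  # an empty list means no problems were found
```

A check like this catches a foreign key that names a column missing from the child table before the metadata is passed on.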
The metadata is an object that includes a dictionary named "tables".
{
    "tables": {
        <tables information>
    },
}
The "tables" dictionary contains the information about each individual table of your application. Its keys are the table names and the values are dictionaries that describe each table. This includes:
- "primary_key": the column name used to identify a row in your table
- (required) "columns": a dictionary description of each column
{
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": { <column information> }
        },
        "sessions": {
            "primary_key": "session_id",
            "columns": { <column information> }
        }
    },
    ...
}
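Since "primary_key" names a column, it is reasonable to expect that name to appear under "columns" as well. The sketch below checks this under that assumption. The helper missing_primary_keys and the sample tables dictionary are hypothetical and are not part of the SDMetrics API.

```python
def missing_primary_keys(tables):
    """Return names of tables whose primary key is not among their columns."""
    missing = []
    for name, table in tables.items():
        key = table.get("primary_key")
        if key is not None and key not in table.get("columns", {}):
            missing.append(name)
    return missing


# Hypothetical input: "sessions" declares a key that it never describes.
tables = {
    "users": {
        "primary_key": "user_id",
        "columns": {"user_id": {"sdtype": "id"}},
    },
    "sessions": {
        "primary_key": "session_id",
        "columns": {"user": {"sdtype": "id"}},
    },
}

print(missing_primary_keys(tables))  # ['sessions']
```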
Inside "columns", you will describe each column. You'll start with the name of the column. Then you'll specify the type of data and any other information about it. There are specific data types to choose from, each described below:
- boolean
- categorical
- datetime
- numerical
- id
- other
Boolean columns represent True or False values.
"active": {
    "sdtype": "boolean"
}
Properties (None)
Categorical columns describe discrete data.
"tier": {
    "sdtype": "categorical"
}
Properties (None)
Datetime columns represent a point in time.
"renew_date": {
    "sdtype": "datetime",
    "datetime_format": "%Y-%m-%d"
}
Properties
- datetime_format: A string describing the format of the datetime values. The format string has special values to describe the components. For example, Jan 06, 2022 is represented as "%b %d, %Y". Common values are:
  - Year: "%Y" for a 4-digit year like 2022, or "%y" for a 2-digit year like 22
  - Month: "%m" for a 2-digit month like 01, "%b" for an abbreviated month like Jan
  - Day: "%d" for a 2-digit day like 06
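The format codes listed above are the same ones used by Python's datetime module, so a format string can be tried out there before it goes into the metadata. This is a small sketch using only the standard library. Note that "%b" month abbreviations depend on the active locale, and the example assumes the default English names.

```python
from datetime import datetime

# Parse the example value from the text with its matching format string.
fmt = "%b %d, %Y"
parsed = datetime.strptime("Jan 06, 2022", fmt)

# Re-render the same date with other common codes.
print(parsed.strftime("%Y-%m-%d"))  # 2022-01-06
print(parsed.strftime("%y"))        # 22
print(parsed.strftime(fmt))         # Jan 06, 2022
```

If strptime raises a ValueError for a sample value, the format string does not describe that value.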
Numerical columns represent discrete or continuous numerical values.
"age": {
    "sdtype": "numerical"
},
"paid_amt": {
    "sdtype": "numerical",
    "computer_representation": "Float"
}
Properties
- computer_representation: A string that represents how you'll ultimately store the data. This determines the min and max values allowed. Available options are: 'Float', 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'
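To make the effect on min and max values concrete, the sketch below computes the range implied by each integer option. It assumes the option names follow the usual fixed-width convention (signed 'IntN' and unsigned 'UIntN' with N bits), which the text above does not spell out. The helper integer_bounds is hypothetical and is not an SDMetrics function.

```python
def integer_bounds(representation):
    """Return (min, max) for names like 'Int8' or 'UInt16', or None for 'Float'."""
    if representation == "Float":
        return None  # no integer range applies
    unsigned = representation.startswith("UInt")
    # Strip the 'UInt' or 'Int' prefix to get the bit width.
    bits = int(representation[4:] if unsigned else representation[3:])
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name in ("Int8", "UInt8", "Int16", "UInt32"):
    print(name, integer_bounds(name))
# Int8 (-128, 127)
# UInt8 (0, 255)
# Int16 (-32768, 32767)
# UInt32 (0, 4294967295)
```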
ID columns represent identifiers that do not have any special mathematical or semantic meaning.
"user_id": {
    "sdtype": "id",
    "regex_format": "U_[0-9]{3}"
}
Properties
- regex_format: A string describing the format of the ID as a regular expression
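The pattern "U_[0-9]{3}" from the example can be tested against sample IDs with Python's re module. This is only a sketch: whether SDMetrics requires the whole value to match, as re.fullmatch does here, is an assumption.

```python
import re

regex_format = "U_[0-9]{3}"

# fullmatch requires the entire string to fit the pattern.
for value in ("U_042", "U_42", "u_123", "U_1234"):
    matches = re.fullmatch(regex_format, value) is not None
    print(value, matches)
# U_042 True
# U_42 False
# u_123 False
# U_1234 False
```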
You can input any other data type such as 'phone_number', 'ssn' or 'email'. See the Sdtypes Reference for a full list.
"address": {
    "sdtype": "address",
    "pii": True
}
Properties
- pii: A boolean denoting whether the data is sensitive
  - (default) True: The column is sensitive, meaning the synthetic data is anonymized
  - False: The column is not sensitive, meaning the synthetic data may not be anonymized
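Because "pii" defaults to True, a column of one of these other types counts as sensitive unless it says otherwise. The sketch below lists the sensitive columns of a table under that reading. The helper sensitive_columns, the STANDARD_SDTYPES set and the sample columns (including the 'country_code' type name) are hypothetical and serve only as illustration.

```python
# The five data types described earlier; anything else is treated as "other".
STANDARD_SDTYPES = {"boolean", "categorical", "datetime", "numerical", "id"}


def sensitive_columns(columns):
    """Return names of non-standard columns whose pii flag is True or unset."""
    result = []
    for name, info in columns.items():
        if info["sdtype"] in STANDARD_SDTYPES:
            continue
        # Assumption: a missing "pii" entry means the default, True.
        if info.get("pii", True):
            result.append(name)
    return result


columns = {
    "age": {"sdtype": "numerical"},
    "address": {"sdtype": "address", "pii": True},
    "email": {"sdtype": "email"},
    "country": {"sdtype": "country_code", "pii": False},
}

print(sensitive_columns(columns))  # ['address', 'email']
```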
After creating your dictionary, you can save it as a JSON file. For example, my_metadata_file.json.
import json

with open('my_metadata_file.json', 'w') as f:
    json.dump(my_metadata_dict, f)
In the future, you can load the Python dictionary by reading from the file.
import json

with open('my_metadata_file.json') as f:
    my_metadata_dict = json.load(f)

# use my_metadata_dict in the SDMetrics library
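Saving and loading can be checked together with a quick round trip. The sketch below writes a small, hypothetical metadata dictionary to a temporary directory so that no real file is left behind. It also shows that Python's True is stored as JSON true and comes back as True on load.

```python
import json
import os
import tempfile

# Small, hypothetical metadata used only for this round-trip check.
my_metadata_dict = {
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {
                "user_id": {"sdtype": "id", "regex_format": "U_[0-9]{3}"},
                "address": {"sdtype": "address", "pii": True},
            },
        }
    },
    "relationships": [],
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_metadata_file.json")
    with open(path, "w") as f:
        json.dump(my_metadata_dict, f, indent=4)
    with open(path) as f:
        text = f.read()

loaded = json.loads(text)
print('"pii": true' in text)       # True: the JSON file uses lowercase true
print(loaded == my_metadata_dict)  # True: the dictionary survives the round trip
```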