
Single Table Metadata


Last updated 1 year ago

Use this guide to write a description for a single data table. In a single table, all your data is captured in a 2D format using rows and columns.

Your data description is called metadata. SDMetrics expects metadata as a Python dictionary object.

This is the metadata dictionary for an example table in which each row describes one user and their personal information:

{
    "primary_key": "user_id",
    "columns": {
        "user_id": {
            "sdtype": "id",
            "regex_format": "U_[0-9]{3}"
        },
        "age": {
            "sdtype": "numerical"
        },
        "address": {
            "sdtype": "address",
            "pii": True
        }, 
        "tier": {
            "sdtype": "categorical"
        },
        "active": {
            "sdtype": "boolean"
        },
        "paid_amt": {
            "sdtype": "numerical"
        },
        "renew_date": {
            "sdtype": "datetime",
            "datetime_format": "%Y-%m-%d"
        }
    }
}

Metadata Specification

The metadata has two keys:

  • "primary_key": the column name used to identify a row in your table

  • (required) "columns": a dictionary description of each column

{
    "primary_key": "user_id",
    "columns": { <column information> }   
}
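As a quick sanity check, the structure described above can be verified before the dictionary is passed to SDMetrics. The sketch below is only an illustration: check_metadata is a hypothetical helper, not part of the SDMetrics API, and it assumes that a primary key, when given, should name one of the described columns.

```python
def check_metadata(metadata):
    """Return a list of structural problems found in a metadata dict.

    Hypothetical helper for illustration; not part of SDMetrics.
    """
    problems = []
    columns = metadata.get("columns")
    if not isinstance(columns, dict):
        problems.append('"columns" is required and must be a dictionary')
        return problems
    for name, info in columns.items():
        if not isinstance(info, dict) or "sdtype" not in info:
            problems.append(f'column "{name}" is missing an "sdtype"')
    primary_key = metadata.get("primary_key")
    # Assumption: a primary key should refer to a described column.
    if primary_key is not None and primary_key not in columns:
        problems.append(f'primary key "{primary_key}" is not in "columns"')
    return problems


example = {
    "primary_key": "user_id",
    "columns": {
        "user_id": {"sdtype": "id", "regex_format": "U_[0-9]{3}"},
        "age": {"sdtype": "numerical"},
    },
}
print(check_metadata(example))  # []
```

An empty list means the dictionary has the expected shape; it says nothing about whether the sdtypes match the actual data.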

Column Information

Inside "columns", you will describe each column. Start with the name of the column, then specify its data type (sdtype) and any other information about it. There are specific data types to choose from, each described below.

Boolean columns represent True or False values.

"active": { 
    "sdtype": "boolean"
}

Properties (None)

Categorical columns describe discrete data.

"tier": {
    "sdtype": "categorical",
}

Properties (None)

Datetime columns represent a point in time.

"renew_date": {
    "sdtype": "datetime",
    "datetime_format": "%Y-%m-%d"
}

Properties

  • (required) datetime_format: A string describing the format, as defined by Python's strftime format codes.

The format string has special values to describe the components. For example, Jan 06, 2022 is represented as "%b %d, %Y". Common values are:

  • Year: "%Y" for a 4-digit year like 2022, or "%y" for a 2-digit year like 22

  • Month: "%m" for a 2-digit month like 01, "%b" for an abbreviated month like Jan

  • Day: "%d" for a 2-digit day like 06
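To see how these format codes behave, the following sketch uses Python's standard datetime module to parse and render the example date. The matches_format helper is hypothetical and only shows one way to test whether a value fits a datetime format string; it is not how SDMetrics itself validates data.

```python
from datetime import datetime

datetime_format = "%Y-%m-%d"

# Parse a value written in the declared format.
parsed = datetime.strptime("2022-01-06", datetime_format)

# Render the same date with the "%b %d, %Y" format from the text.
# Note: "%b" is locale-dependent; an English or default C locale gives "Jan".
formatted = parsed.strftime("%b %d, %Y")
print(formatted)


def matches_format(value, fmt):
    """Return True if value can be parsed with fmt (hypothetical helper)."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


print(matches_format("2022-01-06", datetime_format))  # True
print(matches_format("06/01/2022", datetime_format))  # False
```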

Numerical columns represent discrete or continuous numerical values.

"age": {
    "sdtype": "numerical"
},
"paid_amt": {
    "sdtype": "numerical",
    "computer_representation": "Float"
}

Properties

  • computer_representation: A string that represents how you'll ultimately store the data. This determines the min and max values allowed. Available options are: 'Float', 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64'.
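To make the "min and max values allowed" concrete, the sketch below computes the range implied by each integer option. It assumes the names follow the usual fixed-width convention (signed 'IntN', unsigned 'UIntN'); representation_range is a hypothetical helper, and how SDMetrics applies these bounds internally is not shown here.

```python
def representation_range(name):
    """Return (min, max) for an integer representation such as 'Int8'.

    Assumption: 'IntN' is a signed N-bit integer and 'UIntN' is unsigned.
    """
    if name.startswith("UInt"):
        bits = int(name[4:])
        return 0, 2**bits - 1
    if name.startswith("Int"):
        bits = int(name[3:])
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    raise ValueError(f"not an integer representation: {name}")


print(representation_range("Int8"))    # (-128, 127)
print(representation_range("UInt16"))  # (0, 65535)
```

For example, an age column stored as 'UInt8' could not hold negative values or values above 255.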

ID columns represent identifiers that do not have any special mathematical or semantic meaning.

"user_id": { 
    "sdtype": "id",
    "regex_format": "U_[0-9]{3}"
}
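The regex_format value in this example can be tried out with Python's re module. This is a minimal sketch that assumes the whole ID must match the pattern, which is why it uses fullmatch; the sample IDs are made up for illustration.

```python
import re

regex_format = "U_[0-9]{3}"
pattern = re.compile(regex_format)

ids = ["U_001", "U_042", "U_1234", "u_001"]

# fullmatch requires the entire string to match, not just a prefix.
valid = [value for value in ids if pattern.fullmatch(value)]
print(valid)  # ['U_001', 'U_042']
```

Here "U_1234" is rejected because it has four digits, and "u_001" because the pattern is case-sensitive.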

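Properties

  • regex_format: A string describing the format of the ID as a regular expression

You can input any other data type such as 'phone_number', 'ssn' or 'email'. See the Sdtypes Reference for a full list.

"address": {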
    "sdtype": "address",
    "pii": True
}

Properties

  • pii: A boolean denoting whether the data is sensitive

    • (default) True: The column is sensitive, meaning the synthetic data is anonymized

    • False: The column is not sensitive, meaning the synthetic data may not be anonymized

Saving & Loading Metadata

After creating your dictionary, you can save it as a JSON file. For example, my_metadata_file.json.

import json

with open('my_metadata_file.json', 'w') as f:
    json.dump(my_metadata_dict, f)

In the future, you can load the Python dictionary by reading from the file.

import json 

with open('my_metadata_file.json') as f:
    my_metadata_dict = json.load(f)

# use my_metadata_dict in the SDMetrics library
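The two steps above can be combined into a round trip to confirm that nothing is lost when saving. This sketch uses a shortened version of the example metadata and writes to a temporary directory; the indent=4 argument is an optional choice that makes the file easier to read.

```python
import json
import os
import tempfile

my_metadata_dict = {
    "primary_key": "user_id",
    "columns": {
        "user_id": {"sdtype": "id", "regex_format": "U_[0-9]{3}"},
        "address": {"sdtype": "address", "pii": True},
    },
}

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_metadata_file.json")
    with open(path, "w") as f:
        json.dump(my_metadata_dict, f, indent=4)
    with open(path) as f:
        raw_text = f.read()

loaded = json.loads(raw_text)
print(loaded == my_metadata_dict)  # True
```

Note that Python's True is written to the file as JSON's lowercase true and converted back on load, so a hand-edited JSON file should use true and false.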

  • datetime_format (required for datetime columns): A string describing the format, as defined by Python's strftime format codes.

  • regex_format (for ID columns): A string describing the format of the ID as a regular expression.

  • Other data types such as 'phone_number', 'ssn' or 'email' are also accepted. See the Sdtypes Reference for a full list.