Single Table Metadata API
This guide will walk you through creating the metadata using the Python API.
Get started by creating a blank SingleTableMetadata object.
Automatically detect the metadata based on your actual data. Different methods are available based on the format of your data.
detect_from_dataframe
: Use this function to automatically detect metadata from your data that is available in a pandas.DataFrame object
Parameters
(required) data
: A pandas.DataFrame containing your real data
Output (None)
The detected metadata is not guaranteed to be accurate or complete. Be sure to carefully inspect the metadata and correct any information that is wrong or missing.
Primary keys and other identifiers are auto-detected, but may be incorrect or incomplete. See the methods for setting the primary key and alternate keys, described later in this guide, to correct them.
Sensitive information may not be auto-detected. Check for columns with an 'unknown' sdtype and use the column update method, described later in this guide, to assign the correct sdtype.
At any point, you can inspect the current state of the metadata.
Use this to get a copy of the Python dictionary that corresponds to the metadata.
Parameters (None)
Output A Python dictionary that corresponds to the metadata
Use this to see a visual representation of the metadata. Use the parameters to control the level of detail in the visualization and to save the image.
Parameters
show_table_details
: Toggle the display of column details
(default) 'full'
Show all the different column names, primary keys and foreign keys
'summarized'
Summarize the columns based on the data type
output_filepath
: If provided, save the image at the given location in the given format
The output_filepath must end with the filetype that you want to save as. Popular examples are png, jpg or pdf.
Use this function to look up column names based on the metadata properties that they have.
Parameters
Output A list of strings, with the column names that match the criteria. If no columns match the criteria, then an empty list will be returned.
Use this to validate that the metadata is written according to the specification. This function will throw descriptive errors if there is anything wrong with the metadata.
Parameters (None)
Output (None)
Use this method to validate that the metadata accurately describes a particular dataset. This function will throw descriptive errors if there is any mismatch between the metadata and data.
Parameters
(required) data
: A pandas.DataFrame containing data. The data should have the same columns as described in the metadata.
Output (None)
It is important to verify and update any inaccuracies in the metadata.
Use this method to modify the information about a column in your metadata.
Parameters
(required) column_name
: The name of the column to update
Output (None)
Use this function to make a bulk update to multiple columns at once. This function will allow you to set the same parameters for a group of columns.
Parameters
(required) column_names
: A list of strings representing the column names to update
<other properties>
: Based on the sdtype, provide other parameters
Output (None)
Use this function to make a bulk update to multiple columns at once. This function will allow you to set different parameters for each column.
Parameters
Output (None)
Use this function to add a column to your SingleTableMetadata object.
Parameters
(required) column_name
: The name of the column to add
**kwargs
: Any other parameters you need that describe metadata for a column.
Use this function to specify when a group of columns represents the same concept.
Parameters
(required) relationship_type
: A string with the type of relationship. This represents a higher level concept. See below for options.
(required) column_names
: A list of column names that are part of that relationship. Make sure that these columns are compatible with the relationship type. See below for more information.
An address is defined by 2 or more columns that have the following sdtypes: country_code, administrative_unit, state, state_abbr, city, postcode, street_address and secondary_address.
Output (None)
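Before declaring an address relationship, it can help to check that the chosen columns satisfy the rule stated above. The helper below is a hypothetical, standalone sketch of that rule. It is not part of SDV and does not call the library.

```python
# sdtypes that may take part in an address relationship, as listed above
ADDRESS_SDTYPES = {
    'country_code', 'administrative_unit', 'state', 'state_abbr',
    'city', 'postcode', 'street_address', 'secondary_address',
}


def can_form_address(column_sdtypes):
    """Return True if the columns satisfy the address rule.

    column_sdtypes maps each candidate column name to its sdtype.
    The rule: 2 or more columns, each with an address-compatible sdtype.
    """
    if len(column_sdtypes) < 2:
        return False
    return all(sdtype in ADDRESS_SDTYPES for sdtype in column_sdtypes.values())


# Hypothetical column groups
valid_group = {'billing_street': 'street_address', 'billing_city': 'city'}
too_small_group = {'billing_city': 'city'}
wrong_sdtype_group = {'billing_city': 'city', 'room_rate': 'numerical'}

print(can_form_address(valid_group))
print(can_form_address(too_small_group))
print(can_form_address(wrong_sdtype_group))
```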
Use this function to set the primary key of the table. Any existing primary keys will be removed.
Parameters
(required) column_name
: The column name of the primary key. The column name must already be defined in the metadata and it must be an ID or another PII sdtype.
Output (None)
Use this function to remove any existing primary keys in the table.
Parameters (None)
Output (None)
Use this function to set alternate keys of the table. This method will add to any existing alternate keys you may have.
Parameters
(required) column_names
: A list of column names that represent the alternate keys in the table. All column names must already be defined in the metadata and they must be IDs or another PII sdtype.
Output (None)
You can save the metadata object as a JSON file and load it again for future use.
Use this to save the metadata object to a new JSON file that will be compatible with SDV 1.0 and beyond. We recommend you write the metadata to a new file every time you update it.
Parameters
(required) filepath
: The location of the file that will be created with the JSON metadata
Output (None)
If you already have a metadata JSON file, you can load it in as a SingleTableMetadata object. Use the method based on the version of your JSON file.
load_from_json
: If you recently wrote your JSON file for SDV, use this class method to load it as a SingleTableMetadata object.
Parameters
(required) filepath
: The name of the file containing the JSON metadata
Output A SingleTableMetadata object
You can also load the metadata from a Python dictionary with the information.
Use this class method to load a Python dictionary as a SingleTableMetadata object.
Output A SingleTableMetadata object
Use this method to anonymize the column names of your metadata. This makes it easier to share your metadata, e.g. for debugging purposes.
Parameters (None)
Output A new SingleTableMetadata object that represents the anonymized metadata
The methods for updating and adding columns also accept the following parameters.
(required) sdtype
: A string describing the statistical data type. Common types are 'boolean', 'categorical', 'datetime', 'numerical' and 'id'. Other types such as 'phone_number' are also available.
<other properties>
: Based on the sdtype, provide other parameters.
(required) column_metadata
: A dictionary mapping each column name you want to update to the metadata information for that column.
While anyone can add column relationships to their data, SDV Enterprise users will see the highest quality data for the relationships.
Do you have a request for a type of column relationship? If so, please describe your use case.
(required) metadata_dict
: A Python dictionary representation of the metadata.
*This feature is only available for licensed, enterprise users.