Creating Metadata

This guide walks you through creating metadata using the Python API.

Auto Detect Metadata

Once you have loaded your data into Python, you can auto-detect the metadata from your actual data.

detect_from_dataframe

Use this function to automatically detect metadata from data that you've loaded as a pandas.DataFrame object.

Parameters:

  • (required) data: Your pandas DataFrame object that contains the data

  • table_name: A string describing the name of your table. SDV will use the table name when referring to your table in the metadata, as well as in any warnings or descriptive error messages.

    • (default) By default, we'll name your data table 'table'

  • infer_sdtypes: A boolean describing whether to infer the sdtypes of each column

    • (default) True: Infer the sdtypes of each column based on the data.

    • False: Do not infer the sdtypes. All columns will be marked as unknown, ready for you to manually update.

  • infer_keys: A string describing whether to infer the primary keys

    • (default) 'primary_only': Infer the primary keys in the table

    • None: Do not infer any primary keys. You can manually add these later.

Output: A Metadata object that describes the data

from sdv.metadata import Metadata

metadata = Metadata.detect_from_dataframe(
    data=my_dataframe,
    table_name='hotel_guests')

Updating Metadata

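# Detection is automatic, so correct any column whose sdtype was inferred wrongly.
# Treat 'start_date' as a datetime and declare the format its values are stored in.
metadata.update_column(
    column_name='start_date',
    sdtype='datetime',
    datetime_format='%Y-%m-%d')

# Treat 'user_cell' as a phone number and flag it as PII.
metadata.update_column(
    column_name='user_cell',
    sdtype='phone_number',
    pii=True)

# Check that the updated metadata is still valid.
metadata.validate()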

Saving, Loading & Sharing Metadata

You can save the metadata object as a JSON file and load it again for future use.

save_to_json

Use this method to save the metadata object to a new JSON file that will be compatible with SDV 1.0 and beyond. We recommend you write the metadata to a new file every time you update it.

Parameters:

  • (required) filepath: The location of the file that will be created with the JSON metadata

  • mode: A string describing the mode to use when creating the JSON file

    • (default) 'write': Write the metadata to the file, raising an error if the file already exists

    • 'overwrite': Write the metadata to the file, replacing the contents if the file already exists

Output: None

metadata.save_to_json(filepath='my_metadata_v1.json')
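One way to follow the recommendation of writing a new file after every update is to generate the next unused versioned filename before saving. The helper below is a hypothetical sketch that uses only the Python standard library; next_versioned_path and the '_v<N>.json' naming scheme are assumptions for illustration and are not part of SDV.

```python
import re
from pathlib import Path


def next_versioned_path(directory, stem='my_metadata'):
    """Return '<stem>_v<N>.json' in directory, where N is one past the highest existing version."""
    pattern = re.compile(rf'^{re.escape(stem)}_v(\d+)\.json$')
    versions = [
        int(match.group(1))
        for path in Path(directory).glob(f'{stem}_v*.json')
        if (match := pattern.match(path.name))
    ]
    # With no existing versions, numbering starts at 1
    return Path(directory) / f'{stem}_v{max(versions, default=0) + 1}.json'


# Pick a fresh filename in the current folder
filepath = next_versioned_path('.')
print(filepath)

# Then pass it to the save call shown above, for example:
# metadata.save_to_json(filepath=str(filepath))
```

Because the returned path never points at an existing file, the default 'write' mode is enough and earlier versions are left untouched.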

load_from_json

Use this method to load your file as a Metadata object.

Parameters:

  • (required) filepath: The name of the file containing the JSON metadata

Output: A Metadata object.

metadata = Metadata.load_from_json(filepath='my_metadata_v1.json')
