Creating Metadata

This guide will walk you through creating the metadata using the Python API.

Auto Detect Metadata

Once you have loaded your data into Python, you can auto-detect the metadata from your actual data.

detect_from_dataframe

Use this function to automatically detect metadata from your data that you've loaded as a pandas.DataFrame object.

Parameters:

(required) data: Your pandas DataFrame object that contains the data
table_name: A string describing the name of your table. SDV will use the table name when referring to your table in the metadata, as well as in any warnings or descriptive error messages.
- (default) 'table': If you don't provide a name, your data table is named 'table'
infer_sdtypes: A boolean describing whether to infer the sdtypes of each column
- (default) True: Infer the sdtypes of each column based on the data.
- False: Do not infer the sdtypes. All columns will be marked as unknown, ready for you to manually update.
infer_keys: A string describing whether to infer the primary keys
- (default) 'primary_only': Infer the primary keys in the table
- None: Do not infer any primary keys. You can manually add these later.

Output: A Metadata object that describes the data

from sdv.metadata import Metadata

metadata = Metadata.detect_from_dataframe(
    data=my_dataframe,
    table_name='hotel_guests')

Updating Metadata

The detected metadata is not guaranteed to be accurate or complete. Be sure to carefully inspect the metadata and update it so it accurately represents your data.

For more information about inspecting and updating your metadata, see the Metadata API reference.

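# Mark 'start_date' as a datetime column and state the format of its values
metadata.update_column(
    column_name='start_date',
    sdtype='datetime',
    datetime_format='%Y-%m-%d')

# Mark 'user_cell' as a phone number that contains sensitive (PII) values
metadata.update_column(
    column_name='user_cell',
    sdtype='phone_number',
    pii=True)

# Check that the updated metadata is still valid
metadata.validate()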
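
Before setting a datetime_format, it can help to confirm that the format string really matches the values in the column. The sketch below is not part of SDV: matches_datetime_format and sample_start_dates are hypothetical names, and it uses only Python's standard library on plain string values (pandas missing values such as NaN are not handled).

```python
from datetime import datetime


def matches_datetime_format(values, datetime_format):
    """Return True if every non-missing value parses with the given format."""
    for value in values:
        if value is None:
            continue  # skip missing values
        try:
            datetime.strptime(str(value), datetime_format)
        except ValueError:
            return False
    return True


# Hypothetical sample values standing in for the 'start_date' column
sample_start_dates = ['2020-12-27', '2021-01-05', None]

print(matches_datetime_format(sample_start_dates, '%Y-%m-%d'))  # True
print(matches_datetime_format(sample_start_dates, '%d/%m/%Y'))  # False
```

If the check returns False, the format string passed to update_column probably needs adjusting before you continue.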

You can save the metadata object as a JSON file and load it again for future use.

save_to_json

Use this to save the metadata object to a new JSON file that will be compatible with SDV 1.0 and beyond. We recommend you write the metadata to a new file every time you update it.

Parameters

(required) filepath: The location of the file that will be created with the JSON metadata
mode: A string describing the mode to use when creating the JSON file
- (default) 'write': Write the metadata to the file, raising an error if the file already exists
- 'overwrite': Write the metadata to the file, replacing the contents if the file already exists

Output: (None)

metadata.save_to_json(filepath='my_metadata_v1.json')
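
Since the default 'write' mode raises an error when the file already exists, one way to follow the new-file-per-update recommendation is to pick the next unused version number automatically. This is only a sketch: next_metadata_path is a hypothetical helper, not an SDV function, and the '<stem>_v<N>.json' naming scheme is an assumption based on the file name used above.

```python
from pathlib import Path


def next_metadata_path(folder, stem='my_metadata'):
    """Return the first '<stem>_v<N>.json' path in folder that does not exist yet."""
    folder = Path(folder)
    version = 1
    # Count upward until we reach a file name that is still free
    while (folder / f'{stem}_v{version}.json').exists():
        version += 1
    return folder / f'{stem}_v{version}.json'


# Look for the next free file name in the current working directory
filepath = next_metadata_path('.')
print(filepath)

# The result could then be passed on, for example:
# metadata.save_to_json(filepath=str(filepath))
```

Because the helper never returns an existing path, the default 'write' mode should not hit its file-already-exists error, and earlier versions of the metadata stay on disk.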

load_from_json

Use this method to load your file as a Metadata object.

Parameters

(required) filepath: The name of the file containing the JSON metadata

Output: A Metadata object.

metadata = Metadata.load_from_json(filepath='my_metadata_v1.json')

