Creating Metadata

Auto Detect Metadata

If you don't already have a metadata object, we recommend auto-detecting it based on your data.

detect_from_dataframes

Use this function to automatically detect metadata from data that you have loaded as pandas.DataFrame objects.

Parameters:

(required) data: Your data, represented as a dictionary. The keys are your table names and values are the pandas.DataFrame objects containing your data.
infer_sdtypes: A boolean describing whether to infer the sdtypes of each column
- (default) True: Infer the sdtypes of each column based on the data.
- False: Do not infer the sdtypes. All columns will be marked as unknown, ready for you to manually update.
infer_keys: A string describing whether to infer the primary and/or foreign keys.
- (default) 'primary_and_foreign': Infer the primary keys in each table, and the foreign keys in other tables that refer to them
- 'primary_only': Infer the primary keys in each table. You can manually add the foreign key relationships later.
- None: Do not infer any primary or foreign keys. You can manually add these later.
foreign_key_inference_algorithm: The algorithm to use when inferring the foreign key connections to primary keys
- (default) 'column_name_match': Match up foreign and primary key columns that have the same names
- ＊(default, SDV Enterprise) 'data_match': Match up foreign and primary key columns based on the data that they contain

Output: A Metadata object that describes the data.

from sdv.metadata import Metadata

metadata = Metadata.detect_from_dataframes(
    data={
        'hotels': hotels_dataframe,
        'guests': guests_dataframe
    })

＊SDV Enterprise Feature. This feature is only available for licensed, enterprise users. For more information, visit our page to Compare SDV Features.
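
To make the idea behind the 'column_name_match' option more concrete, here is a minimal pure-Python sketch of matching foreign keys to primary keys by column name. It is only an illustration of the concept and not SDV's actual implementation. The function `match_foreign_keys_by_name`, the column names, and the tuple layout are all hypothetical.

```python
def match_foreign_keys_by_name(table_columns, primary_keys):
    """Pair each primary key with same-named columns in other tables.

    Hypothetical helper for illustration only; not part of SDV.
    Returns (parent_table, primary_key, child_table, foreign_key) tuples.
    """
    relationships = []
    for parent_table, primary_key in primary_keys.items():
        for child_table, columns in table_columns.items():
            # A table cannot be its own child in this simple sketch
            if child_table != parent_table and primary_key in columns:
                relationships.append(
                    (parent_table, primary_key, child_table, primary_key)
                )
    return relationships


# Assumed column layout for the 'hotels' and 'guests' tables
table_columns = {
    'hotels': ['hotel_id', 'city', 'rating'],
    'guests': ['guest_email', 'hotel_id', 'checkin_date'],
}
primary_keys = {'hotels': 'hotel_id', 'guests': 'guest_email'}

relationships = match_foreign_keys_by_name(table_columns, primary_keys)
print(relationships)
```

Name matching of this kind only finds a relationship when the child column has exactly the same name as the parent's primary key, which is one reason to inspect the detected metadata afterwards.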

Updating Metadata

The detected metadata is not guaranteed to be accurate or complete. Carefully inspect it and update any information that is incorrect or missing.

For more information about inspecting and updating your metadata, see the Metadata API reference.

metadata.update_column(
    column_name='age',
    sdtype='numerical',
    table_name='users'
)

metadata.validate()

You can save the metadata object as a JSON file and load it again for future use.

save_to_json

Use this method to save the metadata object to a new JSON file that will be compatible with SDV 1.0 and beyond. We recommend that you write the metadata to a new file every time you update it.

Parameters:

(required) filepath: The location of the file that will be created with the JSON metadata
mode: A string describing the mode to use when creating the JSON file
- (default) 'write': Write the metadata to the file, raising an error if the file already exists
- 'overwrite': Write the metadata to the file, replacing the contents if the file already exists

Output (None)

metadata.save_to_json(filepath='my_metadata_v1.json')
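
The difference between the 'write' and 'overwrite' modes can be sketched with the Python standard library alone. This is an assumption-laden illustration of the described behavior, not SDV's code: the helper `save_json` is hypothetical, and the exception SDV raises when the file already exists may differ from the `FileExistsError` used here.

```python
import json
import os
import tempfile


def save_json(payload, filepath, mode='write'):
    """Hypothetical helper mimicking the 'write' / 'overwrite' semantics."""
    if mode not in ('write', 'overwrite'):
        raise ValueError("mode must be 'write' or 'overwrite'")
    # 'x' refuses to replace an existing file; 'w' replaces its contents
    open_mode = 'x' if mode == 'write' else 'w'
    with open(filepath, open_mode) as json_file:
        json.dump(payload, json_file, indent=4)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'my_metadata_v1.json')

    # First save succeeds because the file does not exist yet
    save_json({'tables': {}}, path)

    # Second save in the default mode fails because the file exists
    try:
        save_json({'tables': {}}, path)
        second_write_failed = False
    except FileExistsError:
        second_write_failed = True

    # 'overwrite' replaces the existing contents
    save_json({'tables': {'hotels': {}}}, path, mode='overwrite')
    with open(path) as json_file:
        final_contents = json.load(json_file)
```

Defaulting to the stricter mode protects an earlier metadata version from being replaced by accident.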

load_from_json

Use this method to load a JSON metadata file as a Metadata object.

Parameters:

(required) filepath: The name of the file containing the JSON metadata

Output: A Metadata object.

metadata = Metadata.load_from_json(filepath='my_metadata_v1.json')
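
Following the earlier recommendation to write the metadata to a new file after each update, one possible way to pick the next file name is sketched below. The helper `next_versioned_path` and the `my_metadata_v<N>.json` naming scheme are assumptions made for illustration; SDV does not require any particular naming convention.

```python
import os
import re
import tempfile


def next_versioned_path(folder, prefix='my_metadata_v', suffix='.json'):
    """Hypothetical helper: return the next unused versioned file path."""
    pattern = re.compile(re.escape(prefix) + r'(\d+)' + re.escape(suffix))
    versions = []
    for name in os.listdir(folder):
        match = pattern.fullmatch(name)
        if match:
            versions.append(int(match.group(1)))
    # Start at version 1 when no versioned file exists yet
    next_version = max(versions, default=0) + 1
    return os.path.join(folder, f'{prefix}{next_version}{suffix}')


with tempfile.TemporaryDirectory() as folder:
    # Simulate two previously saved metadata files
    for existing in ('my_metadata_v1.json', 'my_metadata_v2.json'):
        open(os.path.join(folder, existing), 'w').close()
    next_name = os.path.basename(next_versioned_path(folder))

print(next_name)
```

The returned path could then be passed as the `filepath` argument of `save_to_json`, which in its default 'write' mode would also guard against replacing an existing file.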

