Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.

Creating Metadata

Auto Detect Metadata

If you don't already have a metadata object, we recommend auto-detecting it based on your data.

detect_from_dataframes

Use this function to automatically detect metadata from data that you've loaded as pandas.DataFrame objects.

Parameters:

  • (required) data: Your data, represented as a dictionary. The keys are your table names and the values are the pandas.DataFrame objects containing your data.

  • infer_sdtypes: A boolean describing whether to infer the sdtypes of each column.

    • (default) True: Infer the sdtypes of each column based on the data.

    • False: Do not infer the sdtypes. All columns will be marked as unknown, ready for you to manually update.

  • infer_keys: A string describing whether to infer the primary and/or foreign keys.

    • (default) 'primary_and_foreign': Infer the primary keys in each table, and the foreign keys in other tables that refer to them

    • 'primary_only': Infer the primary keys in each table. You can manually add the foreign key relationships later.

    • None: Do not infer any primary or foreign keys. You can manually add these later.

  • foreign_key_inference_algorithm: The algorithm to use when inferring the foreign key connections to primary keys

    • (default) 'column_name_match': Match up foreign and primary key columns that have the same names

    • *(default, SDV Enterprise) 'data_match': Match up foreign and primary key columns based on the data that they contain

Output: A Metadata object that describes the data.

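from sdv.metadata import Metadata

# hotels_dataframe and guests_dataframe are pandas.DataFrame objects
# that you have already loaded, one per table
metadata = Metadata.detect_from_dataframes(
    data={
        'hotels': hotels_dataframe,
        'guests': guests_dataframe
    })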
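To give a feel for what the 'column_name_match' option describes, here is a minimal sketch of name-based key matching written with plain Python only. It is an illustration of the idea, not SDV's implementation: the function match_foreign_keys_by_name, the column lists, and the assumption that each table's primary key is already known are all hypothetical.

```python
def match_foreign_keys_by_name(table_columns, primary_keys):
    """Pair each primary key with same-named columns in other tables.

    Hypothetical helper for illustration only; SDV's own inference
    logic may differ.
    """
    relationships = []
    for parent_table, primary_key in primary_keys.items():
        for child_table, columns in table_columns.items():
            # A table does not reference itself in this simple sketch
            if child_table != parent_table and primary_key in columns:
                relationships.append((parent_table, child_table, primary_key))
    return relationships


# Hypothetical column names for the two tables used above
table_columns = {
    'hotels': ['hotel_id', 'city', 'rating'],
    'guests': ['guest_email', 'hotel_id', 'checkin_date'],
}
primary_keys = {'hotels': 'hotel_id', 'guests': 'guest_email'}

relationships = match_foreign_keys_by_name(table_columns, primary_keys)
print(relationships)
```

A purely name-based match like this misses foreign keys whose column names differ from the primary key they refer to, which is one reason to inspect the detected relationships before using them.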

Updating Metadata

The detected metadata is not guaranteed to be accurate or complete. Be sure to carefully inspect the metadata and update any information that is incorrect or missing.

metadata.update_column(
    column_name='age',
    sdtype='numerical',
    table_name='users'
)

metadata.validate()

Saving, Loading & Sharing Metadata

You can save the metadata object as a JSON file and load it again for future use.

save_to_json

Use this to save the metadata object to a new JSON file that will be compatible with SDV 1.0 and beyond. We recommend you write the metadata to a new file every time you update it.

Parameters

  • (required) filepath: The location of the file that will be created with the JSON metadata

  • mode: A string describing the mode to use when creating the JSON file

    • (default) 'write': Write the metadata to the file, raising an error if the file already exists

    • 'overwrite': Write the metadata to the file, replacing the contents if the file already exists

Output (None)

metadata.save_to_json(filepath='my_metadata_v1.json')
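The difference between the 'write' and 'overwrite' modes can be sketched with the standard library alone. This is a rough illustration of the behavior described above, not SDV's code: the save_json helper and the placeholder dictionaries are hypothetical, and the use of FileExistsError is an assumption about the kind of error raised.

```python
import json
import os
import tempfile


def save_json(payload, filepath, mode='write'):
    """Hypothetical stand-in that mimics the two documented modes."""
    if mode not in ('write', 'overwrite'):
        raise ValueError("mode must be 'write' or 'overwrite'")
    # 'write' refuses to touch an existing file (error type is an assumption)
    if mode == 'write' and os.path.exists(filepath):
        raise FileExistsError(f'{filepath} already exists')
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(payload, f, indent=4)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_metadata_v1.json')

    # First save succeeds because the file does not exist yet
    save_json({'tables': {}}, path)

    # A second save in the default mode is rejected
    try:
        save_json({'tables': {}}, path)
        second_write_failed = False
    except FileExistsError:
        second_write_failed = True

    # 'overwrite' replaces the existing contents
    save_json({'tables': {'hotels': {}}}, path, mode='overwrite')
    with open(path, encoding='utf-8') as f:
        final_contents = json.load(f)

print(second_write_failed, final_contents)
```

Keeping the default mode and choosing a new filename for each revision (for example my_metadata_v2.json) follows the recommendation above and preserves earlier versions of the metadata.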

load_from_json

Use this method to load your file as a Metadata object.

Parameters

  • (required) filepath: The name of the file containing the JSON metadata

Output: A Metadata object.

metadata = Metadata.load_from_json(filepath='my_metadata_v1.json')


For more information about inspecting and updating your metadata, see the Metadata API reference.

*SDV Enterprise Feature. This feature is only available for licensed, enterprise users. For more information, visit our page to Explore SDV.