Synthetic Data Vault

Copyright (c) 2023, DataCebo, Inc.


Creating Metadata

Auto Detect Metadata

If you don't already have a metadata object, we recommend auto-detecting it based on your data.

detect_from_dataframes

Use this method to automatically detect metadata from data that you've loaded as pandas.DataFrame objects.

Parameters

  • (required) data: Your data, represented as a dictionary. The keys are your table names and values are the pandas.DataFrame objects containing your data.

  • infer_sdtypes: A boolean describing whether to infer the sdtypes of each column.

    • (default) True: Infer the sdtypes of each column based on the data.

    • False: Do not infer the sdtypes. All columns will be marked as unknown, ready for you to update manually.

  • infer_keys: A string (or None) describing whether to infer the primary and/or foreign keys.

    • (default) 'primary_and_foreign': Infer the primary key of each table, and the foreign keys in other tables that refer to them.

    • 'primary_only': Infer the primary key of each table. You can manually add the foreign key relationships later.

    • None: Do not infer any primary or foreign keys. You can manually add these later.

  • foreign_key_inference_algorithm: The algorithm to use when inferring the foreign key connections to primary keys.

    • (default, SDV Community) 'column_name_match': Match up foreign and primary key columns that have the same names.

    • *(default, SDV Enterprise) 'data_match': Match up foreign and primary key columns based on the data that they contain.

Output: A Metadata object that describes the data.

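from sdv.metadata import Metadata

# hotels_dataframe and guests_dataframe are pandas.DataFrame objects
# that you have already loaded
metadata = Metadata.detect_from_dataframes(
    data={
        'hotels': hotels_dataframe,
        'guests': guests_dataframe
    })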
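To make the difference between the two foreign key inference options concrete, here is a minimal sketch of the ideas behind 'column_name_match' and 'data_match'. It is not SDV's implementation: the functions match_by_column_name and match_by_data are hypothetical, and the sketch assumes the primary key of each table is already known. Tables are passed as plain dictionaries of columns; pandas DataFrames should also work here, since they support the same column lookup and .items() iteration.

```python
from typing import Iterable, List, Mapping, Tuple

# (parent_table, primary_key, child_table, foreign_key)
Match = Tuple[str, str, str, str]


def match_by_column_name(
    columns: Mapping[str, Iterable[str]],
    primary_keys: Mapping[str, str],
) -> List[Match]:
    """Hypothetical sketch of 'column_name_match': pair each primary key
    with columns of the same name in the other tables."""
    matches = []
    for parent, pk in primary_keys.items():
        for child, child_columns in columns.items():
            if child != parent and pk in child_columns:
                matches.append((parent, pk, child, pk))
    return matches


def match_by_data(
    tables: Mapping[str, Mapping[str, Iterable]],
    primary_keys: Mapping[str, str],
) -> List[Match]:
    """Hypothetical sketch of the 'data_match' idea: pair a primary key with
    any column in another table whose values all occur in that key,
    regardless of the column's name."""
    matches = []
    for parent, pk in primary_keys.items():
        pk_values = set(tables[parent][pk])
        for child, child_columns in tables.items():
            if child == parent:
                continue
            for column, column_values in child_columns.items():
                observed = set(column_values)
                # Skip empty columns; otherwise require every value to be a known key
                if observed and observed <= pk_values:
                    matches.append((parent, pk, child, column))
    return matches


columns = {'hotels': ['hotel_id', 'city'], 'guests': ['guest_email', 'hotel_id']}
primary_keys = {'hotels': 'hotel_id', 'guests': 'guest_email'}
print(match_by_column_name(columns, primary_keys))
# [('hotels', 'hotel_id', 'guests', 'hotel_id')]

tables = {
    'hotels': {'hotel_id': ['H1', 'H2'], 'city': ['Boston', 'Denver']},
    'guests': {'guest_email': ['a@x.com', 'b@x.com', 'c@x.com'],
               'stayed_at': ['H1', 'H1', 'H2']},
}
print(match_by_data(tables, primary_keys))
# [('hotels', 'hotel_id', 'guests', 'stayed_at')]
```

Name matching misses the 'stayed_at' column because its name differs from 'hotel_id', while value matching finds it. Value matching can also produce coincidental matches (for example, a small integer column that happens to fall inside an integer key range), which is one more reason to review any detected relationships.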

Updating Metadata

The detected metadata is not guaranteed to be accurate or complete. Carefully inspect it and update any information that is incorrect or missing.

For more information about inspecting and updating your metadata, see the Metadata API reference.

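# Example: set the sdtype of the 'age' column in a 'users' table to numerical
metadata.update_column(
    column_name='age',
    sdtype='numerical',
    table_name='users'
)

# Check that the updated metadata is valid
metadata.validate()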
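Before calling update_column, it can help to list what still needs attention, such as columns left as unknown when infer_sdtypes is False. The sketch below assumes you have the metadata as a plain dictionary (for example from metadata.to_dict(), if your SDV version provides it, or from a saved JSON file) and that column sdtypes are nested under 'tables', then the table name, then 'columns', as in SDV's metadata JSON. The helper names columns_to_review and tables_without_primary_key are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def columns_to_review(metadata_dict: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose sdtype is still 'unknown'."""
    pending = []
    for table_name, table in metadata_dict.get('tables', {}).items():
        for column_name, column in table.get('columns', {}).items():
            if column.get('sdtype') == 'unknown':
                pending.append((table_name, column_name))
    return pending


def tables_without_primary_key(metadata_dict: Dict[str, Any]) -> List[str]:
    """Return the names of tables that have no primary key set."""
    return [
        table_name
        for table_name, table in metadata_dict.get('tables', {}).items()
        if not table.get('primary_key')
    ]


# Hand-written example in the assumed dictionary layout
example = {
    'tables': {
        'users': {
            'primary_key': 'user_id',
            'columns': {'user_id': {'sdtype': 'id'}, 'age': {'sdtype': 'unknown'}},
        },
        'sessions': {
            'columns': {'session_id': {'sdtype': 'unknown'}, 'user_id': {'sdtype': 'id'}},
        },
    },
    'relationships': [],
}
print(columns_to_review(example))           # [('users', 'age'), ('sessions', 'session_id')]
print(tables_without_primary_key(example))  # ['sessions']
```

Each (table, column) pair returned by columns_to_review maps directly onto the table_name and column_name arguments of update_column.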

Saving, Loading & Sharing Metadata

You can save the metadata object as a JSON file and load it again for future use.

save_to_json

Use this method to save the metadata object to a new JSON file that will be compatible with SDV 1.0 and later. We recommend writing the metadata to a new file every time you update it.

Parameters

  • (required) filepath: The location of the file that will be created with the JSON metadata

  • mode: A string describing the mode to use when creating the JSON file

    • (default) 'write': Write the metadata to the file, raising an error if the file already exists

    • 'overwrite': Write the metadata to the file, replacing the contents if the file already exists

Output: None

metadata.save_to_json(filepath='my_metadata_v1.json')
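The two modes correspond closely to Python's own file modes: 'x' (exclusive creation) fails if the file already exists, while 'w' replaces its contents. This hypothetical sketch uses the standard json module to mirror that behavior, and adds a helper for the recommended practice of writing each update to a new versioned file. Neither save_json nor next_version_path is part of SDV; with SDV you would pass the chosen path to metadata.save_to_json(filepath=...).

```python
import json
import os
import re
from typing import Any, Dict


def save_json(obj: Dict[str, Any], filepath: str, mode: str = 'write') -> None:
    """Write obj as JSON, mirroring the documented 'write'/'overwrite' modes."""
    if mode not in ('write', 'overwrite'):
        raise ValueError(f"mode must be 'write' or 'overwrite', got {mode!r}")
    # 'x' (exclusive creation) raises FileExistsError if the file exists;
    # 'w' truncates and replaces an existing file.
    with open(filepath, 'x' if mode == 'write' else 'w') as f:
        json.dump(obj, f, indent=4)


def next_version_path(directory: str, stem: str = 'my_metadata') -> str:
    """Return '<stem>_v<N+1>.json', where N is the highest version already
    present in directory (0 if there is none)."""
    pattern = re.compile(rf'^{re.escape(stem)}_v(\d+)\.json$')
    versions = [
        int(match.group(1))
        for name in os.listdir(directory)
        if (match := pattern.match(name))
    ]
    return os.path.join(directory, f'{stem}_v{max(versions, default=0) + 1}.json')
```

Because next_version_path always moves past the highest existing version, the default 'write' mode never collides with an earlier file, and each saved version stays available for comparison.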

load_from_json

Use this method to load your file as a Metadata object.

Parameters

  • (required) filepath: The name of the file containing the JSON metadata

Output: A Metadata object.

from sdv.metadata import Metadata
metadata = Metadata.load_from_json(filepath='my_metadata_v1.json')
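Because the saved metadata is plain JSON, a collaborator can review a shared file without installing SDV. The sketch below reads the file with the standard json module and summarizes its table relationships. It assumes each entry in the 'relationships' list uses the keys parent_table_name, parent_primary_key, child_table_name and child_foreign_key, as in SDV's multi-table metadata JSON; describe_relationships itself is a hypothetical helper.

```python
import json
from typing import List


def describe_relationships(filepath: str) -> List[str]:
    """Summarize each relationship in a saved metadata JSON file as
    'child_table.foreign_key -> parent_table.primary_key'."""
    with open(filepath) as f:
        metadata_dict = json.load(f)
    summary = []
    for rel in metadata_dict.get('relationships', []):
        summary.append(
            f"{rel['child_table_name']}.{rel['child_foreign_key']} -> "
            f"{rel['parent_table_name']}.{rel['parent_primary_key']}"
        )
    return summary
```

For example, calling describe_relationships('my_metadata_v1.json') on the hotels and guests metadata would list one line per detected foreign key, which makes it easy to spot a missing or incorrect relationship before loading the file with Metadata.load_from_json.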

*SDV Enterprise Feature. This feature is only available to licensed enterprise users. For more information, see our Compare SDV Features page.