CSV

This functionality is in Beta! Beta functionality may have bugs and may change in the future. Help us out by testing this functionality and letting us know if you encounter any issues.

CSVHandler

Use this object to create a handler for reading and writing local CSV files.

from sdv.io.local import CSVHandler

connector = CSVHandler()

Parameters (None)

Output A CSVHandler object you can use to read and write CSV files

read

Use this function to read multiple CSV files from your local machine

data = connector.read(
    folder_name='project/data/',
    file_names=['users.csv', 'transactions.csv', 'sessions.csv'],
    read_csv_parameters={
        'parse_dates': False,
        'encoding':'latin-1'
    }
)

Parameters

  • (required) folder_name: A string name of the folder that contains your CSV files

  • file_names: A list of strings with the exact file names to read

    • (default) None: Read all the CSV files that are in the specified folder

    • <list>: Read only the CSV files named in the list

  • read_csv_parameters: A dictionary with additional parameters to use when reading the CSVs. The keys are any of the parameter names of the pandas.read_csv function and the values are your inputs.

    • (default) { 'parse_dates': False, 'low_memory': False, 'on_bad_lines': 'warn' }: Do not infer any datetime formats, do not process the file in low-memory chunks, and warn (instead of erroring) if it's not possible to read a line. (Use all the other defaults of the pandas.read_csv function.)

Output A dictionary that contains all the CSV data found in the folder. The key is the name of the file (without the .csv suffix) and the value is a pandas DataFrame containing the data.
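
For example, assuming the call above finds all three files, you can access each table by its file name. (The table names below are just the file names from the example; this is a minimal sketch, not part of the API.)

# keys are the file names without the .csv suffix
print(data.keys())  # dict_keys(['users', 'transactions', 'sessions'])

users_table = data['users']  # a pandas DataFrame
print(users_table.head())    # preview the first few rows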

write

Use this function to write synthetic data as multiple CSV files

connector.write(
    synthetic_data,
    folder_name='project/synthetic_data',
    to_csv_parameters={
        'encoding': 'latin-1',
        'index': False
    },
    file_name_suffix='_v1',
    mode='x'
)

Parameters

  • (required) synthetic_data: Your data, represented as a dictionary. The key is the name of each table and the value is a pandas DataFrame containing the data.

  • (required) folder_name: A string name of the folder where you would like to write the synthetic data

  • to_csv_parameters: A dictionary with additional parameters to use when writing the CSVs. The keys are any of the parameter names of the pandas.to_csv function and the values are your inputs.

    • (default) { 'index': False }: Do not write the index column to the CSV. (Use all the other defaults of the pandas.to_csv function.)

  • file_name_suffix: The suffix to add to each filename. Use this to add version numbers or other identifying info.

    • (default) None: Do not add a suffix. The file name will be the same as the table name with a .csv extension

    • <string>: Append the suffix after the table name. E.g. the suffix '_synth1' will write a table named table as table_synth1.csv

  • mode: A string signaling which mode of writing to use

    • (default) 'x': Write to new files, raising an error if a file with the same name already exists

    • 'w': Write to new files, overwriting any existing files with the same name

    • 'a': Append the new CSV rows to any existing files

Output (None) The data will be written as CSV files
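
For example, assuming synthetic_data contains tables named 'users', 'transactions' and 'sessions' (carrying over the file names from the read example), the call above would create users_v1.csv, transactions_v1.csv and sessions_v1.csv inside project/synthetic_data. Below is a minimal sketch of preparing such a dictionary by hand and writing it; the table names and columns are hypothetical.

import pandas as pd

from sdv.io.local import CSVHandler

# hypothetical synthetic tables; in practice these come from a fitted synthesizer
synthetic_data = {
    'users': pd.DataFrame({'user_id': [1, 2], 'country': ['US', 'ES']}),
    'sessions': pd.DataFrame({'session_id': [10, 11], 'user_id': [1, 2]})
}

connector = CSVHandler()

# 'w' overwrites existing files instead of raising an error like the default 'x'
connector.write(synthetic_data, folder_name='project/synthetic_data', mode='w')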

