CSV

CSVHandler

Use this object to create a handler for reading and writing local CSV files.

from sdv.io.local import CSVHandler

connector = CSVHandler()

Parameters (None)

Output A CSVHandler object you can use to read and write CSV files

read

Use this function to read multiple CSV files from your local machine.

data = connector.read(
    folder_name='project/data/',
    file_names=['users.csv', 'transactions.csv', 'sessions.csv'],
    read_csv_parameters={
        'parse_dates': False,
        'encoding':'latin-1'
    }
)

Parameters

  • (required) folder_name: A string name of the folder that contains your CSV files

  • file_names: A list of strings with the exact file names to read

    • (default) None: Read all the CSV files that are in the specified folder

    • <list>: Only read the CSV files named in the list

  • read_csv_parameters: A dictionary with additional parameters to use when reading the CSVs. The keys are any of the parameter names of the pandas.read_csv function and the values are your inputs.

    • (default) { 'parse_dates': False, 'low_memory': False, 'on_bad_lines': 'warn' }: Do not infer any datetime formats, do not use pandas' low-memory chunked parsing, and warn instead of raising an error when a line cannot be read. (Use all the other defaults of the read_csv function.)

Output A dictionary that contains the data from every CSV file that was read. The key is the name of the file (without the .csv suffix) and the value is a pandas DataFrame containing the data.
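To make the file_names default and the key naming concrete, here is a minimal standard-library sketch. The helper read_folder_sketch is hypothetical and not part of SDV. It returns lists of row dictionaries rather than pandas DataFrames, and it only mirrors the rules described above: read every CSV when file_names is None, and use the file name without the .csv suffix as the key.

```python
import csv
import tempfile
from pathlib import Path


def read_folder_sketch(folder_name, file_names=None):
    """Hypothetical stand-in for the folder-reading rules described above.

    Returns {file name without '.csv': list of row dicts}. The real handler
    returns pandas DataFrames; plain dicts keep this sketch stdlib-only.
    """
    folder = Path(folder_name)
    if file_names is None:
        # Default: pick up every CSV file in the folder.
        paths = sorted(folder.glob('*.csv'))
    else:
        # Otherwise read only the files that were named explicitly.
        paths = [folder / name for name in file_names]

    data = {}
    for path in paths:
        with open(path, newline='', encoding='utf-8') as f:
            # path.stem drops the '.csv' suffix, e.g. 'users.csv' -> 'users'.
            data[path.stem] = list(csv.DictReader(f))
    return data


with tempfile.TemporaryDirectory() as folder:
    Path(folder, 'users.csv').write_text('id,name\n1,Ana\n', encoding='utf-8')
    Path(folder, 'sessions.csv').write_text('id,user_id\n10,1\n', encoding='utf-8')

    everything = read_folder_sketch(folder)
    only_users = read_folder_sketch(folder, file_names=['users.csv'])

print(sorted(everything))  # ['sessions', 'users']
print(list(only_users))    # ['users']
```

With the real handler, the same keys would be used to look up each table, for example data['users'] for users.csv.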

write

Use this function to write synthetic data as multiple CSV files

connector.write(
    synthetic_data,
    folder_name='project/synthetic_data',
    to_csv_parameters={
        'encoding': 'latin-1',
        'index': False
    },
    file_name_suffix='_v1',
    mode='x'
)

Parameters

  • (required) synthetic_data: Your data, represented as a dictionary. The key is the name of each table and the value is a pandas DataFrame containing the data.

  • (required) folder_name: A string name of the folder where you would like to write the synthetic data

  • to_csv_parameters: A dictionary with additional parameters to use when writing the CSVs. The keys are any of the parameter names of the pandas.to_csv function and the values are your inputs.

    • (default) { 'index': False }: Do not write the index column to the CSV. (Use all the other defaults of the to_csv function.)

  • file_name_suffix: The suffix to add to each file name. Use this to add specific version numbers or other info.

    • (default) None: Do not add a suffix. The file name will be the same as the table name with a .csv extension

    • <string>: Append the suffix after the table name. E.g., a suffix of '_synth1' writes the file as table_synth1.csv

  • mode: A string signaling which mode of writing to use

    • (default) 'x': Write to new files, raising an error if a file with the same name already exists

    • 'w': Write to new files, overwriting any existing files with the same name

    • 'a': Append the new CSV rows to any existing files

Output (None) The data will be written as CSV files
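The suffix and mode options can be illustrated with a small standard-library sketch. It assumes that the three modes behave like the 'x', 'w' and 'a' modes of Python's built-in open(), which matches the descriptions above but is not confirmed here as SDV's implementation. The helper write_rows_sketch is hypothetical and writes plain rows rather than DataFrames.

```python
import csv
import tempfile
from pathlib import Path


def write_rows_sketch(rows, folder_name, table_name, file_name_suffix=None, mode='x'):
    """Hypothetical helper mimicking the suffix and mode rules described above."""
    # The suffix goes between the table name and the '.csv' extension.
    path = Path(folder_name) / f"{table_name}{file_name_suffix or ''}.csv"
    # Assumption: 'x', 'w' and 'a' are passed through as Python file modes.
    with open(path, mode, newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)
    return path


with tempfile.TemporaryDirectory() as folder:
    header = ['id', 'name']

    # Default mode 'x': creates users_v1.csv because it does not exist yet.
    path = write_rows_sketch([header, [1, 'Ana']], folder, 'users', '_v1')
    file_name = path.name

    # A second 'x' write to the same file is refused.
    try:
        write_rows_sketch([header, [2, 'Ben']], folder, 'users', '_v1')
        second_x_failed = False
    except FileExistsError:
        second_x_failed = True

    # Mode 'a': new rows are added after the existing ones.
    write_rows_sketch([[2, 'Ben']], folder, 'users', '_v1', mode='a')
    after_append = path.read_text(encoding='utf-8').splitlines()

    # Mode 'w': the existing content is replaced.
    write_rows_sketch([header, [3, 'Cy']], folder, 'users', '_v1', mode='w')
    after_overwrite = path.read_text(encoding='utf-8').splitlines()

print(file_name)        # users_v1.csv
print(second_x_failed)  # True
print(after_append)     # ['id,name', '1,Ana', '2,Ben']
print(after_overwrite)  # ['id,name', '3,Cy']
```

The default 'x' is the conservative choice: it never touches a file that is already there, so 'w' or 'a' has to be requested explicitly when rewriting or extending earlier output.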
