CSV
CSVHandler
Use this object to create a handler for reading and writing local CSV files.
from sdv.io.local import CSVHandler
connector = CSVHandler()
Parameters (None)
Output A CSVHandler object you can use to read and write CSV files
read
Use this function to read multiple CSV files from your local machine
data = connector.read(
    folder_name='project/data/',
    file_names=['users.csv', 'transactions.csv', 'sessions.csv'],
    read_csv_parameters={
        'parse_dates': False,
        'encoding': 'latin-1'
    }
)
Parameters
(required) folder_name: A string name of the folder that contains your CSV files
file_names: A list of strings with the exact file names to read
    (default) None: Read all the CSV files that are in the specified folder
    <list>: Read only the CSV files named in the list
keep_leading_zeros: A boolean that describes whether values with leading zeros should be kept, or whether they can be safely removed and converted to ints/floats.
    (default) True: Keep any leading zeros. This is especially helpful when the data is not truly numerical. For example, "02446" is a postal code in Boston, and it is not valid to read it as the number 2,446.
    False: Do not keep any leading zeros. Select this for faster read times if you know your data contains numbers.
read_csv_parameters: A dictionary with additional parameters to use when reading the CSVs. The keys are any of the parameter names of the pandas.read_csv function and the values are your inputs.
    (default) { 'parse_dates': False, 'low_memory': False, 'on_bad_lines': 'warn' }: Do not infer any datetime formats, do not use low-memory mode, and warn instead of raising an error when a line cannot be read. (Use all the other defaults of the read_csv function.)
Output A dictionary that contains all the CSV data found in the folder. The key is the name of the file (without the .csv suffix) and the value is a pandas DataFrame containing the data.
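To make the shape of the output concrete, here is a minimal sketch that mimics the behavior described above using only the Python standard library. It is not the CSVHandler implementation: the helper read_folder is hypothetical, the sample files are made up, and each table is a list of row dictionaries rather than a pandas DataFrame. It shows the keys being file names without the .csv suffix, and why keeping values as text preserves leading zeros.

```python
import csv
import tempfile
from pathlib import Path


def read_folder(folder_name, file_names=None):
    """Hypothetical stand-in: return {file name without .csv: list of row dicts}."""
    folder = Path(folder_name)
    if file_names is None:
        # No list given: pick up every CSV file in the folder.
        paths = sorted(folder.glob("*.csv"))
    else:
        paths = [folder / name for name in file_names]
    tables = {}
    for path in paths:
        with path.open(newline="") as handle:
            # csv.DictReader leaves every value as text, so leading zeros survive.
            tables[path.stem] = list(csv.DictReader(handle))
    return tables


with tempfile.TemporaryDirectory() as folder:
    # Made-up sample files, written only for this illustration.
    Path(folder, "users.csv").write_text("user_id,postal_code\n1,02446\n")
    Path(folder, "sessions.csv").write_text("session_id,user_id\n10,1\n")
    data = read_folder(folder)
    subset = read_folder(folder, file_names=["users.csv"])

table_names = sorted(data)                      # ['sessions', 'users']
postal_code = data["users"][0]["postal_code"]   # '02446'
as_number = int(postal_code)                    # 2446, the leading zero is gone
```

With the real handler you would access a table the same way, for example data['users'], and get a pandas DataFrame back.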
write
Use this function to write synthetic data as multiple CSV files
Parameters
(required) synthetic_data: Your data, represented as a dictionary. The key is the name of each table and the value is a pandas DataFrame containing the data.
(required) folder_name: A string name of the folder where you would like to write the synthetic data
to_csv_parameters: A dictionary with additional parameters to use when writing the CSVs. The keys are any of the parameter names of the pandas.DataFrame.to_csv function and the values are your inputs.
    (default) { 'index': False }: Do not write the index column to the CSV. (Use all the other defaults of the to_csv function.)
file_name_suffix: The suffix to add to each file name. Use this to add specific version numbers or other info.
    (default) None: Do not add a suffix. The file name will be the same as the table name with a .csv extension
    <string>: Append the suffix after the table name. E.g., a suffix '_synth1' will write a file as table_synth1.csv
mode: A string signaling which mode of writing to use
    (default) 'x': Write to new files, raising an error if a file with the same name already exists
    'w': Write to new files, overwriting any existing files with the same name
    'a': Append the new CSV rows to any existing files
Output (None) The data will be written as CSV files
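The file_name_suffix and mode options can be illustrated with a small standard-library sketch. This is an assumption-laden stand-in, not how CSVHandler is implemented: the helper write_table is hypothetical, it takes plain lists instead of DataFrames, and it relies on the fact that Python's built-in file modes 'x', 'w', and 'a' carry the same meanings as the options listed above.

```python
import csv
import tempfile
from pathlib import Path


def write_table(rows, folder_name, table_name, file_name_suffix=None, mode="x"):
    """Hypothetical stand-in: write rows to <table_name><suffix>.csv."""
    path = Path(folder_name) / f"{table_name}{file_name_suffix or ''}.csv"
    with path.open(mode, newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


with tempfile.TemporaryDirectory() as folder:
    rows = [["user_id", "age"], [1, 34]]  # made-up data, header first
    path = write_table(rows, folder, "table", file_name_suffix="_synth1")
    file_name = path.name  # 'table_synth1.csv'

    # 'x' refuses to touch a file that already exists.
    try:
        write_table(rows, folder, "table", file_name_suffix="_synth1", mode="x")
        second_write_failed = False
    except FileExistsError:
        second_write_failed = True

    # 'a' appends rows to the existing file (no header is repeated here).
    write_table([[2, 51]], folder, "table", file_name_suffix="_synth1", mode="a")
    line_count = len(path.read_text().splitlines())

    # 'w' replaces the existing contents.
    write_table(rows, folder, "table", file_name_suffix="_synth1", mode="w")
    line_count_after_overwrite = len(path.read_text().splitlines())
```

The default 'x' is the safest choice when you do not want to risk overwriting earlier synthetic data. Switch to 'w' or 'a' only when replacing or extending existing files is intended.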