CSV
This functionality is in Beta! Beta functionality may have bugs and may change in the future. Help us out by testing this functionality and letting us know if you encounter any issues.
CSVHandler
Use this object to create a handler for reading and writing local CSV files.
from sdv.io.local import CSVHandler
connector = CSVHandler()
Parameters (None)
Output A CSVHandler object you can use to read and write CSV files
read
Use this function to read multiple CSV files from your local machine
data = connector.read(
    folder_name='project/data/',
    file_names=['users.csv', 'transactions.csv', 'sessions.csv'],
    read_csv_parameters={
        'parse_dates': False,
        'encoding': 'latin-1'
    }
)
Parameters
(required) folder_name: A string name of the folder that contains your CSV files
file_names: A list of strings with the exact file names to read
    (default) None: Read all the CSV files that are in the specified folder
    <list>: Only read the CSV files that are named in the list
read_csv_parameters: A dictionary with additional parameters to use when reading the CSVs. The keys are any of the parameter names of the pandas.read_csv function and the values are your inputs.
    (default) { 'parse_dates': False, 'low_memory': False, 'on_bad_lines': 'warn' }: Do not infer any datetime formats, do not use low-memory mode, and warn instead of raising an error if it's not possible to read a line. (Use all the other defaults of the read_csv function.)
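To make the default read_csv_parameters concrete, the sketch below passes the same three values directly to pandas.read_csv on a small in-memory CSV. It calls pandas on its own rather than going through CSVHandler, and the sample data and column names are invented for illustration.

```python
import io

import pandas as pd

# Illustrative CSV text; the row for user 3 has an extra field, so it is malformed.
raw = io.StringIO(
    "user_id,signup_date\n"
    "1,2024-01-15\n"
    "2,2024-02-20\n"
    "3,2024-03-05,unexpected\n"
    "4,2024-04-10\n"
)

users = pd.read_csv(
    raw,
    parse_dates=False,    # leave date-like columns unparsed
    low_memory=False,     # read the file in one pass instead of in chunks
    on_bad_lines="warn",  # report a malformed row and skip it, rather than raising
)

print(users)
print(users.dtypes)
```

The malformed row is dropped with a warning, and signup_date stays a text column, so any datetime conversion is left to you.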
Output A dictionary that contains all the CSV data found in the folder. The key is the name of the file (without the .csv suffix) and the value is a pandas DataFrame containing the data.
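The shape of the returned dictionary can be sketched with plain pandas. The read_csv_folder helper below is a hypothetical stand-in that mimics the documented behavior (all CSVs when file_names is None, keys without the .csv suffix). It is not SDV's implementation, and the file names and contents are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_folder(folder_name, file_names=None):
    """Hypothetical stand-in for the documented read behavior (not SDV code)."""
    folder = Path(folder_name)
    if file_names is None:
        # No list given: pick up every CSV file in the folder.
        paths = sorted(folder.glob("*.csv"))
    else:
        # A list was given: read only those exact file names.
        paths = [folder / name for name in file_names]
    # Key each DataFrame by the file name without its .csv suffix.
    return {path.stem: pd.read_csv(path) for path in paths}


with tempfile.TemporaryDirectory() as folder:
    Path(folder, "users.csv").write_text("user_id,name\n1,Ana\n2,Ben\n")
    Path(folder, "sessions.csv").write_text("session_id,user_id\n10,1\n")
    Path(folder, "notes.txt").write_text("not a CSV\n")

    all_tables = read_csv_folder(folder)
    only_users = read_csv_folder(folder, file_names=["users.csv"])

table_names = sorted(all_tables)
print(table_names)
print(all_tables["users"])
```

Each value is an ordinary DataFrame, so you can loop over the dictionary with .items() to inspect every table.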
write
Use this function to write synthetic data as multiple CSV files
connector.write(
    synthetic_data,
    folder_name='project/synthetic_data',
    to_csv_parameters={
        'encoding': 'latin-1',
        'index': False
    },
    file_name_suffix='_v1',
    mode='x'
)
Parameters
(required) synthetic_data: Your data, represented as a dictionary. The key is the name of each table and the value is a pandas DataFrame containing the data.
(required) folder_name: A string name of the folder where you would like to write the synthetic data
to_csv_parameters: A dictionary with additional parameters to use when writing the CSVs. The keys are any of the parameter names of the pandas DataFrame.to_csv function and the values are your inputs.
    (default) { 'index': False }: Do not write the index column to the CSV. (Use all the other defaults of the to_csv function.)
file_name_suffix: The suffix to add to each filename. Use this to add specific version numbers or other info.
    (default) None: Do not add a suffix. The file name will be the same as the table name with a .csv extension
    <string>: Append the suffix after the table name. E.g. a suffix '_synth1' will write a file as table_synth1.csv
mode: A string signaling which mode of writing to use
    (default) 'x': Write to new files, raising an error if a file with the same name already exists
    'w': Write to new files, overwriting any existing files with the same name
    'a': Append the new CSV rows to any existing files
Output (None) The data will be written as CSV files
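The file naming and the three modes can be illustrated with the standard library alone. This sketch assumes the mode strings behave like Python's built-in file modes of the same name, which matches the descriptions above but is not confirmed here against SDV's code. The output_path helper is hypothetical.

```python
import tempfile
from pathlib import Path


def output_path(folder_name, table_name, file_name_suffix=None):
    """Hypothetical helper: table name, optional suffix, then the .csv extension."""
    suffix = file_name_suffix or ""
    return Path(folder_name) / f"{table_name}{suffix}.csv"


with tempfile.TemporaryDirectory() as folder:
    path = output_path(folder, "table", file_name_suffix="_synth1")
    file_name = path.name
    default_name = output_path(folder, "table").name

    # 'x': create a new file; a second attempt fails because the file exists.
    with open(path, "x", newline="") as f:
        f.write("id\n1\n")
    try:
        open(path, "x").close()
        second_x_failed = False
    except FileExistsError:
        second_x_failed = True

    # 'w': discard the existing contents and write from scratch.
    with open(path, "w", newline="") as f:
        f.write("id\n2\n")
    after_w = path.read_text()

    # 'a': keep the existing rows and add new ones at the end.
    with open(path, "a", newline="") as f:
        f.write("3\n")
    after_a = path.read_text()

print(file_name, default_name, second_x_failed)
```

Under that assumption, the default 'x' is the safest choice because it never touches an existing file, while 'a' may produce a file with a repeated header row if the appended data includes one.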