# CSV

### CSVHandler <a href="#bigqueryconnector" id="bigqueryconnector"></a>

Use this object to create a handler for reading and writing local CSV files.

```python
from sdv.io.local import CSVHandler

connector = CSVHandler()
```

**Parameters** (None)

**Output** A CSVHandler object you can use to read and write CSV files

### read

Use this function to read multiple CSV files from your local machine.

```python
data = connector.read(
    folder_name='project/data/',
    file_names=['users.csv', 'transactions.csv', 'sessions.csv'],
    read_csv_parameters={
        'parse_dates': False,
        'encoding':'latin-1'
    }
)
```

**Parameters**

* (required) `folder_name`: A string name of the folder that contains your CSV files
* `file_names`: A list of strings with the exact file names to read
  * (default) `None`: Read all the CSV files that are in the specified folder
  * `<list>`: Only read the list of CSV files that are in the list
* `keep_leading_zeros`: A boolean that describes whether any values with leading zeros should be kept, or whether they can be safely removed and converted to ints/floats.
  * (default) `True`: Keep any leading zeros. This is especially helpful when the data is not truly numerical, for example `"02446"` is a postal code in Boston. It is not valid to read this as a number `2,446`.
  * `False`: Do not keep any leading zeros. Select this for faster read times if you know your data contains numbers.
* `read_csv_parameters`: A dictionary with additional parameters to use when reading the CSVs. The keys are any of the parameter names of the [pandas.read\_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function and the values are your inputs.
  * (default) `{ 'parse_dates': False, 'low_memory': False, 'on_bad_lines': 'warn'}`: Do not infer datetime formats, do not use low-memory processing, and warn (instead of raising an error) on any line that cannot be parsed. (Use all the other defaults of the `read_csv` function.)

**Output** A dictionary that contains all the CSV data found in the folder. The key is the name of the file (without the `.csv` suffix) and the value is a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the data.
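To see why `keep_leading_zeros` defaults to `True`, consider how pandas alone handles the Boston postal code example. The sketch below is plain pandas (not SDV code) and uses an in-memory CSV for illustration; `dtype=str` stands in for the leading-zero protection that `CSVHandler` applies for you.

```python
import io
import pandas as pd

# Illustration only (plain pandas, not SDV): default type inference
# converts "02446" to the integer 2446, silently dropping the leading zero.
csv_text = "zip_code\n02446\n90210\n"

inferred = pd.read_csv(io.StringIO(csv_text))
preserved = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred['zip_code'].iloc[0])   # 2446  (leading zero lost)
print(preserved['zip_code'].iloc[0])  # 02446 (kept as a string)
```

This is also why `keep_leading_zeros=False` can speed up reads: skipping the string preservation lets pandas parse numeric columns directly as ints/floats.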

### write

Use this function to write synthetic data as multiple CSV files.

```python
connector.write(
  synthetic_data,
  folder_name='project/synthetic_data',
  to_csv_parameters={
      'encoding': 'latin-1',
      'index': False
  },
  file_name_suffix='_v1',
  mode='x'
)
```

**Parameters**

* (required) `synthetic_data`: Your data, represented as a dictionary. The key is the name of each table and the value is a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the data.
* (required) `folder_name`: A string name of the folder where you would like to write the synthetic data
* `to_csv_parameters`: A dictionary with additional parameters to use when writing the CSVs. The keys are any of the parameter names of the [pandas.to\_csv](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) function and the values are your inputs.
  * (default) `{ 'index': False }`: Do not write the index column to the CSV. (Use all the other defaults of the `to_csv` function.)
* `file_name_suffix`: The suffix to add to each filename. Use this to add specific version numbers or other info.
  * (default) `None`: Do not add a suffix. The file name will be the same as the table name with a `.csv` extension
  * \<string>: Append the suffix after the table name. E.g. the suffix `'_synth1'` will write a file as `table_synth1.csv`
* `mode`: A string signaling which mode of writing to use
  * (default) `'x'`: Write to new files, raising errors if any existing files exist with the same name
  * `'w'`: Write to new files, overwriting any existing files with the same name
  * `'a'`: Append the new CSV rows to any existing files

**Output** (None) The data will be written as CSV files
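The `'x'`/`'w'`/`'a'` modes follow Python's standard file-open semantics, which `pandas.DataFrame.to_csv` also accepts. The sketch below is plain pandas (not SDV code) and writes to a temporary folder to illustrate how the three modes differ:

```python
import os
import tempfile
import pandas as pd

# Illustration only (plain pandas, not SDV): 'x', 'w', and 'a' behave
# like Python's open() modes when passed to DataFrame.to_csv.
df = pd.DataFrame({'user_id': [1, 2], 'country': ['US', 'FR']})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'users_v1.csv')

    df.to_csv(path, index=False, mode='x')   # 'x': create; error if file exists
    try:
        df.to_csv(path, index=False, mode='x')
    except FileExistsError:
        print("mode='x' refuses to overwrite an existing file")

    df.to_csv(path, index=False, mode='w')   # 'w': replace existing contents
    df.to_csv(path, index=False, mode='a', header=False)  # 'a': append rows

    print(len(pd.read_csv(path)))  # 4 rows: 2 from 'w' plus 2 appended
```

Note that when appending with `'a'`, the header should be suppressed so column names are not written a second time as data rows.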
