# Loading Data

## Demo Data

The SDV library contains many different demo datasets that you can use to get started. Use the `demo` module to access these datasets.

{% hint style="warning" %}
The `demo` module accesses the SDV's public dataset repository. These methods require an internet connection.
{% endhint %}

### get\_available\_demos

Use this method to get information about all the available demos in the SDV's public dataset repository.

**Parameters**

* (required) `modality`: Set this to the string `'sequential'` to see all the sequential demo datasets
* *\[private]* `s3_bucket_name`: Additional, private demo datasets may be available in other buckets. If you have been given access to one of these buckets, you can input the name as a string. *This is currently only available to internal, DataCebo users.*
* *\[private]* `credentials`: Provide a dictionary with the SDV credentials in order to access the bucket. The expected keys are `'username'` and `'license_key'`. *This is currently only available to internal, DataCebo users.*

**Output** A pandas DataFrame object containing the name of the dataset, its size (in MB) and the number of tables it contains.

{% hint style="warning" %}
The SDV currently only supports sequential data that is present in a single table. If you use the `'sequential'` modality, the number of tables is always 1.
{% endhint %}

```python
from sdv.datasets.demo import get_available_demos

get_available_demos(modality='sequential')
```

```
dataset_name                    size_MB        num_tables
ArticularyWordRecognition       8.8            1
AtrialFibrillation              0.627          1
BasicMotions                    0.741          1
...                             ...            ...
```

### download\_demo

Use this method to download a demo dataset from the SDV's public dataset repository.

**Parameters**

* (required) `modality`: Set this to the string `'sequential'` to access sequential demo data
* (required) `dataset_name`: A string with the name of the demo dataset. You can use any of the dataset names from the `get_available_demos` method.
* `output_folder_name`: A string with the name of a folder. If provided, this method will download the data and metadata into the folder, in addition to returning the data.
  * (default) `None`: Do not save the data into a folder. The data will still be returned so that you can use it in your Python script.
* *\[private]* `s3_bucket_name`: Additional, private demo datasets may be available in other buckets. If you have been given access to one of these buckets, you can input the name as a string. *This is currently only available to internal, DataCebo users.*
* *\[private]* `credentials`: Provide a dictionary with the SDV credentials in order to access the bucket. The expected keys are `'username'` and `'license_key'`. *This is currently only available to internal, DataCebo users.*

**Output** A tuple `(data, metadata)`.

The `data` is a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the demo data and the `metadata` is a [Metadata](https://docs.sdv.dev/sdv/sequential-data/data-preparation/creating-metadata) object that describes the data.

```python
from sdv.datasets.demo import download_demo

data, metadata = download_demo(
    modality='sequential',
    dataset_name='ArticularyWordRecognition',
    output_folder_name='sdv_demo_datasets/word_data/'
)
```

### get\_source

Some datasets have a source file that describes where the dataset comes from. This can include information like a URL, citations for the original publication, and other information that tracks the dataset's provenance. Use this function to get all this source information.

**Parameters**

* (required) `modality`: Set this to the string `'sequential'` to access sequential demo data
* (required) `dataset_name`: A string with the name of the demo dataset. You can use any of the dataset names from the `get_available_demos` method.
* `output_filepath`: A string with the name of a file path. If provided, this method will create the file and write the source information to the file, in addition to returning it.
  * (default) `None`: Do not save the source information into a file. The contents will still be returned so that you can print it out and read it.
* *\[private]* `s3_bucket_name`: Additional, private demo datasets may be available in other buckets. If you have been given access to one of these buckets, you can input the name as a string. *This is currently only available to internal, DataCebo users.*
* *\[private]* `credentials`: Provide a dictionary with the SDV credentials in order to access the bucket. The expected keys are `'username'` and `'license_key'`. *This is currently only available to internal, DataCebo users.*

**Output** A string containing the contents of the source information. You can print it out to read it. *(If no source information is available for a dataset, the function returns `None` and no file will be written.)*

```python
from sdv.datasets.demo import get_source

source_text = get_source(
    modality='sequential',
    dataset_name='AtrialFibrillation',
    output_filepath='AF_source.txt'
)
print(source_text)
```

```
Source URL: https://www.physionet.org/content/aftdb/1.0.0/
License name: Open Data Commons Attribution License v1.0

Citations:
[1] Moody GB. Spontaneous Termination of Atrial Fibrillation: A Challenge from PhysioNet and Computers in Cardiology 2004. Computers in Cardiology 31:101-104 (2004).
```
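
The source information is returned as a plain string, so individual fields can be pulled out with ordinary string handling. The sketch below assumes the `Source URL:` and `License name:` prefixes seen in the example output above; other datasets may lay out their source information differently. `parse_source_fields` is a hypothetical helper, not part of the SDV.

```python
def parse_source_fields(source_text):
    """Collect 'Key: value' lines that appear before the citations block.

    Assumes the layout shown in the AtrialFibrillation example; this is not
    a format guaranteed by the SDV.
    """
    fields = {}
    if source_text is None:  # get_source returns None when nothing is available
        return fields
    for line in source_text.splitlines():
        line = line.strip()
        if line.startswith('Citations'):
            break  # everything after this point is free-form citation text
        key, sep, value = line.partition(': ')
        if sep:
            fields[key] = value
    return fields


# Sample text copied from the example output, so the sketch runs offline.
example_text = (
    'Source URL: https://www.physionet.org/content/aftdb/1.0.0/\n'
    'License name: Open Data Commons Attribution License v1.0\n'
    '\n'
    'Citations:\n'
)
fields = parse_source_fields(example_text)
print(fields['Source URL'])
```

In practice, you could pass the string returned by `get_source` in place of `example_text`.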

### get\_readme

Some datasets have a README file that describes more information about what the dataset means. This could include explanations for naming conventions used in the dataset, mappings for ID codes, or business logic. Use this function to get the README (if it exists).

{% hint style="info" %}
**README information is coming soon!** At this time, SDV demo datasets do not contain any README information. If you're looking for more information about the dataset, we recommend [getting the source](#get_source). From there, you'll be able to navigate to any URLs or contact the original authors as needed.
{% endhint %}

**Parameters**

* (required) `modality`: Set this to the string `'sequential'` to access sequential demo data
* (required) `dataset_name`: A string with the name of the demo dataset. You can use any of the dataset names from the `get_available_demos` method.
* `output_filepath`: A string with the name of a file path. If provided, this method will create the file and write the README information to the file, in addition to returning it.
  * (default) `None`: Do not save the README information into a file. The contents will still be returned so that you can print it out and read it.
* *\[private]* `s3_bucket_name`: Additional, private demo datasets may be available in other buckets. If you have been given access to one of these buckets, you can input the name as a string. *This is currently only available to internal, DataCebo users.*
* *\[private]* `credentials`: Provide a dictionary with the SDV credentials in order to access the bucket. The expected keys are `'username'` and `'license_key'`. *This is currently only available to internal, DataCebo users.*

**Output** A string containing the contents of the README information. You can print it out to read it. *(If no README information is available for a dataset, the function returns `None` and no file will be written.)*
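
Because both `get_readme` and `get_source` return `None` when nothing is available, a small fallback helper can try one and then the other. The sketch below is hypothetical (`first_available_text` is not an SDV function) and uses stand-in functions so that it runs without an internet connection.

```python
def first_available_text(dataset_name, getters, modality='sequential'):
    """Return the first non-None result from a list of getter functions.

    Each getter is called with the keyword arguments that get_readme and
    get_source both accept (modality and dataset_name).
    """
    for getter in getters:
        text = getter(modality=modality, dataset_name=dataset_name)
        if text is not None:
            return text
    return None


# Stand-ins that mimic the documented behavior, for offline illustration only.
def fake_get_readme(modality, dataset_name):
    return None  # no README information is published yet


def fake_get_source(modality, dataset_name):
    return 'Source URL: https://www.physionet.org/content/aftdb/1.0.0/'


info = first_available_text(
    'AtrialFibrillation',
    getters=[fake_get_readme, fake_get_source],
)
print(info)
```

To use this against the real repository, you could pass `get_readme` and `get_source` imported from `sdv.datasets.demo` in place of the stand-ins.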

## Loading your own local datasets

A *local* dataset is a dataset that you have already downloaded onto your computer. Accessing it does not require an internet connection.

### CSV Files

Use the `CSVHandler` object for reading and writing local CSV files.

```python
from sdv.io.local import CSVHandler

connector = CSVHandler()
data = connector.read(
    folder_name='project/data/',
    read_csv_parameters={
        'parse_dates': False,
        'encoding': 'latin-1'
    }
)
```

The resulting data is a dictionary keyed by the name of each file, without the extension. For example, if your file is named `patients.csv`, the data will be available under the `'patients'` key:

```python
patients_data = data['patients']
```
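
The key for each table follows from the file name with the `.csv` extension removed. The sketch below reproduces only that naming convention using the standard library, as a way to preview which keys to expect for a folder. It is not how `CSVHandler` is implemented, and `expected_table_names` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def expected_table_names(folder_name):
    """Map the expected dictionary key to the file name for each CSV file."""
    folder = Path(folder_name)
    return {path.stem: path.name for path in sorted(folder.glob('*.csv'))}


# Build a throwaway folder with two CSV files to demonstrate the mapping.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'patients.csv').write_text('patient_id,age\n1,42\n')
    (Path(tmp) / 'visits.csv').write_text('visit_id,patient_id\n10,1\n')
    names = expected_table_names(tmp)

print(names)
```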

For more information about the parameters for reading and writing data, see the [CSVHandler guide](https://docs.sdv.dev/sdv/multi-table-data/data-preparation/loading-data/csv).

{% hint style="info" %}
**Where's the metadata?** If you're loading your own datasets, please create and load in your metadata separately. See the [Metadata guide](https://docs.sdv.dev/sdv/concepts/metadata) for more details.
{% endhint %}

### Other types of data

SDV offers native integration with Excel files, as well as integrations with a variety of different databases. For more information, see our [Loading Data guide](https://docs.sdv.dev/sdv/multi-table-data/data-preparation/loading-data).
