# Cleaning Your Data

Use the utility functions below to clean your sequential data for fast and effective modeling.

### get\_random\_sequence\_subset

Use this function to subsample data from your dataset. Given multi-sequence data, this function will randomly select sequences and clip them to the desired length.

```python
from sdv.utils import get_random_sequence_subset

# Randomly keep 100 sequences from the multi-sequence dataset
subsampled_data = get_random_sequence_subset(
    data,
    metadata,
    num_sequences=100
)
```

**Parameters**

* (required) `data`: A pandas.DataFrame containing your multi-sequence data
* (required) `metadata`: A [Metadata](https://docs.sdv.dev/sdv/concepts/metadata) object that describes the data. The metadata must describe multi-sequence data, meaning that it must have a sequence key specified.
* (required) `num_sequences`: An int describing the number of sequences to subsample from the data
* `max_sequence_length`: The maximum length each sequence is allowed to be
  * (default) `None`: Do not enforce any max length, meaning that entire sequences will appear in the subsampled data
  * `<integer>`: An integer describing the max sequence length. Any sequence that is longer than this value will be shortened based on the method below
* `long_sequence_subsampling_method`: The method for shortening sequences that are too long
  * (default) `'first_rows'`: Keep the first *n* rows of each sequence as they appear, where *n* is the max sequence length
  * `'last_rows'`: Keep the last *n* rows of each sequence as they appear, where *n* is the max sequence length
  * `'random'`: Randomly choose *n* rows of each sequence, where *n* is the max sequence length. Note the randomly chosen rows will still appear in the same order as the original data.

**Output** A pandas.DataFrame containing the subsampled data, typically with fewer rows than the original. The dataset still represents multiple sequences of potentially varying lengths.
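
To make the three shortening methods concrete, here is a minimal pandas-only sketch of the behavior described in the parameters above. It is an illustration, not SDV's implementation: the helper `subsample_sequences`, its `sequence_key` and `seed` arguments, and the `patient_id`/`step` columns are all hypothetical names, and the sketch assumes the DataFrame index reflects the original row order.

```python
import numpy as np
import pandas as pd


def subsample_sequences(data, sequence_key, num_sequences,
                        max_sequence_length=None, method='first_rows', seed=0):
    """Hypothetical illustration of sequence subsampling (not SDV's code)."""
    rng = np.random.default_rng(seed)

    # Randomly pick which sequences to keep.
    keys = data[sequence_key].drop_duplicates().to_numpy()
    chosen = rng.choice(keys, size=min(num_sequences, len(keys)), replace=False)
    subset = data[data[sequence_key].isin(chosen)]

    if max_sequence_length is None:
        return subset  # no max length: keep entire sequences

    groups = subset.groupby(sequence_key, sort=False)
    if method == 'first_rows':
        return groups.head(max_sequence_length)
    if method == 'last_rows':
        return groups.tail(max_sequence_length)
    if method == 'random':
        kept = []
        for _, seq in groups:
            n = min(max_sequence_length, len(seq))
            # Sort the sampled positions so rows keep their original order.
            positions = np.sort(rng.choice(len(seq), size=n, replace=False))
            kept.append(seq.iloc[positions])
        return pd.concat(kept).sort_index()
    raise ValueError(f'Unknown method: {method}')


# Toy multi-sequence data: three sequences of lengths 5, 3, and 4.
data = pd.DataFrame({
    'patient_id': ['A'] * 5 + ['B'] * 3 + ['C'] * 4,
    'step': list(range(5)) + list(range(3)) + list(range(4)),
})

subsampled = subsample_sequences(
    data, 'patient_id', num_sequences=2,
    max_sequence_length=3, method='last_rows'
)
```

In this sketch, `subsampled` holds two of the three sequences, each clipped to at most three rows. Switching `method` to `'random'` keeps a random selection of rows per sequence while preserving their original order.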
