# AWS Runs

This page guides you through running the SDGym benchmark on the cloud using AWS. AWS will be used for accessing any custom datasets you may have on S3, running the synthesizers on EC2, and writing the final results into S3.

To run the benchmark locally, see the guide for [Local Runs](/sdgym/benchmarking/run/local.md).

```python
import sdgym

results = sdgym.benchmark_single_table_aws(
    aws_access_key_id='my_access_key',
    aws_secret_access_key='my_secret',
    output_destination='s3://sdgym_results_bucket/'
)
```

See [Interpreting Results](/sdgym/benchmarking/interpreting-results.md) for a description of the benchmarking results.

## Authentication Parameters

These parameters are required unless you have followed Amazon's instructions to set up [environment variables](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables). We recommend supplying these parameters to ensure the benchmark can access S3 and EC2.

**`aws_access_key_id`**: A string containing your AWS access key ID

**`aws_secret_access_key`**: A string containing your AWS secret access key
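If you would rather not hard-code credentials in a script, one option is to read them from the environment and pass them in explicitly. The sketch below is only an illustration: `credentials_from_env` is a hypothetical helper (not part of SDGym), and it assumes the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variable names described in the boto3 credentials guide.

```python
import os


def credentials_from_env(environ=None):
    """Build the two authentication keyword arguments from environment variables.

    Hypothetical helper for illustration only; it is not part of SDGym.
    """
    environ = os.environ if environ is None else environ
    required = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    return {
        'aws_access_key_id': environ['AWS_ACCESS_KEY_ID'],
        'aws_secret_access_key': environ['AWS_SECRET_ACCESS_KEY'],
    }


# Possible usage (requires sdgym and valid AWS credentials):
# import sdgym
# results = sdgym.benchmark_single_table_aws(
#     output_destination='s3://sdgym_results_bucket/',
#     **credentials_from_env(),
# )
```

Failing early with a clear message when a variable is missing can be easier to debug than an authentication error raised partway through a run.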

## Optional Parameters

Every step of the benchmarking process is customizable. Use the optional parameters to control the setup, execution and evaluation.

### Setup

Use these parameters to control which synthesizers and datasets to include in the benchmark.

**`synthesizers`**: Control which synthesizers to use by supplying a list of strings with the synthesizer names

* (default) `['GaussianCopulaSynthesizer', 'CTGANSynthesizer', 'UniformSynthesizer']`
* Options include `'GaussianCopulaSynthesizer'`, `'CTGANSynthesizer'`, `'TVAESynthesizer'`, `'CopulaGANSynthesizer'`, and many more. *You may supply* [*SDV Synthesizers*](/sdgym/customization/synthesizers/sdv-synthesizers.md)*,* [*Basic Synthesizers*](/sdgym/customization/synthesizers/basic-synthesizers.md)*, or* [*3rd Party Synthesizers*](/sdgym/customization/synthesizers/3rd-party-synthesizers.md)*. Currently, custom synthesizers are not supported for AWS runs; please run your benchmark locally in this case.*

```python
sdgym.benchmark_single_table_aws(
    synthesizers=['TVAESynthesizer', 'ColumnSynthesizer', 'RealTabFormerSynthesizer'])
```

{% hint style="success" %}
**Simulating graceful degradation.** SDGym always runs the UniformSynthesizer as a backup synthesizer, even if it is explicitly specified. This backup synthesizer is used to simulate graceful degradation in an enterprise setting. For more information, see [Graceful Handling of Errors](/sdgym/benchmarking/interpreting-results/results-summary.md#graceful-handling-of-errors).
{% endhint %}

**`sdv_datasets`**: Control which of the SDV demo datasets to use by supplying their names as a list of strings.

* (default) `['adult', 'alarm', 'census', 'child', 'expedia_hotel_logs', 'insurance', 'intrusion', 'news', 'covtype']`
* See [Datasets](/sdgym/customization/datasets.md) for more options

**`additional_datasets_folder`**: Supply the name of an S3 bucket that contains additional datasets.

* (default) `None`: Do not run the benchmark for any additional datasets.
* `<string>`: The path to your S3 bucket that contains additional datasets. This should start with the prefix `s3://`. Make sure your datasets are in the correct format as described in the [Dataset Format](/sdgym/customization/datasets/dataset-format.md) guide. Also make sure that you have provided the permissions to read from this folder.
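Because the path must start with `s3://`, it can help to check it before launching a run. The following is a minimal sketch using only the Python standard library; `is_s3_path` is a hypothetical helper, and the bucket name in the usage comment is a placeholder. It checks only the shape of the string, not whether the bucket exists or is readable.

```python
from urllib.parse import urlparse


def is_s3_path(path):
    """Return True if `path` has the form 's3://<bucket>[/<prefix>]'.

    Hypothetical helper for illustration; it does not contact AWS.
    """
    if not isinstance(path, str):
        return False
    parsed = urlparse(path)
    # The scheme must be 's3' and a bucket name must follow it.
    return parsed.scheme == 's3' and bool(parsed.netloc)


# Possible usage (placeholder bucket name):
# folder = 's3://my-datasets-bucket/benchmark-inputs/'
# if not is_s3_path(folder):
#     raise ValueError(f'Expected an s3:// path, got {folder!r}')
```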

### Execution

Use these parameters to control the speed and flow of the benchmark.

**`limit_dataset_size`**: Set this boolean to limit the size of every dataset. This will yield faster results but may affect the overall quality.

* (default) `False`: Use the full datasets for benchmarking.
* `True`: Limit the dataset size before benchmarking. For every dataset selected, use only 1000 rows (randomly sampled) and the first 10 columns.
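To make the effect of `limit_dataset_size=True` concrete, the sketch below mirrors the description above (at most 1000 randomly sampled rows and the first 10 columns) on a plain list of row dictionaries. It is an illustration of the documented behavior under those assumptions, not SDGym's actual implementation; `limit_table` and its arguments are hypothetical names.

```python
import random


def limit_table(rows, columns, max_rows=1000, max_columns=10, seed=None):
    """Keep the first `max_columns` columns and a random sample of `max_rows` rows.

    `rows` is a list of dicts keyed by column name; `columns` gives the column order.
    Illustrative only; SDGym may implement the limit differently.
    """
    kept_columns = list(columns[:max_columns])
    rng = random.Random(seed)

    # Sample without replacement only when the table exceeds the row limit.
    if len(rows) > max_rows:
        sampled = rng.sample(rows, max_rows)
    else:
        sampled = list(rows)

    limited = [{col: row[col] for col in kept_columns} for row in sampled]
    return limited, kept_columns
```

A table with 5000 rows and 15 columns would shrink to 1000 rows and 10 columns, while a table already under both limits would pass through unchanged.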

**`timeout`**: The maximum number of seconds to give to each synthesizer to train and sample a dataset

* (default) `None`: Do not set a maximum. Allow the synthesizer to take as long as it needs.
* `<integer>`: Allow a synthesizer to run for at most this many seconds on each dataset. If the synthesizer exceeds the limit, the benchmark will report a `SynthesizerTimeoutError`.
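Since `timeout` is expressed in seconds, longer limits are easier to read when derived from hours or minutes. This is a small sketch using the standard library; `timeout_seconds` is a hypothetical helper, not an SDGym function.

```python
from datetime import timedelta


def timeout_seconds(**duration):
    """Convert a duration such as hours=2 or minutes=90 into whole seconds.

    Accepts the same keyword arguments as datetime.timedelta.
    """
    seconds = int(timedelta(**duration).total_seconds())
    if seconds <= 0:
        raise ValueError('The timeout must be a positive number of seconds.')
    return seconds


# Possible usage (requires sdgym and valid AWS credentials):
# import sdgym
# sdgym.benchmark_single_table_aws(timeout=timeout_seconds(hours=2))
```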

**`output_destination`**: Supply the name of an S3 bucket where you'd like to save the final results, as well as all the detailed artifacts created in the process.

* (default) `None`: Do not save any of the results.
* `<string>`: The path to your S3 bucket where you'd like to store the final results and detailed artifacts. This should start with the prefix `s3://`. For more details on what will be saved, see the [Results Summary](/sdgym/benchmarking/interpreting-results/results-summary.md) and [Artifacts](/sdgym/benchmarking/interpreting-results/explore-the-artifacts.md) guides.

{% hint style="info" %}
**Are you running a benchmark regularly?** We recommend writing to the same `output_destination` folder every time. Each benchmark run will be stored in a different folder, ready for you to explore and compare results. For more information, see the [Artifacts](/sdgym/benchmarking/interpreting-results/explore-the-artifacts.md) guide.
{% endhint %}

### Evaluation

Use the evaluation parameters to control what to measure when benchmarking.

{% hint style="success" %}
The SDGym benchmark will always measure performance (time and memory). Use additional parameters to evaluate other aspects of the synthetic data after it's created.
{% endhint %}

**`compute_diagnostic_score`**: Set this boolean to generate an overall diagnostic score for every synthesizer and dataset. This may increase the benchmarking time.

* (default) `True`: Compute an overall diagnostic score. See the [SDMetrics Diagnostic Report](https://docs.sdv.dev/sdmetrics/reports/diagnostic-report/single-table-api) for more details.
* `False`: Do not compute a diagnostic score.

**`compute_quality_score`**: Set this boolean to generate an overall quality score for every synthesizer and dataset. This may increase the benchmarking time.

* (default) `True`: Compute an overall quality score. See the [SDMetrics Quality Report](https://docs.sdv.dev/sdmetrics/reports/quality-report/single-table-quality-report) for more details.
* `False`: Do not compute a quality score.

**`compute_privacy_score`**: Set this boolean to generate an overall privacy score for every synthesizer and dataset. This may increase the benchmarking time.

* (default) `True`: Compute the privacy score. See the [DCRBaselineProtection metric](https://docs.sdv.dev/sdmetrics/metrics/privacy-metrics/dcrbaselineprotection) for more details.
* `False`: Do not compute a privacy score.
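As an example of combining the three flags above, the sketch below builds the keyword arguments for a performance-only run, where all three scores are switched off to reduce benchmarking time. The `evaluation_kwargs` helper is hypothetical; only the parameter names come from this page.

```python
def evaluation_kwargs(diagnostic=True, quality=True, privacy=True):
    """Collect the three score flags into keyword arguments.

    The defaults mirror the documented defaults (all True).
    """
    return {
        'compute_diagnostic_score': diagnostic,
        'compute_quality_score': quality,
        'compute_privacy_score': privacy,
    }


# Measure only time and memory by skipping every score.
performance_only = evaluation_kwargs(diagnostic=False, quality=False, privacy=False)

# Possible usage (requires sdgym and valid AWS credentials):
# import sdgym
# sdgym.benchmark_single_table_aws(**performance_only)
```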

**`sdmetrics`**: Provide a list of strings corresponding to additional metrics from the [SDMetrics library](https://docs.sdv.dev/sdmetrics/metrics/metrics-glossary).

{% hint style="info" %}
To pass in optional parameters, specify a tuple with the metric name followed by a dictionary of parameters and values to pass into the metric.
{% endhint %}

* (default) `None`: Do not apply any additional metrics.
* See the [SDMetrics library](https://docs.sdv.dev/sdmetrics/) for more metric options
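The two accepted entry shapes (a metric name on its own, or a tuple of name and parameter dictionary) can be sketched as follows. The `normalize_sdmetrics` helper is hypothetical and only shows how the entries are structured. The metric name `NewRowSynthesis` and its `synthetic_sample_size` parameter come from SDMetrics, but whether a given metric is supported in an AWS run is an assumption you should verify.

```python
def normalize_sdmetrics(sdmetrics):
    """Return every entry as a (metric_name, parameters) pair.

    Accepts None, plain metric-name strings, or (name, params) tuples.
    Hypothetical helper for illustration; not part of SDGym.
    """
    if sdmetrics is None:
        return []

    normalized = []
    for entry in sdmetrics:
        if isinstance(entry, str):
            # A bare name means "use the metric's default parameters".
            normalized.append((entry, {}))
        else:
            name, params = entry
            normalized.append((name, dict(params)))
    return normalized


# One entry with default parameters, one with an explicit parameter.
metrics = [
    'NewRowSynthesis',
    ('NewRowSynthesis', {'synthetic_sample_size': 1000}),
]

# Possible usage (requires sdgym and valid AWS credentials):
# import sdgym
# sdgym.benchmark_single_table_aws(sdmetrics=metrics)
```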

