# ＊ DayZSynthesizer

{% hint style="info" %}
**＊SDV Enterprise Feature.** This feature is only available for licensed, enterprise users. For more information, visit our page to [Compare SDV Features](https://docs.sdv.dev/sdv/explore/sdv-enterprise/compare-features).
{% endhint %}

The DayZSynthesizer produces synthetic data from scratch using only the metadata. This allows you to start generating synthetic data from **day zero**: no machine learning required!

```python
from sdv.multi_table import DayZSynthesizer

synthesizer = DayZSynthesizer(metadata)
synthetic_data = synthesizer.sample(scale=1.0)
```

## Estimate parameters

For more realistic data, we recommend estimating some basic DayZ parameters using the real data. This includes information such as the min/max range of numerical columns and the possible category values in categorical columns.

{% hint style="success" %}
**SDV Community users can complete this step.** You may be asked to share the DayZ parameters file with the SDV team for help with performance testing or debugging.
{% endhint %}

### Create Parameters

Use the **`create_parameters`** function to estimate the parameters and save them as a JSON file.

```python
from sdv.multi_table import DayZSynthesizer

my_parameters = DayZSynthesizer.create_parameters(
  data=my_data,
  metadata=my_metadata,
  output_filename='dayz_parameters.json'
)
```

**Parameters:**

* (required) `data`: A dictionary mapping each table name to a [pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) containing the real data that the parameters will be estimated from
* (required) `metadata`: An [SDV Metadata](https://docs.sdv.dev/sdv/concepts/metadata) object that describes the data
* `output_filename`: A string with the name of the file in which to save the parameters. This should end in a `.json` suffix.

**Returns**: A Python dictionary representation of the parameters (the same parameters that are saved in the JSON file).

### Validate Parameters <a href="#validate-parameters" id="validate-parameters"></a>

Use the **`validate_parameters`** function to check that the parameters are consistent with the metadata. This is important if you've modified any of the parameters in the file.

```python
DayZSynthesizer.validate_parameters(
    metadata=my_metadata,
    parameters=my_parameters
)
```

**Parameters**:

* (required) `metadata`: An SDV Metadata object that describes the data
* (required) `parameters`: The parameters dictionary

**Returns**: (None) If there are any issues with the parameters, you'll see an error.
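
If you edit the saved JSON file by hand, the edited values need to be read back into a dictionary before they can be validated. The sketch below uses Python's built-in `json` module and assumes the file written by `create_parameters` is plain JSON that loads into the same dictionary structure. The helper name `load_parameters` is hypothetical and not part of SDV.

```python
import json


def load_parameters(filepath):
    """Read a DayZ parameters JSON file back into a Python dictionary."""
    with open(filepath, 'r', encoding='utf-8') as parameters_file:
        return json.load(parameters_file)


# Hypothetical usage, after editing 'dayz_parameters.json' by hand:
# my_parameters = load_parameters('dayz_parameters.json')
# DayZSynthesizer.validate_parameters(
#     metadata=my_metadata,
#     parameters=my_parameters
# )
```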

## Creating a synthesizer

When creating your synthesizer, you are required to pass in a [Metadata](https://docs.sdv.dev/sdv/concepts/metadata) object as the first argument. We also recommend setting the parameters at this time.

```python
synthesizer = DayZSynthesizer(
    metadata,
    parameters=my_parameters,
    locales=['en_US', 'en_CA', 'fr_CA']
)
```

### Parameter Reference

**`locales`**: A list of locale strings. Any PII columns will correspond to the locales that you provide.

<table data-header-hidden><thead><tr><th width="218"></th><th></th></tr></thead><tbody><tr><td>(default) <code>['en_US']</code></td><td>Generate PII values in English corresponding to US-based concepts (eg. addresses, phone numbers, etc.)</td></tr><tr><td><code>&#x3C;list></code></td><td><p>Create data from the list of locales. Each locale string consists of a 2-character code for the language and 2-character code for the country, separated by an underscore.</p><p></p><p>For example <code>[</code><a href="https://faker.readthedocs.io/en/master/locales/en_US.html"><code>"en_US"</code></a><code>,</code> <a href="https://faker.readthedocs.io/en/master/locales/fr_CA.html"><code>"fr_CA"</code></a><code>]</code>. </p><p>For all options, see the <a href="https://faker.readthedocs.io/en/master/locales.html">Faker docs</a>.</p></td></tr></tbody></table>

**`parameters`**: A dictionary of DayZ parameters. Use this to set all the parameters that DayZ needs to create realistic data. Use the `create_parameters` function [described above](#create-parameters) and instantiate your DayZ synthesizer with it.

```python
from sdv.multi_table import DayZSynthesizer

my_parameters = DayZSynthesizer.create_parameters(
  data=my_data,
  metadata=my_metadata,
  output_filename='dayz_parameters.json'
)

synthesizer = DayZSynthesizer(
    my_metadata,
    parameters=my_parameters,
    locales=['en_US', 'en_CA', 'fr_CA']
)
```

### Programmatic Parameters API <a href="#programmatic-parameters-api" id="programmatic-parameters-api"></a>

We recommend setting the parameters all at once. However, we also offer a programmatic, Python API to set the parameters one column at a time. Expand the sections below to learn more.

<details>

<summary><strong><code>set_numerical_bounds</code></strong></summary>

Use this method to set lower and upper bounds for numerical columns.

**Parameters**&#x20;

* (required) `table_name`: A string with the name of the table
* (required) `column_name`: A string with the name of the column. This must be a numerical column referenced in your metadata.
* (required) `min_value`: A float or int representing the minimum value.
* (required) `max_value`: A float or int representing the maximum value.

**Output** (None) The sampled synthetic data will follow the min and max bounds

```python
synthesizer.set_numerical_bounds(
    table_name='guests',
    column_name='room_rate',
    min_value=30.00,
    max_value=5000.00
)
```

</details>

<details>

<summary><strong><code>set_rounding_scheme</code></strong></summary>

Use this method to set the rounding scheme (# of decimal digits) for a numerical column.

**Parameters:**

* (required) `table_name`: A string with the name of the table
* (required) `column_name`: A string with the name of the column. This must be a numerical column referenced in your metadata.
* (required) `num_decimal_digits`: An integer that is >= 0, that specifies how to round the generated values
  * `0` means that the generated values should be whole numbers
  * Any higher number describes the # of digits to round. So `2` would mean rounding to 2 decimal digits (eg. `12.23`)
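
As a sketch, a small helper can apply rounding schemes to several columns in one pass. It relies only on the `set_rounding_scheme` method and the parameter names listed above. The helper `apply_rounding_schemes` and the column names in the usage comment are hypothetical.

```python
def apply_rounding_schemes(synthesizer, table_name, digits_by_column):
    """Call set_rounding_scheme once for each column in digits_by_column.

    `synthesizer` is assumed to be a DayZSynthesizer you have already created.
    `digits_by_column` maps a numerical column name to its # of decimal digits.
    """
    for column_name, num_decimal_digits in digits_by_column.items():
        synthesizer.set_rounding_scheme(
            table_name=table_name,
            column_name=column_name,
            num_decimal_digits=num_decimal_digits
        )


# Hypothetical usage: whole-number guest counts, room rates rounded to cents
# apply_rounding_schemes(
#     synthesizer,
#     table_name='guests',
#     digits_by_column={'num_guests': 0, 'room_rate': 2}
# )
```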

</details>

<details>

<summary><strong><code>set_datetime_bounds</code></strong></summary>

Use this method to set lower and upper bounds for datetime columns.

**Parameters**&#x20;

* (required) `table_name`: A string with the name of the table
* (required) `column_name`: A string with the name of the column. This must be a datetime column referenced in your metadata.
* (required) `start_timestamp`: A string representing the earliest allowed datetime. The string must be in the same datetime format as referenced in your metadata.
* (required) `end_timestamp`: A string representing the latest allowed datetime. The string must be in the same datetime format as referenced in your metadata.

**Output** (None) The sampled synthetic data will follow start and end bounds

```python
synthesizer.set_datetime_bounds(
    table_name='guests',
    column_name='checkin_date',
    start_timestamp='01 Jan 2020',
    end_timestamp='31 Dec 2020'
)
```
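
Because the bounds must use the same datetime format as the metadata, it can help to check your strings before passing them in. The sketch below uses Python's standard `datetime.strptime` for that check. The format `'%d %b %Y'` is an assumption chosen to match the example strings above, and the helper `matches_format` is hypothetical, not part of SDV.

```python
from datetime import datetime


def matches_format(timestamp, datetime_format):
    """Return True if the timestamp string parses with the given format."""
    try:
        datetime.strptime(timestamp, datetime_format)
    except ValueError:
        return False
    return True


# Assumed format for the 'checkin_date' column (replace with your metadata's format)
DATETIME_FORMAT = '%d %b %Y'

print(matches_format('01 Jan 2020', DATETIME_FORMAT))  # True
print(matches_format('2020-01-01', DATETIME_FORMAT))   # False
```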

</details>

<details>

<summary><strong><code>set_category_values</code></strong></summary>

Use this method to set the different values that are possible for categorical columns.

**Parameters**&#x20;

* (required) `table_name`: A string with the name of the table
* (required) `column_name`: A string with the name of the column. This must be a categorical column referenced in your metadata.
* (required) `category_values`: A list of strings representing the different unique category values that are possible. (If missing values are allowed, use the `set_missing_values` method instead of listing them here.)

**Output** (None) The sampled synthetic data will include the category values

```python
synthesizer.set_category_values(
    table_name='guests',
    column_name='room_type',
    category_values=['BASIC', 'DELUXE', 'SUITE']
)
```

</details>

<details>

<summary><strong><code>set_missing_values</code></strong></summary>

Use this method to set the proportion of missing values to generate in a column.

**Parameters**

* (required) `table_name`: A string representing the name of the table
* (required) `column_name`: A string representing the name of the column.  *This column cannot be a primary or foreign key.*
* (required) `missing_values_proportion`: A float representing the proportion of missing values
  * Any float between 0.0 and 1.0: Randomly create this proportion of missing values in the column

```python
synthesizer.set_missing_values(
    table_name='guests',
    column_name='room_type',
    missing_values_proportion=0.1
)
```

**Output** (None) Sets the proportion of the missing values

</details>

<details>

<summary><strong><code>set_cardinality</code></strong></summary>

Use this function to set the cardinality of a parent/child relationship. The *cardinality* refers to the number of children that each parent row is allowed to have. This can be anywhere from 0 to infinity.

This function can help you create realistic data for many relationship types, such as 1-to-1, 1-to-many, etc.

```python
# each hotel must have 1 or more guests
synthesizer.set_cardinality(
    parent_table_name='hotels',
    child_table_name='guests',
    parent_primary_key='hotel_id',
    child_foreign_key='hotel_id',
    min_cardinality=1,
    max_cardinality=None
)
```

**Parameters**

* (required) `parent_table_name`: The name of the parent table
* (required) `child_table_name`: The name of the child table
* (required) `parent_primary_key`: The name of the primary key in the parent
* (required) `child_foreign_key`: The name of the foreign key in the child that refers to the primary key of the parent
* `min_cardinality`: The minimum # of children each parent must have, must be an integer >=0
  * (default) `0`: A parent row must have 0 or more children
  * `<integer>`: An integer representing the minimum # of children
* `max_cardinality`: The maximum # of children each parent is allowed to have. If provided, this must be an integer >0.
  * (default) `None`: Do not enforce a maximum (i.e. the maximum # of children can be infinite)
  * `<integer>`: An integer >= `min_cardinality` representing the maximum # of children
  * *Note that if `min_cardinality` = `max_cardinality`, there is a fixed # of children for each parent.*

**Output** (None) Sets the min and max cardinality of the parent/child relationship, or updates it if the cardinality was already set.&#x20;
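
The fixed-cardinality case mentioned in the note above can be sketched as a small wrapper that passes the same value for both bounds. It uses only the `set_cardinality` parameters listed in this section. The wrapper name `set_fixed_cardinality` is hypothetical.

```python
def set_fixed_cardinality(synthesizer, parent_table_name, child_table_name,
                          parent_primary_key, child_foreign_key, num_children):
    """Give every parent row exactly `num_children` children.

    `synthesizer` is assumed to be a DayZSynthesizer you have already created.
    """
    synthesizer.set_cardinality(
        parent_table_name=parent_table_name,
        child_table_name=child_table_name,
        parent_primary_key=parent_primary_key,
        child_foreign_key=child_foreign_key,
        min_cardinality=num_children,
        max_cardinality=num_children
    )


# Hypothetical usage: a 1-to-1 relationship between hotels and guests
# set_fixed_cardinality(
#     synthesizer, 'hotels', 'guests', 'hotel_id', 'hotel_id', num_children=1
# )
```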

</details>

<details>

<summary><strong><code>set_table_sizes</code></strong></summary>

Use this function to set the (relative) table sizes of each table in the dataset. When sampling, you can scale the entire dataset up or down.

```python
synthesizer = DayZSynthesizer(metadata)
synthesizer.set_table_sizes(
    num_rows_per_table={
        'hotels': 1000,
        'guests': 2500
    }
)
```

**Parameters:**&#x20;

* (required) `num_rows_per_table`: A dictionary containing the number of rows to set per table. The keys are the names of the tables, and the values are integers representing the number of rows.
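
To make the idea of relative sizes concrete, the arithmetic below shows how a `scale` value would translate into row counts if each table size were simply multiplied by it. This proportional behavior is an assumption for illustration only. The actual number of sampled rows may differ, for example to satisfy cardinality settings. The helper `expected_table_sizes` is hypothetical and not part of SDV.

```python
def expected_table_sizes(num_rows_per_table, scale):
    """Multiply each table size by `scale` and round to a whole # of rows."""
    return {
        table_name: round(num_rows * scale)
        for table_name, num_rows in num_rows_per_table.items()
    }


base_sizes = {'hotels': 1000, 'guests': 2500}

print(expected_table_sizes(base_sizes, scale=0.5))  # {'hotels': 500, 'guests': 1250}
print(expected_table_sizes(base_sizes, scale=2.0))  # {'hotels': 2000, 'guests': 5000}
```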

</details>

### get\_parameters

Use this function to access all the parameters your synthesizer uses: those you have provided as well as the default ones.

**Parameters**

* `output_filepath`: A string representing the name of the file to write the parameters to. We recommend storing this as a JSON file. Defaults to `None`, meaning that no file is written.

**Output** A dictionary with the table names and parameters for each table.

{% hint style="info" %}
These parameters are only for the multi-table synthesizer. To get individual table-level parameters, use the `get_table_parameters` function.

The returned parameters are a copy. Changing them will not affect the synthesizer.
{% endhint %}

```python
synthesizer.get_parameters()
```

```python
{
    'locales': ['en_US', 'fr_CA'],
    ...
}
```

### get\_table\_parameters

Use this function to access all the parameters a table synthesizer uses: those you have provided as well as the default ones.

**Parameters**

* (required) `table_name`: A string describing the name of the table

**Output** A dictionary with the parameter names and the values

```python
synthesizer.get_table_parameters(table_name='users')
```

```python
{
    'synthesizer_name': 'DayZSynthesizer',
    'synthesizer_parameters': {
        'columns': {
            ...
        }
    }
}
```

## Saving your synthesizer

Save your synthesizer for future use.

### save

Use this function to save your synthesizer as a Python pickle file.

**Parameters**

* (required) `filepath`: A string describing the filepath where you want to save your synthesizer. Make sure this ends in `.pkl`&#x20;

**Output** (None) The file will be saved at the desired location

```python
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
```

### load (utility function)

Use this utility function to load a saved synthesizer from a Python pickle file. After loading your synthesizer, you'll be able to sample synthetic data from it.

**Parameters**

* (required) `filepath`: A string describing the filepath of your saved synthesizer

**Output** Your synthesizer object

```python
from sdv.utils import load_synthesizer

synthesizer = load_synthesizer(
    filepath='my_synthesizer.pkl'
)
```

*This utility function works for any SDV synthesizer.*

## What's next? <a href="#whats-next" id="whats-next"></a>

After creating your synthesizer, you can sample synthetic data. See the [Sampling](https://docs.sdv.dev/sdv/multi-table-data/sampling) section for more details. *(This synthesizer does not yet offer support for conditional sampling.)*

```python
synthetic_data = synthesizer.sample(scale=1.0)
```

{% hint style="info" %}
**Want to improve your synthesizer?** Input logical rules in the form of constraints, and customize the transformations used for pre- and post-processing the data.

For more details, see [Customizations](https://docs.sdv.dev/sdv/single-table-data/modeling/customizations).
{% endhint %}
