* DayZSynthesizer
The Day Z Synthesizer produces synthetic data from scratch using only the metadata. This allows you to start generating synthetic data from day zero: no real data or machine learning required!
When creating your synthesizer, you are required to pass in a Metadata object as the first argument. Other parameters are optional.
locales
: A list of locale strings. Any PII columns will correspond to the locales that you provide.
(default) ['en_US']
Generate PII values in English corresponding to US-based concepts (e.g. addresses and phone numbers).
<list>
Create data from the list of locales. Each locale string consists of a 2-character code for the language and a 2-character code for the country, separated by an underscore.
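The locale format described above can be checked before you pass the list to the synthesizer. The snippet below is a minimal sketch in plain Python, not part of the synthesizer itself. The helper name `is_well_formed_locale` is hypothetical, and the lowercase-language, uppercase-country convention is an assumption inferred from the default `'en_US'`. It only checks the shape of the string, not whether the locale is actually supported.

```python
import re

# Shape described above: 2-character language code, underscore,
# 2-character country code. The letter case is an assumption based on 'en_US'.
LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def is_well_formed_locale(locale):
    """Return True if the string looks like <language>_<COUNTRY>."""
    return isinstance(locale, str) and LOCALE_PATTERN.fullmatch(locale) is not None


candidates = ["en_US", "fr_CA", "english"]
well_formed = [loc for loc in candidates if is_well_formed_locale(loc)]
print(well_formed)  # ['en_US', 'fr_CA']
```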
By default, this synthesizer will randomly generate data that conforms to your metadata specification. If you'd like to generate more realistic data, you can use the methods below to add guidance.
Use this method to set lower and upper bounds for numerical columns
Parameters
(required) column_name
: A string with the name of the column. This must be a numerical column referenced in your metadata.
(required) min_value
: A float or int representing the minimum value.
(required) max_value
: A float or int representing the maximum value.
Output (None) The sampled synthetic data will follow the min and max bounds
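To make the effect of the bounds concrete, here is a small plain-Python illustration of what "following the min and max bounds" means for sampled values. It is not the synthesizer's implementation: the function `sample_bounded_values` is hypothetical, and drawing from a uniform distribution is an assumption, since this page does not say which distribution is used.

```python
import random


def sample_bounded_values(min_value, max_value, num_rows, seed=None):
    """Draw num_rows values that all lie between min_value and max_value."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    rng = random.Random(seed)
    # Assumption: uniform sampling, chosen only to keep the sketch simple.
    return [rng.uniform(min_value, max_value) for _ in range(num_rows)]


ages = sample_bounded_values(min_value=18, max_value=100, num_rows=5, seed=0)
print(all(18 <= age <= 100 for age in ages))  # True
```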
Use this method to set lower and upper bounds for datetime columns
Parameters
(required) column_name
: A string with the name of the column. This must be a datetime column referenced in your metadata.
(required) start_timestamp
: A string representing the earliest allowed datetime. The string must be in the same datetime format as referenced in your metadata.
(required) end_timestamp
: A string representing the latest allowed datetime. The string must be in the same datetime format as referenced in your metadata.
Output (None) The sampled synthetic data will follow start and end bounds
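Because both timestamps must match the datetime format declared in your metadata, it can help to parse them yourself first. The sketch below uses only the standard library. The format string `"%Y-%m-%d"` and the helper `parse_bounds` are hypothetical stand-ins for whatever your metadata specifies.

```python
from datetime import datetime

# Hypothetical format, standing in for the one declared in your metadata.
DATETIME_FORMAT = "%Y-%m-%d"


def parse_bounds(start_timestamp, end_timestamp, datetime_format):
    """Parse both bounds with the same format and check their order."""
    # strptime raises ValueError if a string does not match the format.
    start = datetime.strptime(start_timestamp, datetime_format)
    end = datetime.strptime(end_timestamp, datetime_format)
    if start > end:
        raise ValueError("start_timestamp must not be later than end_timestamp")
    return start, end


start, end = parse_bounds("2020-01-01", "2020-12-31", DATETIME_FORMAT)
print(start.date(), end.date())
```

A string such as `"01/01/2020"` would fail to parse with this format, which is the kind of mismatch worth catching before setting the bounds.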
Use this method to set the different values that are possible for categorical columns.
Parameters
(required) column_name
: A string with the name of the column. This must be a categorical column referenced in your metadata.
(required) category_values
: A list of strings representing the unique category values that are possible. (If missing values are allowed, do not list a missing value here; use the set_missing_values method instead.)
Output (None) The sampled synthetic data will include the category values
Use this method to set the proportion of missing values to generate in a column
Parameters
(required) column_name
: A string representing the name of the column. This column cannot be a primary or foreign key.
(required) missing_values_proportion
: A float representing the proportion of missing values
Any float between 0.0 and 1.0: Randomly create this proportion of missing values in the column
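The following plain-Python sketch illustrates how category values and a missing-value proportion combine in a single column. It is an illustration under stated assumptions, not the synthesizer's code: `sample_with_missing` is a hypothetical helper, and it blanks out an exact number of rows, whereas a real implementation might only match the proportion on average.

```python
import random


def sample_with_missing(category_values, missing_values_proportion, num_rows, seed=None):
    """Sample categories, then replace a proportion of the rows with None."""
    if not 0.0 <= missing_values_proportion <= 1.0:
        raise ValueError("missing_values_proportion must be between 0.0 and 1.0")
    rng = random.Random(seed)
    values = [rng.choice(category_values) for _ in range(num_rows)]
    # Assumption: an exact count of missing rows, chosen at random positions.
    num_missing = round(missing_values_proportion * num_rows)
    for index in rng.sample(range(num_rows), num_missing):
        values[index] = None
    return values


room_types = sample_with_missing(["BASIC", "DELUXE", "SUITE"], 0.25, num_rows=20, seed=0)
print(room_types.count(None))  # 5
```

Note that the missing value is never part of the category list; it is introduced separately, which mirrors the split between the two methods described above.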
Use this method to get a dictionary of all the parameters used to make synthetic data -- those you have provided as well as the default ones.
Parameters
output_filepath
: A string representing the name of the file to write the parameters to. We recommend storing this as a JSON file.
Defaults to None, meaning that no output file is written.
Output A dictionary representing all the custom parameters added to the synthesizer.
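If you store the parameters as JSON, the file can be read back later with the standard library. The sketch below shows that round trip. The dictionary contents and the helper `write_parameters` are hypothetical; the actual structure of the returned dictionary is not shown on this page.

```python
import json
import os
import tempfile

# Hypothetical parameters; the real dictionary layout may differ.
parameters = {
    "locales": ["en_US"],
    "age": {"min_value": 18, "max_value": 100},
}


def write_parameters(parameters, output_filepath=None):
    """Return the parameters, writing them as JSON only if a path is given."""
    if output_filepath is not None:
        with open(output_filepath, "w") as f:
            json.dump(parameters, f, indent=2)
    return parameters


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dayz_parameters.json")
    write_parameters(parameters, output_filepath=path)
    with open(path) as f:
        reloaded = json.load(f)

print(reloaded == parameters)  # True
```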
Save your synthesizer for future use
Use this function to save your synthesizer as a Python pickle file.
Parameters
(required) filepath
: A string describing the filepath where you want to save your synthesizer. Make sure this ends in .pkl
Output (None) The file will be saved at the desired location
Use this function to load a synthesizer from a Python pickle file
Parameters
(required) filepath
: A string describing the filepath of your saved synthesizer
Output Your synthesizer, as a DayZSynthesizer object
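Saving and loading rely on Python's pickle format, so the round trip behaves like any other pickled object. The sketch below demonstrates it with the standard `pickle` module on a plain dictionary that stands in for a synthesizer. The dictionary and the file name are hypothetical, and the real save and load calls are not reproduced here.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a configured synthesizer's state.
synthesizer_state = {"locales": ["en_US"], "bounds": {"age": [18, 100]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = os.path.join(tmp_dir, "synthesizer.pkl")

    # Save: serialize the object to a .pkl file.
    with open(filepath, "wb") as f:
        pickle.dump(synthesizer_state, f)

    # Load: rebuild an equal object from the file.
    with open(filepath, "rb") as f:
        restored_state = pickle.load(f)

print(restored_state == synthesizer_state)  # True
```

Unpickling can execute arbitrary code, so only load `.pkl` files that come from a source you trust.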
Sample any amount of synthetic data
Use this method to sample synthetic data
Parameters
(required) num_rows
: An integer greater than 0 that specifies the number of rows to synthesize.
Output A pandas DataFrame object with synthetic data
This synthesizer has limited functionality. It is not compatible with conditional sampling or constraints.
If you wish to use these features, we recommend using real data and machine learning to train a GaussianCopulaSynthesizer.
*SDV Enterprise Feature. This feature is only available for licensed, enterprise users. For more information, visit our page to Explore SDV.