* DayZSynthesizer
The Day Z Synthesizer produces synthetic data from scratch using the metadata. This allows you to start generating synthetic data from day zero: no machine learning required!
from sdv.single_table import DayZSynthesizer
synthesizer = DayZSynthesizer(metadata)
synthetic_data = synthesizer.sample(num_rows=10)
Estimate parameters
For more realistic data, we recommend estimating some basic DayZ parameters using the real data. This includes information such as the min/max range of numerical columns and the possible category values in categorical columns.
SDV Community users can complete this step. You may be asked to share the DayZ parameters file with the SDV team for help with performance testing or debugging.
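To make the idea of "basic parameters" concrete, here is a minimal pure-Python sketch of the kind of information being estimated: the min/max range of a numerical column and the distinct values of a categorical column. It is an illustration only, not SDV's estimation logic. The helper name estimate_column_parameters and the sample rows are hypothetical; the key names (min_value, max_value, category_values) are borrowed from the example parameters dictionary shown later on this page.

```python
from typing import Any


def estimate_column_parameters(
    rows: list[dict[str, Any]],
    numerical: list[str],
    categorical: list[str],
) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: summarize columns the way the text describes."""
    columns: dict[str, dict[str, Any]] = {}
    for name in numerical:
        # Ignore missing values when computing the range.
        values = [row[name] for row in rows if row.get(name) is not None]
        if values:
            columns[name] = {"min_value": min(values), "max_value": max(values)}
    for name in categorical:
        values = [row[name] for row in rows if row.get(name) is not None]
        if values:
            columns[name] = {"category_values": sorted(set(values))}
    return columns


# Made-up rows standing in for real data.
rows = [
    {"room_rate": 120.0, "room_type": "BASIC"},
    {"room_rate": 30.0, "room_type": "SUITE"},
    {"room_rate": 500.0, "room_type": "DELUXE"},
    {"room_rate": None, "room_type": "BASIC"},
]

estimated = estimate_column_parameters(
    rows, numerical=["room_rate"], categorical=["room_type"]
)
print(estimated)
```

In practice the create_parameters function does this work for you; the sketch only shows why the real data is needed for this step.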
Create Parameters
Use the create_parameters function to estimate the parameters and save them as a JSON file.
from sdv.single_table import DayZSynthesizer
my_parameters = DayZSynthesizer.create_parameters(
    data=my_data,
    metadata=my_metadata,
    output_filename='dayz_parameters.json'
)
Parameters:
(required) data: A pd.DataFrame object containing the data to use for estimating parameters
(required) metadata: An SDV Metadata object that describes the data
output_filepath: A string with the name of the file in which to save the parameters. This should end in a .json suffix.
Returns: A Python dictionary representation of the parameters (that are also saved in the JSON).
Validate Parameters
Use the validate_parameters function to check that the parameters accurately reflect the metadata. This is important if you've modified any of the parameters in the file.
DayZSynthesizer.validate_parameters(
    metadata=my_metadata,
    parameters=my_parameters
)
Parameters:
(required) metadata: An SDV Metadata object that describes the data
(required) parameters: The parameters dictionary
Returns: (None) If there are any issues with the parameters, you'll see an error.
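If you edit the saved parameters file by hand, the round trip might look like the sketch below. It uses only Python's standard json module. The dictionary is a stand-in written by the example itself, and its structure is an assumption based on the example parameters dictionary shown later on this page; the file written by create_parameters may contain more entries. The final validate_parameters call is left as a comment because it needs SDV and your own metadata.

```python
import json
import os
import tempfile

# Stand-in for the dictionary returned by create_parameters
# (structure assumed for illustration).
my_parameters = {
    "columns": {
        "room_rate": {"min_value": 30.0, "max_value": 500.0},
        "room_type": {"category_values": ["BASIC", "DELUXE", "SUITE"]},
    }
}

# Write the stand-in file to a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "dayz_parameters.json")
with open(path, "w") as f:
    json.dump(my_parameters, f, indent=2)

# Load the file, change one value, and write it back.
with open(path) as f:
    edited_parameters = json.load(f)
edited_parameters["columns"]["room_rate"]["max_value"] = 750.0
with open(path, "w") as f:
    json.dump(edited_parameters, f, indent=2)

# With SDV installed, the edited dictionary would then be checked with:
# DayZSynthesizer.validate_parameters(
#     metadata=my_metadata,
#     parameters=edited_parameters
# )
```

Validating right after every manual edit keeps mistakes, such as a misspelled column name, from surfacing later at sampling time.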
Creating a synthesizer
When creating your synthesizer, you are required to pass in a Metadata object as the first argument. We also recommend setting the parameters at this time.
synthesizer = DayZSynthesizer(
    metadata,
    parameters=my_parameters,
    locales=['en_US', 'en_CA', 'fr_CA']
)
Parameter Reference
locales: A list of locale strings. Any PII columns will correspond to the locales that you provide.
(default) ['en_US']: Generate PII values in English corresponding to US-based concepts (e.g. addresses, phone numbers, etc.)
<list>: Create data from the list of locales. Each locale string consists of a 2-character code for the language and a 2-character code for the country, separated by an underscore. For example, ["en_US", "fr_CA"].
For all options, see the Faker docs.
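As a quick illustration of the locale string format described above, the following sketch checks that each string has the shape "two lowercase letters, underscore, two uppercase letters". The helper malformed_locales is hypothetical and is not part of SDV or Faker. It checks the shape only, not whether Faker actually supports the locale.

```python
import re

# Two lowercase letters (language), an underscore, two uppercase letters (country).
LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def malformed_locales(locales):
    """Hypothetical helper: return the strings that do not match the format."""
    return [loc for loc in locales if not LOCALE_PATTERN.fullmatch(loc)]


print(malformed_locales(["en_US", "en_CA", "fr_CA"]))  # prints []
print(malformed_locales(["en_US", "en-CA", "FR_ca"]))  # prints ['en-CA', 'FR_ca']
```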
parameters: A dictionary of DayZ parameters. Use this to set all the parameters that DayZ needs to create realistic data. Use the create_parameters function described above and instantiate your DayZ synthesizer with it.
from sdv.single_table import DayZSynthesizer
my_parameters = DayZSynthesizer.create_parameters(
    data=my_data,
    metadata=my_metadata,
    output_filename='dayz_parameters.json'
)
synthesizer = DayZSynthesizer(
    metadata,
    parameters=my_parameters,
    locales=['en_US', 'en_CA', 'fr_CA']
)
Programmatic Parameters API
We recommend setting the parameters all at once. However, we also offer a programmatic Python API to set the parameters one column at a time.
get_parameters
Use this method to get a dictionary of all the parameters used to make synthetic data -- those you have provided as well as the default ones.
Parameters
output_filepath: A string representing the name of the file to write the parameters to. We recommend storing this as a JSON file. Defaults to None, meaning that no file is written.
Output A dictionary representing all the parameters the synthesizer uses to generate data.
{
    'locales': ['en_US', 'en_CA', 'fr_CA'],
    'columns': {
        'room_rate': {
            'min_value': 30.00,
            'max_value': 500.00
        },
        'checkin_date': {
            'start_timestamp': '01 Jan 2020',
            'end_timestamp': '31 Dec 2020'
        },
        'room_type': {
            'category_values': ['BASIC', 'DELUXE', 'SUITE'],
            'missing_values_proportion': 0.1
        }
    },
    ...
}
Saving your synthesizer
Save your synthesizer for future use
save
Use this function to save your synthesizer as a Python pickle file.
Parameters
(required) filepath: A string describing the filepath where you want to save your synthesizer. Make sure this ends in .pkl
Output (None) The file will be saved at the desired location
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
load (utility function)
Use this utility function to load a trained synthesizer from a Python pickle file. After loading your synthesizer, you'll be able to sample synthetic data from it.
Parameters
(required) filepath: A string describing the filepath of your saved synthesizer
Output Your synthesizer object
from sdv.utils import load_synthesizer
synthesizer = load_synthesizer(
    filepath='my_synthesizer.pkl'
)
This utility function works for any SDV synthesizer.
What's next?
After creating your synthesizer, you can sample synthetic data. See the Sampling section for more details.
synthetic_data = synthesizer.sample(num_rows=10)