*IndependentSynthesizer
The Independent Synthesizer learns each table's patterns independently. This synthesizer offers fast performance for unlimited tables.
from sdv.multi_table import IndependentSynthesizer
synthesizer = IndependentSynthesizer(metadata)
synthesizer.fit(data)
synthetic_data = synthesizer.sample()
Creating a synthesizer
When creating your synthesizer, you are required to pass in a Multi Table Metadata object as the first argument.
synthesizer = IndependentSynthesizer(metadata)
All other parameters are optional. You can include them to customize the synthesizer.
Parameter Reference
locales
: A list of locale strings. Any PII columns will correspond to the locales that you provide.
(default) ['en_US']
Generate PII values in English corresponding to US-based concepts (eg. addresses, phone numbers, etc.)
<list>
Create data from the list of locales. Each locale string consists of a 2-character code for the language and 2-character code for the country, separated by an underscore.
For example ["en_US", "fr_CA"].
For all options, see the Faker docs.
synthesizer = IndependentSynthesizer(
    metadata,
    locales=['en_US', 'en_CA', 'fr_CA']
)
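The locale format described above (a 2-character language code, an underscore, then a 2-character country code) can be checked before a list is passed in. The following is a minimal sketch using only the standard library. `has_locale_format` is a hypothetical helper, not part of SDV, and it only checks the shape of the string, not whether Faker actually supports the locale.

```python
import re

# Two lowercase letters, an underscore, two uppercase letters (e.g. "en_US").
# Assumption: lowercase language and uppercase country, as in the examples above.
LOCALE_PATTERN = re.compile(r"[a-z]{2}_[A-Z]{2}")


def has_locale_format(locale):
    """Return True if the string looks like '<language>_<COUNTRY>'."""
    return LOCALE_PATTERN.fullmatch(locale) is not None


locales = ['en_US', 'en_CA', 'fr_CA']
badly_formed = [loc for loc in locales if not has_locale_format(loc)]
print(badly_formed)  # an empty list means every entry has the expected shape
```

A string that passes this check may still be rejected if it is not a locale that Faker knows about, so the Faker docs remain the reference for valid values.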
set_table_parameters
The Independent Synthesizer models each individual table. You can get and set the parameters for each table.
Parameters
(required)
table_name
: A string describing the name of the table
table_synthesizer
: The single table synthesizer to use for modeling the table
(default) 'GaussianCopulaSynthesizer'
Use the GaussianCopulaSynthesizer to model the single table
Other available options: 'GaussianCopulaSynthesizer', 'CTGANSynthesizer', 'TVAESynthesizer', 'CopulaGANSynthesizer'. For more information, see Single Table Synthesizers.
table_parameters
: A dictionary mapping the name of the parameter (string) to the value of the parameter (various). These parameters are different for each synthesizer. For more information, see Single Table Synthesizers.
Output (None)
synthesizer.set_table_parameters(
    table_name='guests',
    table_synthesizer='GaussianCopulaSynthesizer',
    table_parameters={
        'enforce_min_max_values': True,
        'default_distribution': 'truncnorm',
        'numerical_distributions': {
            'checkin_date': 'uniform',
            'amenities_fee': 'beta'
        }
    }
)
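Because parameters are set one table at a time, configuring several tables means one call per table. The sketch below shows one way to drive those calls from a single dictionary. `apply_table_settings` is a hypothetical helper, and `RecordingSynthesizer` is a stand-in that only records the calls so the example runs without SDV installed. With a real synthesizer you would pass that object instead.

```python
def apply_table_settings(synthesizer, settings):
    """Call set_table_parameters once per table listed in `settings`."""
    for table_name, (table_synthesizer, table_parameters) in settings.items():
        synthesizer.set_table_parameters(
            table_name=table_name,
            table_synthesizer=table_synthesizer,
            table_parameters=table_parameters,
        )


class RecordingSynthesizer:
    """Stand-in (not an SDV class) that records each call it receives."""

    def __init__(self):
        self.calls = {}

    def set_table_parameters(self, table_name, table_synthesizer, table_parameters):
        self.calls[table_name] = (table_synthesizer, table_parameters)


# Table names and parameter values are taken from the examples on this page.
settings = {
    'guests': ('GaussianCopulaSynthesizer', {'default_distribution': 'truncnorm'}),
    'users': ('GaussianCopulaSynthesizer', {'default_distribution': 'beta'}),
}

recorder = RecordingSynthesizer()
apply_table_settings(recorder, settings)
print(sorted(recorder.calls))
```

Keeping the settings in one dictionary makes it easier to review which synthesizer and parameters each table will use before fitting.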
get_parameters
Use this function to access all the parameters your synthesizer uses, both those you have provided and the default ones.
Parameters (None)
Output A dictionary with the table names and parameters for each table.
synthesizer.get_parameters()
{
    'locales': ['en_US', 'fr_CA'],
    ...
}
get_table_parameters
Use this function to access all the parameters a table synthesizer uses, both those you have provided and the default ones.
Parameters
(required)
table_name
: A string describing the name of the table
Output A dictionary with the parameter names and the values
synthesizer.get_table_parameters(table_name='users')
{
    'synthesizer_name': 'GaussianCopulaSynthesizer',
    'synthesizer_parameters': {
        'default_distribution': 'beta',
        ...
    }
}
get_metadata
Use this function to access the metadata object that you have included for the synthesizer
Parameters None
Output A MultiTableMetadata object
metadata = synthesizer.get_metadata()
Learning from your data
To learn a machine learning model based on your real data, use the fit method.
fit
Parameters
(required)
data
: A dictionary mapping each table name to a pandas DataFrame containing the real data that the machine learning model will learn from
Output (None)
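Since fit expects one dictionary entry per table name, comparing the dictionary keys with the table names you expect can catch a missing or misspelled table early. This is a minimal sketch. `find_table_mismatches` is a hypothetical helper, the expected names are typed in by hand rather than read from the metadata, and None placeholders stand in for the pandas DataFrames.

```python
def find_table_mismatches(data, expected_tables):
    """Compare the keys of `data` with the table names you expect.

    Returns (missing, unexpected) as two sorted lists of table names.
    """
    provided = set(data)
    expected = set(expected_tables)
    return sorted(expected - provided), sorted(provided - expected)


# Placeholder values stand in for the pandas DataFrames of real data.
# 'user' is a deliberate misspelling of 'users'.
data = {'guests': None, 'user': None}

missing, unexpected = find_table_mismatches(data, ['guests', 'users'])
print('missing:', missing)
print('unexpected:', unexpected)
```

Here the check reports 'users' as missing and 'user' as unexpected, which points to the misspelled key.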
get_learned_distributions
After fitting this synthesizer, you can access the marginal distributions that were learned to estimate the shape of each column.
Parameters
(required)
table_name
: A string with the name of the table
Output A dictionary that maps the name of each learned column to the distribution that estimates its shape
synthesizer.get_learned_distributions(table_name='guests')
{
    'amenities_fee': {
        'distribution': 'beta',
        'learned_parameters': { 'a': 2.22, 'b': 3.17, 'loc': 0.07, 'scale': 48.5 }
    },
    'checkin_date': {
        ...
    },
    ...
}
For more information about the distributions and their parameters, visit the Copulas library.
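To get a feel for what the learned parameters mean, you can turn them into a summary statistic. The sketch below computes the mean implied by the 'amenities_fee' entry shown above. It assumes the parameters follow the SciPy-style beta parameterization (shape values a and b, shifted by loc and stretched by scale). That is an assumption about how the values are reported, so check it against the Copulas documentation. `beta_mean` is a hypothetical helper, not part of SDV.

```python
# Literal copy of the 'amenities_fee' entry from the sample output above.
# The elided entries are left out rather than guessed.
learned = {
    'amenities_fee': {
        'distribution': 'beta',
        'learned_parameters': {'a': 2.22, 'b': 3.17, 'loc': 0.07, 'scale': 48.5},
    },
}


def beta_mean(params):
    """Mean of a beta distribution shifted by `loc` and stretched by `scale`.

    Assumes the SciPy-style parameterization: loc + scale * a / (a + b).
    """
    a, b = params['a'], params['b']
    return params['loc'] + params['scale'] * a / (a + b)


for column, info in learned.items():
    if info['distribution'] == 'beta':
        print(column, round(beta_mean(info['learned_parameters']), 2))
```

Under that assumption the implied mean for 'amenities_fee' is roughly 20.05, which you can compare with the mean of the real column as a quick sanity check.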
get_loss_values
After fitting, you can access the loss values computed during each epoch for both the generator and discriminator.
Parameters
(required)
table_name
: A string with the name of the table
Output A pandas.DataFrame object containing epoch number, generator loss value and discriminator loss value.
synthesizer.get_loss_values(table_name='users')
Epoch  Generator Loss  Discriminator Loss
1      1.7863          -0.3639
2      1.5484           0.2260
3      1.3633          -0.0441
...
Saving your synthesizer
Save your trained synthesizer for future use.
save
Use this function to save your trained synthesizer as a Python pickle file.
Parameters
(required)
filepath
: A string describing the filepath where you want to save your synthesizer. Make sure this ends in .pkl
Output (None) The file will be saved at the desired location
synthesizer.save(
    filepath='my_synthesizer.pkl'
)
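Since the filepath should end in .pkl, it can help to normalize the path before saving. The following is a minimal sketch with the standard library. `with_pkl_suffix` is a hypothetical helper, not part of SDV.

```python
from pathlib import Path


def with_pkl_suffix(filepath):
    """Return `filepath` as a string that ends in '.pkl'."""
    path = Path(filepath)
    if path.suffix != '.pkl':
        # Append rather than replace, so 'model.v2' becomes 'model.v2.pkl'.
        path = path.with_name(path.name + '.pkl')
    return str(path)


print(with_pkl_suffix('my_synthesizer'))      # my_synthesizer.pkl
print(with_pkl_suffix('my_synthesizer.pkl'))  # my_synthesizer.pkl
```

The result can then be passed as the filepath argument of save.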
IndependentSynthesizer.load
Use this function to load a trained synthesizer from a Python pickle file
Parameters
(required)
filepath
: A string describing the filepath of your saved synthesizer
Output Your synthesizer, as an IndependentSynthesizer object
from sdv.multi_table import IndependentSynthesizer
synthesizer = IndependentSynthesizer.load(
    filepath='my_synthesizer.pkl'
)
What's next?
After training your synthesizer, you can now sample synthetic data. See the Sampling section for more details.