* Performance Estimates
How well will SDV synthesizers be able to model your full data schema? Use this feature to get estimates using only your metadata.
* create_and_test_multi_table
Simulate the performance of different multi-table synthesizers using your metadata.
This function uses the DayZSynthesizer to create random data based on your metadata. It then runs the random data through each of the multi-table synthesizers you selected to estimate their performance, including the time it takes to train, sample, and run the evaluation reports.
from sdv.utils.multi_table import create_and_test_multi_table
# my_metadata is a Metadata object describing your tables
results = create_and_test_multi_table(
    metadata=my_metadata,
    synthesizers=['HMASynthesizer', 'HSASynthesizer'],
    output_folder='my_performance_results/',
    default_num_rows=1_000_000,
    timeout=3600  # 1 hour per synthesizer
)
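The process described above (train each synthesizer, sample from it, and stop waiting once a time limit is hit) can be pictured with the minimal sketch below. It is not SDV's implementation: `StandInSynthesizer` and `benchmark` are hypothetical names, the stand-in's `fit` and `sample` methods only sleep, and enforcing the per-synthesizer limit with `concurrent.futures` is an assumption about how such a timeout could work.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError


class StandInSynthesizer:
    """Hypothetical stand-in for a multi-table synthesizer; each stage just sleeps."""

    def __init__(self, name, fit_seconds=0.0, sample_seconds=0.0):
        self.name = name
        self.fit_seconds = fit_seconds
        self.sample_seconds = sample_seconds

    def fit(self, data):
        time.sleep(self.fit_seconds)  # pretend to learn from `data`

    def sample(self):
        time.sleep(self.sample_seconds)  # pretend to generate synthetic tables
        return {}


def benchmark(synthesizer, data, timeout=None):
    """Time fit() and sample(); record 'TimeoutError' if both together exceed `timeout` seconds."""
    timings = {'synthesizer': synthesizer.name, 'fit_time': None, 'sample_time': None, 'error': None}

    def run():
        start = time.perf_counter()
        synthesizer.fit(data)
        timings['fit_time'] = time.perf_counter() - start
        start = time.perf_counter()
        synthesizer.sample()
        timings['sample_time'] = time.perf_counter() - start

    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(run)
    try:
        future.result(timeout=timeout)  # timeout=None waits as long as needed
    except FuturesTimeoutError:
        timings['error'] = 'TimeoutError'
    finally:
        # Threads cannot be killed: a timed-out run keeps going in the background.
        executor.shutdown(wait=False)
    return dict(timings)  # snapshot, so a late-finishing run cannot change the result


# Hypothetical usage: one run finishes in time, one exceeds its limit.
print(benchmark(StandInSynthesizer('FastSynthesizer', 0.01, 0.01), data={}, timeout=5))
print(benchmark(StandInSynthesizer('SlowSynthesizer', fit_seconds=0.5), data={}, timeout=0.05))
```

Because Python cannot forcibly stop a running thread, this sketch only stops waiting for a timed-out synthesizer; a real tool might use separate processes instead if it needs to reclaim the resources.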
Parameters:
metadata (required)
: A Metadata object
synthesizers (required)
: A list of strings representing the multi-table synthesizers that you want to test. Options are: 'HMASynthesizer', 'HSASynthesizer' or 'IndependentSynthesizer'
output_folder (required)
: A destination folder where the random data, results, and other artifacts will be saved
default_num_rows
: An integer with the number of rows to create by default for all tables
(default) 1000: Create 1000 rows for every table
num_rows_per_table
: A dictionary that maps each table name to the number of rows to create for only that table. Values here override the default number of rows set in default_num_rows.
(default) None: Do not override the default number of rows for any individual table
timeout
: The maximum number of seconds to give each synthesizer to train and sample the dataset
(default) None: Do not set a maximum. Allow the synthesizer to take as long as it needs.
<integer>: Allow each synthesizer to run for at most this many seconds. If a synthesizer exceeds this time, the output will include a TimeoutError.
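How `default_num_rows` and `num_rows_per_table` combine can be sketched as a small lookup. The helper name `resolve_num_rows` is hypothetical and this is not SDV's code; it only encodes the rule stated in the parameter descriptions: a table listed in `num_rows_per_table` uses its own value, and every other table falls back to the default.

```python
def resolve_num_rows(table_names, default_num_rows=1000, num_rows_per_table=None):
    """Map each table to its row count: a per-table override if given, else the default."""
    overrides = num_rows_per_table or {}  # None means "no overrides"
    return {name: overrides.get(name, default_num_rows) for name in table_names}


# Hypothetical two-table schema: only 'transactions' gets a custom size.
print(resolve_num_rows(['users', 'transactions'], num_rows_per_table={'transactions': 5000}))
# {'users': 1000, 'transactions': 5000}
```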
Output: A pandas DataFrame with detailed performance results from each synthesizer
Interpreting the results
Your results include detailed timings for training, sampling, and evaluation, along with the diagnostic score. For example:
synthesizer      init_time  preprocess_time  fit_processed_time  sample_time  diagnostic_time  diagnostic_score  quality_time
DayZSynthesizer  0.0009     None             None                1.23         None             None              None
HMASynthesizer   0.00098    12.34            456.789             234.567      1.23             1.0               234.12
HSASynthesizer   0.0008     12.45            34.566              23.456       1.25             1.0               239.45
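To compare synthesizers at a glance, you might add up the timing columns for each row. The sketch below does this with the standard library on the example values above, written out as CSV text. It assumes that a stage that was not run appears as the text `None` or as an empty cell, which may not match how `results.csv` is actually written; `total_times` is a hypothetical helper.

```python
import csv
import io

# The example results table above, written out as CSV text.
RESULTS_CSV = """\
synthesizer,init_time,preprocess_time,fit_processed_time,sample_time,diagnostic_time,diagnostic_score,quality_time
DayZSynthesizer,0.0009,None,None,1.23,None,None,None
HMASynthesizer,0.00098,12.34,456.789,234.567,1.23,1.0,234.12
HSASynthesizer,0.0008,12.45,34.566,23.456,1.25,1.0,239.45
"""


def total_times(csv_text):
    """Sum every *_time column per synthesizer, skipping stages that were not run."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        values = [
            float(value)
            for column, value in row.items()
            if column.endswith('_time') and value not in ('', 'None')  # assumed missing-value markers
        ]
        totals[row['synthesizer']] = sum(values)
    return totals


# Rank synthesizers from fastest to slowest overall.
for name, total in sorted(total_times(RESULTS_CSV).items(), key=lambda item: item[1]):
    print(f'{name}: {total:.2f}')
# DayZSynthesizer: 1.23
# HSASynthesizer: 311.17
# HMASynthesizer: 939.05
```

These totals include `diagnostic_time` and `quality_time`, so they reflect evaluation cost as well as modeling cost; restrict the column filter if you only care about training and sampling.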
Output folder
Your output folder contains the final results in results.csv, the random DayZ data, and the diagnostic report for each synthesizer.
my_performance_results/
|--- results.csv
|--- DayZ-Data/
|    |--- users.csv
|    |--- transactions.csv
|--- Diagnostic-Reports/
     |--- hsa_diagnostic.pkl
     |--- independent_diagnostic.pkl
     ...
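If you want to check what a run produced, a small standard-library sketch like the one below can inventory an output folder. It assumes the layout shown in the tree (`results.csv`, a `DayZ-Data/` folder with one CSV per table, and a `Diagnostic-Reports/` folder with `.pkl` files); `summarize_output_folder` is a hypothetical helper, not part of SDV.

```python
import tempfile
from pathlib import Path


def summarize_output_folder(output_folder):
    """Report which artifacts exist in a folder laid out like the tree above."""
    folder = Path(output_folder)

    def names(subfolder, pattern, use_stem):
        directory = folder / subfolder
        if not directory.is_dir():  # a missing subfolder simply yields no entries
            return []
        return sorted(p.stem if use_stem else p.name for p in directory.glob(pattern))

    return {
        'has_results': (folder / 'results.csv').is_file(),
        'dayz_tables': names('DayZ-Data', '*.csv', use_stem=True),  # file stems = table names
        'diagnostic_reports': names('Diagnostic-Reports', '*.pkl', use_stem=False),
    }


# Demo on a throwaway folder mirroring part of the tree above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for relative in ['results.csv', 'DayZ-Data/users.csv', 'DayZ-Data/transactions.csv',
                     'Diagnostic-Reports/hsa_diagnostic.pkl']:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    print(summarize_output_folder(root))
    # {'has_results': True, 'dayz_tables': ['transactions', 'users'], 'diagnostic_reports': ['hsa_diagnostic.pkl']}
```

For a real run, pass the same path you gave as output_folder, for example `summarize_output_folder('my_performance_results/')`.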
FAQ