❖ Targeted Sampling
The Targeted Sampling bundle lets you create synthetic data that is tailored to your use case. You can use targeted sampling to:
⭐ Create minimum viable synthetic test sets for software testing and QA. Define the exact values that you need across multiple tables, and let SDV handle the rest.
⭐ De-bias and rebalance your datasets. Adjust the proportions of values to generate in the synthetic data to create a more balanced, fairer representation of the data you'd like to use for ML development.
⭐ Generate hypothetical scenarios. Create edge cases and situations that don't exist in the real data but are theoretically possible to observe.
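For the rebalancing use case, it can help to decide up front how many rows each value should receive. The sketch below is plain Python, not part of SDV: the helper name `rows_per_value` and the hotel classifications other than `RESORT` are hypothetical. It splits a row budget across values according to target weights, using largest-remainder rounding so the counts always add up to the requested total.

```python
from fractions import Fraction
from math import floor


def rows_per_value(target_weights, total_rows):
    """Split total_rows across values in proportion to target_weights.

    Hypothetical helper (not an SDV function). Uses largest-remainder
    rounding so the returned counts sum exactly to total_rows.
    """
    weights = {value: Fraction(w) for value, w in target_weights.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError('target_weights must sum to a positive number')

    # Exact (possibly fractional) share of rows for each value.
    exact = {v: Fraction(total_rows) * w / total_weight for v, w in weights.items()}

    # Start from the rounded-down share, then hand out the leftover rows
    # to the values with the largest fractional remainders.
    counts = {v: floor(share) for v, share in exact.items()}
    leftover = total_rows - sum(counts.values())
    by_remainder = sorted(exact, key=lambda v: exact[v] - counts[v], reverse=True)
    for value in by_remainder[:leftover]:
        counts[value] += 1
    return counts


# Example: aim for a 2:1:1 split of 10 synthetic hotels.
plan = rows_per_value({'RESORT': 2, 'CITY': 1, 'AIRPORT': 1}, total_rows=10)
```

Each resulting count could then serve as the `num_rows` of one condition per value, assuming conditions are built as in the example on this page.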
Included Features
This functionality is in Beta. At this time, select SDV Enterprise users are able to use this feature and provide feedback.
The Targeted Sampling bundle currently supports multi-table conditional sampling for the HSA and Independent Synthesizers.
from sdv.sampling import Condition, MultiTableCondition
# Step 1: Create Single-Table Conditions
resort_hotels = Condition(
    num_rows=10,
    table_name='hotels',
    column_values={'classification': 'RESORT'})
suite_guests_with_rewards = Condition(
    table_name='guests',
    column_values={'room_type': 'SUITE', 'has_rewards': True})
# Step 2: Compose Multi-Table Conditions
suites_in_resorts = MultiTableCondition(
    conditions=[resort_hotels, suite_guests_with_rewards])
# Step 3: Sample Synthetic Data
# `synthesizer` is a fitted multi-table synthesizer (for example, HSA or Independent)
synthetic_data = synthesizer.sample_from_conditions([suites_in_resorts])
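One way to read the composed condition in this example is: guests who booked a suite and have rewards, staying in hotels classified as resorts. The sketch below illustrates that reading with plain Python records rather than SDV objects, and could be adapted to spot-check sampled data. The helper names and the `hotel_id` key linking guests to hotels are assumptions; the page does not show the table schema.

```python
def rows_matching(rows, column_values):
    """Return the rows whose columns equal every requested value."""
    return [
        row for row in rows
        if all(row.get(col) == val for col, val in column_values.items())
    ]


def guests_in_matching_hotels(hotels, guests, hotel_values, guest_values,
                              hotel_key='hotel_id'):
    """Guests meeting guest_values whose parent hotel meets hotel_values.

    `hotel_key` is a hypothetical column linking guests to hotels.
    """
    hotel_ids = {h[hotel_key] for h in rows_matching(hotels, hotel_values)}
    return [
        g for g in rows_matching(guests, guest_values)
        if g[hotel_key] in hotel_ids
    ]


# Tiny hand-written tables standing in for sampled data.
hotels = [
    {'hotel_id': 1, 'classification': 'RESORT'},
    {'hotel_id': 2, 'classification': 'CITY'},
]
guests = [
    {'guest_id': 'a', 'hotel_id': 1, 'room_type': 'SUITE', 'has_rewards': True},
    {'guest_id': 'b', 'hotel_id': 2, 'room_type': 'SUITE', 'has_rewards': True},
    {'guest_id': 'c', 'hotel_id': 1, 'room_type': 'BASIC', 'has_rewards': True},
]

matches = guests_in_matching_hotels(
    hotels, guests,
    hotel_values={'classification': 'RESORT'},
    guest_values={'room_type': 'SUITE', 'has_rewards': True},
)
```

If sampled tables come back as pandas DataFrames, `DataFrame.to_dict('records')` yields the list-of-dicts shape used here.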
Installation
Use your SDV Enterprise credentials to install SDV Enterprise and all bundles that you have access to.
% pip install sdv-installer --upgrade
% sdv-installer install --upgrade
Username: <email>
License Key: ********************************
Installing SDV Enterprise:
sdv-enterprise (version 0.30.0) - Installed!
Installing Bundles:
bundle-cag - Installed!
bundle-xsynthesizers - Installed!
Success! All packages have been installed. You are ready to use SDV Enterprise.
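After the installer finishes, a quick check from Python can confirm what is present in the current environment. This is a minimal sketch using the standard library's `importlib.metadata`. It assumes the names printed by the installer (`sdv-enterprise`, `bundle-cag`, `bundle-xsynthesizers`) are also the installed distribution names, which the output above does not guarantee.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Names copied from the installer output; treating them as distribution
# names is an assumption.
for name in ('sdv-enterprise', 'bundle-cag', 'bundle-xsynthesizers'):
    print(name, installed_version(name) or 'not installed')
```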