Release Notes
This page provides detailed updates about each release, including the latest release.
This release enhances your ability to customize your synthesizer — whether it's through multi-table CAG patterns, single-table constraints, or pre-processing techniques that transform your data.
Improved CAG patterns. Use CarryOverColumns to supply a column that is repeated across many tables with different relationships. The PrimaryToPrimaryKeySubset pattern now works with missing values.
Bug fixes for single table constraints. Both the FixedNullCombinations and the MixedScales constraints can now be overlapped with other constraints.
💡 Experiment with new transformers. Try applying the new LogScaler and LogitScaler on data that exhibits exponential properties. You may find improvement in your synthetic data quality.
For more information, see the docs on customizing your synthesizer.
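To see why a log-style transform can help with exponential data, here is a minimal sketch using only the Python standard library. It is not the LogScaler or LogitScaler implementation; the function names and the `constant`, `min_value`, and `max_value` parameters are hypothetical and chosen for illustration only.

```python
import math


def log_scale(value, constant=0.0):
    """Map a value greater than `constant` onto the log scale."""
    if value <= constant:
        raise ValueError("value must be greater than the constant")
    return math.log(value - constant)


def inverse_log_scale(scaled, constant=0.0):
    """Undo log_scale to get back to the original scale."""
    return math.exp(scaled) + constant


def logit_scale(value, min_value=0.0, max_value=1.0):
    """Map a value strictly between the bounds onto the logit scale."""
    if not min_value < value < max_value:
        raise ValueError("value must be strictly between the bounds")
    proportion = (value - min_value) / (max_value - min_value)
    return math.log(proportion / (1.0 - proportion))


def inverse_logit_scale(scaled, min_value=0.0, max_value=1.0):
    """Undo logit_scale to get back to the original, bounded scale."""
    proportion = 1.0 / (1.0 + math.exp(-scaled))
    return min_value + proportion * (max_value - min_value)


# Values that grow by a factor of 10 become evenly spaced after scaling.
exponential_values = [1.0, 10.0, 100.0, 1000.0]
scaled_values = [log_scale(v) for v in exponential_values]
steps = [b - a for a, b in zip(scaled_values, scaled_values[1:])]
print(steps)  # each step is the natural log of 10, about 2.3026
```

Evenly spaced values are generally easier for a statistical model to learn than values that span several orders of magnitude, which is the intuition behind trying these transformers.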
This release enhances existing SDV Enterprise features and fixes some bugs.
🌟 Enhanced segmentation in XSynthesis. You now have more control over modeling highly segmented data. When using the SegmentSynthesizer, you can now supply the exact columns to use to compute segments.
Explore your real data visually. Before modeling synthetic data, explore your real data by visualizing 1D and 2D plots. For more information, see our visualization options for single and multi-table data.
Bug fixes for AI Connectors. Thanks to your feedback, we've fixed some bugs in our AI Connectors bundle (in Beta) and clarified instructions for integrating with your database using user-based vs. service accounts.
In this release, we're continuing to add new features to SDV bundles. We also have a new-and-improved privacy metric.
Improved privacy measurements. Use the new DisclosureProtection metric to measure the privacy risk associated with disclosing (aka broadly sharing) your synthetic data. This metric comes with support for all statistical data types, baselines to help you interpret the score, and a performance optimization for large datasets.
Additional updates: Are you looking to upgrade your version of SDV Enterprise? You'll now see updated instructions for installing the basic SDV Enterprise library, as well as any bundles you've purchased. And if your plan includes all bundles, you can use a single command to download them all. For more information, see the Installation Instructions.
Our installation servers will continue to undergo maintenance in the coming months. Please bear with us as we update our systems and provide you new instructions. If you are running into any issues with installation, please reach out to us.
With this release, we are introducing SDV Bundles for the first time. SDV Enterprise users will have the option to purchase one or more bundles that take your synthetic data to the next level — whether it's more powerful modeling features, or adding complex, multi-table business logic. We've got you covered!
SDV Bundles are currently in Limited Availability. At this time, select SDV Enterprise users are able to use bundles and provide feedback. We are continuing to add features to the bundles and expand availability over time. If you'd like access to any of the bundles, please reach out to us.
Additional updates for SDV Enterprise users
Simulate performance of multi-table synthesizers on large datasets using only your metadata. Get detailed breakdowns of how different synthesizers will be able to preprocess, train, and sample synthetic data.
Evaluate the performance of multi-sequence data using SequenceLengthSimilarity and StatisticMSAS metrics.
This release addresses some bugs and adds a usability feature for accessing code scan results.
If you are on an older version of SDV Enterprise, or you'd like more information about the results, please Contact Us.
This release offers an improved metadata experience. Before, you would have to create separate objects based on your data (single table or multi table). Now, we offer a single, streamlined Metadata object for you to use anywhere in SDV.
Metadata is a crucial part of using SDV.
🤔 What is metadata? Metadata is a description of your dataset. This includes the names of your tables and columns, the type of data in each column, and relationships between tables. You can start by auto-detecting metadata based on your data or database.
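As an illustration of what such a description can look like, here is a small sketch of a metadata-style dictionary with two tables and one relationship. The field names are loosely modeled on a JSON description and are assumptions for this example, so please check them against the metadata docs. The `find_relationship_problems` helper is hypothetical and is not part of SDV.

```python
import json

# A hypothetical description: table names, column types, and one relationship.
metadata_description = {
    "tables": {
        "users": {
            "primary_key": "user_id",
            "columns": {
                "user_id": {"sdtype": "id"},
                "signup_date": {"sdtype": "datetime", "datetime_format": "%Y-%m-%d"},
                "plan": {"sdtype": "categorical"},
            },
        },
        "sessions": {
            "primary_key": "session_id",
            "columns": {
                "session_id": {"sdtype": "id"},
                "user_id": {"sdtype": "id"},
                "duration_minutes": {"sdtype": "numerical"},
            },
        },
    },
    "relationships": [
        {
            "parent_table_name": "users",
            "parent_primary_key": "user_id",
            "child_table_name": "sessions",
            "child_foreign_key": "user_id",
        }
    ],
}


def find_relationship_problems(description):
    """Return a list of human-readable problems found in the relationships."""
    problems = []
    tables = description["tables"]
    for relationship in description["relationships"]:
        parent = relationship["parent_table_name"]
        child = relationship["child_table_name"]
        if parent not in tables or child not in tables:
            problems.append(f"unknown table in relationship {parent} -> {child}")
            continue
        if tables[parent].get("primary_key") != relationship["parent_primary_key"]:
            problems.append(f"{parent}: primary key does not match the relationship")
        if relationship["child_foreign_key"] not in tables[child]["columns"]:
            problems.append(f"{child}: foreign key column is not described")
    return problems


print(json.dumps(metadata_description["relationships"], indent=2))
print(find_relationship_problems(metadata_description))  # no problems expected
```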
This release includes feature updates in all parts of your synthetic data workflow.
This patch release fixed some bugs you may have run into when applying constraints to null values, and corrected a warning that appeared when importing any module of the sdv library.
In this release, we're adding new pre-defined constraint logic that you can apply to your synthesizers.
🔀 Fix the combination of null values. Prevent the synthetic data from permuting where the null values should occur. For example, you may require a set of columns to be null all together or not at all. The FixedNullCombinations constraint learns the possibilities from your real data to ensure the synthetic data is valid.
⚖️ Learn different scales for different types of rows. Allow SDV to learn that different scales are possible based on different segments of rows. For example, a segment that represents a patient's height (in inches) should have different results than a segment that represents a patient's blood pressure. The MixedScales constraint learns the possibilities from your real data to ensure the synthetic data is valid.
Additional updates: We're continuing to make performance improvements in synthesizers like DayZ. We've also optimized our SDV Enterprise delivery process to support aarch64/arm64 on Linux.
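The null-combination rule described above can be pictured with a short, standard-library sketch: learn which patterns of missing values occur together in the real rows, then check new rows against them. This is only an illustration of the idea, not the FixedNullCombinations implementation. The helper names and the sample columns are hypothetical.

```python
def null_pattern(row, column_names):
    """Describe a row as a tuple of flags: True where the value is missing."""
    return tuple(row[name] is None for name in column_names)


def learn_null_combinations(rows, column_names):
    """Collect every null pattern that appears in the real rows."""
    return {null_pattern(row, column_names) for row in rows}


def is_valid(row, column_names, allowed_patterns):
    """A row is valid only if its null pattern was seen in the real data."""
    return null_pattern(row, column_names) in allowed_patterns


columns = ["card_number", "card_expiry"]
real_rows = [
    {"card_number": "4111", "card_expiry": "12/27"},  # both present
    {"card_number": None, "card_expiry": None},       # both missing
]
allowed = learn_null_combinations(real_rows, columns)

valid_row = {"card_number": "5500", "card_expiry": "01/30"}
invalid_row = {"card_number": "5500", "card_expiry": None}  # only one missing

print(is_valid(valid_row, columns, allowed))
print(is_valid(invalid_row, columns, allowed))
```

In this example the two columns are either null together or not at all, so a row with only one of them missing is rejected.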
This release continues to add features that enhance your experience with synthetic data creation.
In this patch release, we've fixed some bugs related to constraints and library dependencies. We've also added some utility functions for early-stage, alpha testing.
This release expands the types of multi-table schemas that you're able to model, and continues to improve the quality of generated synthetic data.
🐞 Bug fixes and validation. We've additionally fixed some bugs that resulted in confusing print-outs when SDV Enterprise encountered an error or warning.
In this release, we're continuing to add new features to directly connect your database. Use them to import real data and export synthetic data.
🗂️ Import multi-table data with referential integrity. When importing tables from a database, you can now pass in metadata. As a result, we'll ensure your imported data is fully valid with referential integrity across all your tables.
✅ Select only the database connectors you need. To keep the SDV Enterprise package to a manageable size, we've now made the database connectors optional. You can specify which ones you'd like to download during installation.
These features are still in Beta, so please be sure to try them out and provide feedback!
This month's release allows for a better experience getting started with SDV and creating more realistic data. We've also prioritized bug fixes that affected our SDV Enterprise customers.
↔️ Subset multi-table data while maintaining connections. If your multi-table dataset is too large, you can now use utility functions to sample a smaller subset for use with SDV. This feature ensures that the subsetting will maintain referential integrity -- aka valid connections between your tables.
Additional updates
Importing and exporting CSV data is also now more streamlined with the new CSVHandler
You can now use SDV with data where your column names are integers (0, 1, 2, ...)
We've fixed issues in the HSASynthesizer that caused it to print out many warnings during sampling, or a crash if you had too few rows.
In this patch release, we've added some new features in Beta for your testing. Try them out and let us know what you think!
🤝 [Beta!] Database connectors. SDV Enterprise users can now directly make a connection to their database to import real data into the SDV, and later export synthetic data back out to a new database. We're currently testing Google's BigQuery database, with more options coming soon!
This release continues to provide you with the latest and greatest of synthetic data -- from Python compatibility, to data creation, to export.
Our policy is to support the active Python versions to the best of our ability. Active versions are pre-determined by the Python organization. For more information, see the official status of Python versions.
Additional updates
We've enhanced many of our synthesizers and customizations to proactively alert you of potential issues. For example, you'll be alerted when loading a pre-trained synthesizer into a different SDV version, adding unknown columns in a relationship, or specifying a datetime column without a format.
We've fixed a number of bugs in the HMASynthesizer that led to missing values, and a diagnostic score under 1.0
We've deprecated the SingleTablePreset (FAST ML) in favor of the GaussianCopula synthesizer. The GaussianCopula synthesizer is just as fast while offering higher statistical quality.
This release covers a range of features that allow you to more easily get started with SDV and customize it for your needs.
Additional updates
We've improved our error messaging around invalid foreign keys, column relationships, and constraints to better help you understand the issues and debug them.
We've consolidated the way you can retrieve parameters from your synthesizers.
This release includes some highly requested new features, out for your experimentation and beta testing. Please let us know if you have any feedback!
Additional updates
We've improved the metadata auto-detection to carefully parse column names — and more accurately determine if something is PII.
We've fixed some bugs you may encounter when combining constraints with other types of sampling features.
This release includes some quality improvements and bug fixes, and sets up some foundations for future optimizations.
Additional updates
You can now opt to create random, anonymized data from any of the options in the Faker library, even complex concepts such as currency.
The Inequality constraint now works correctly for datetime columns, especially with complex date formats.
The CTGANSynthesizer now works with the FixedCombinations constraint.
In this release, we're continuing to make improvements for metadata auto-detection and synthetic data evaluation.
Additional updates
The DayZSynthesizer can now create missing values within numerical and categorical columns
The DayZSynthesizer can now create discrete, category values from scratch for any type of data (not just strings)
This release provides features that make it easier for you to create highly realistic data and evaluate the synthetic data.
Additional updates
When plotting your data, you can choose from a variety of visualization options: displots, bar graphs, scatterplots, heatmaps and more.
We fixed a bug that caused incorrect datetimes (by +/- 1 day) in rare cases
In this release, we've made it even easier for you to get started. New features allow the SDV to auto-detect important elements from your real data when creating your metadata file.
An accurate metadata file is the key to high quality synthetic data. To see these features in action, check out the Metadata Creation Demo.
Additional Updates
Enterprise users can anonymize their metadata to obfuscate column and table names. This makes it easier to share your metadata if this information is sensitive.
You can now visualize the cardinality of a parent-child relationship. This allows you to see if the synthetic data is correctly capturing the number of children that each parent row has.
This release expands the synthesizer options available to you, and improves your experience when evaluating your synthetic data.
Additional Updates
If you try to evaluate outlier coverage on data without any outliers, you'll now get a friendly error message
This smaller patch release allows additional custom constraint functionality that may impact your projects.
This release improves the quality of your synthetic data and provides you with additional options for modeling your datasets.
Additional Updates
Support for Python 3.11. You can now use the SDV Enterprise with any of the currently active versions of Python (3.8-3.11).
Create and apply custom constraints for ID columns (such as primary keys) as well as PII columns (such as phone numbers).
Improved performance and progress tracking during quality evaluation. This release fixed bugs that led to incorrect scores, crashes and repeated warning messages.
In this release, we're adding features that improve synthetic data quality along with bug fixes and security recommendations.
Additional Updates
Dropping support for Python 3.7. This version has officially reached its end-of-life and the Python organization will no longer update its security. For your safety, we recommend upgrading Python to 3.8 or above.
Bug fixes for datetime columns. We fixed issues that you may have encountered if you stored datetime information as integers.
In this release, we're continuing to add new, enterprise features and prioritize bug fixes that affected you the most.
Additional Updates
For security and performance, we strive to keep up with the latest software releases. We've now upgraded the SDV to use the latest data science libraries such as pandas 2.0 and torch 2.0. For more information, see What's in the package?.
This release marks a major milestone in the overall API usage and synthetic data workflows. New features include:
Additional Updates
We've also addressed a number of smaller issues including: The ability to load in multiple CSVs, modeling columns that are completely blank (null) and generating IDs with large regexes.
This release improves performance and memory consumption in the HSA synthesizer.
This release includes the new, proprietary HSA synthesizer, a multi-table model with improved performance that is available only to SDV Enterprise customers. It also includes a bug fix to improve the performance of sensitive data (PII).
This release adds support for Python 3.10 across all the supported platforms.
In this release, we are continuing to test our SDV Enterprise software on a variety of enterprise IT requirements to ensure that it can be installed correctly.
This is our first full SDV Enterprise release! In this release, our goal is to support easy, offline installation for a variety of enterprise IT requirements. Included features:
Package support for Linux and Windows, with fully offline installation
Synthetic data creation and evaluation features for demo datasets that are available offline
A dedicated SDV Enterprise site (this one!)
Brand new CAGs and CAG enhancements. We've added two new patterns to our CAG bundle — ForeignToPrimaryKeySubset and UniqueBridgeTable — bringing our total to 6 patterns. (More are coming soon!) You are also able to add any CAG pattern to the DayZSynthesizer, allowing you to create 100% valid data from scratch.
Additional AI Connectors. If you've purchased our AI Connectors bundle, you'll be able to connect to your Google Spanner database for importing and exporting your data. This connector now supports both Google Cloud SQL and PostgreSQL backends.
Constraint Augmented Generation (CAG). Take constraints to the next level. CAG is a new system that allows you to input business logic into your complex, multi-table schemas. For example, CAG allows you to add CompositeKeys (and related relationships), and define specific rules about when data is allowed to be connected. For more information, visit the CAG Bundle page.
XSynthesis. Go the eXtra mile with your synthesis. This bundle includes enhanced synthesizers and transformers to improve your synthetic data quality and performance. For example, the XGCSynthesizer allows you to choose from 100+ column shapes, and the SegmentSynthesizer is optimized for highly segmented datasets. For more information, visit the XSynthesis Bundle page.
Check your package security. Starting from this version, you can easily access the IT security code scan results for your SDV Enterprise package. We follow the OWASP Top Ten industry standard, ensuring we identify and address any high severity issues before the release. For more information, see our page on IT Security.
Why is metadata important? All SDV synthesizers use metadata as the source-of-truth, especially if there are any ambiguities. (For example, should SDV treat a value like "94117" as a number, a category, or a postal code?) We strongly encourage you to inspect and update your metadata to be accurate – a little investment in metadata can go a long way in creating high quality synthetic data!
Improved metadata auto-detection. We've now improved the way we detect foreign keys from your data, leading to an improvement of 23 percentage points. We remain committed to improving metadata auto-detection. In the meantime, please continue to verify your metadata and update it to accurately represent your dataset.
Modeling more types of data. The HSASynthesizer and IndependentSynthesizer now support modeling nullable foreign keys. This may designate that a particular row has no parent, or that a parent does not exist. The synthesizers now learn these patterns and faithfully recreate synthetic data with the same properties of null values.
Create richer data from scratch. Using the DayZSynthesizer, you can now create realistic data that spans multiple columns. For example, latitude/longitude pairs representing GPS coordinates, and street address/city/country groups representing addresses. (Use column relationships to specify these columns in your metadata.)
Even more visualization options. You can now use the SDMetrics visualization features to explore just the real data — before any synthetic data creation even happens. This can allow you to identify important patterns in your real data ahead of time.
Prepare sequential data for modeling. Are you exploring a sequential dataset for synthetic data generation? We've now added utility functions to clean and simplify your data.
Enhanced database connectors. We're continuing to improve the import and export functionality from databases. We've fixed some bugs and added the ability to import from SQL views (virtual tables).
Synthesize more types of multi-table schemas. SDV Enterprise now supports multi-table schemas that are not fully connected — for example, you may have a table or two that are separate from other tables. Use a multi-table synthesizer, such as HSASynthesizer, to model it all.
GPS anonymization improvements. We've made updates to the way we synthesize latitude/longitude coordinate pairs. You'll now see more accurate GPS coordinates based on your provided noise (in kilometers). For more information, see the GPSNoiser.
Connect to your Microsoft SQL Server! You can now directly import/export data from your Microsoft SQL Server, which may be hosted on AWS or your own servers. More connectors are coming soon. For all options, see the docs.
[Beta!] Read & write from Excel sheets. If your data is available in a local Excel file, you can now import it directly, and later export your synthetic data back into an Excel spreadsheet using the ExcelHandler.
Random ID generation. SDV Enterprise users now have access to a premium feature to completely randomize ID generation for primary or foreign keys. This makes your synthetic data look even more realistic than before.
Scrambling IDs. By popular demand, we now scramble any structured primary and foreign key IDs instead of generating them sequentially. This will help your data look more realistic. For more information, see the RegexGenerator.
Python 3.12 is here! You can now use SDV Enterprise (and any of the related SDV libraries) with Python 3.12. Upgrading your Python version to 3.12 will get you the latest security fixes and features from Python.
Set the min/max cardinality. If you are creating multi-table data from scratch using the DayZSynthesizer, you can now specify the min/max number of children each parent is allowed to have. Use this to get realistic data for logical relationships such as 1-to-1 or 1-to-many.
Save your synthetic data as CSVs. You can now export multi-table synthetic data to local CSV files using a single command. More import and export integrations are coming soon!
Clean your data to create referential integrity. If your real multi-table data contains missing or unknown references, you can now use SDV's utility functions to clean it up. SDV expects and guarantees referential integrity in the data -- real and synthetic.
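To make the idea of referential integrity concrete, here is a minimal standard-library sketch that keeps only the child rows whose reference points to an existing parent. It is not SDV's utility function. The helper name and the sample tables are hypothetical, and this sketch makes the simplifying assumption that rows with a missing reference are dropped as well.

```python
def drop_unknown_child_rows(parent_rows, child_rows, primary_key, foreign_key):
    """Keep only child rows whose foreign key matches an existing primary key."""
    known_keys = {row[primary_key] for row in parent_rows}
    # A missing (None) reference is not a known key, so it is dropped too.
    return [row for row in child_rows if row[foreign_key] in known_keys]


parents = [{"user_id": 1}, {"user_id": 2}]
children = [
    {"session_id": "a", "user_id": 1},     # valid reference
    {"session_id": "b", "user_id": 3},     # references a parent that does not exist
    {"session_id": "c", "user_id": None},  # missing reference
]

cleaned = drop_unknown_child_rows(parents, children, "user_id", "user_id")
print(cleaned)  # only session "a" remains
```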
Update metadata in bulk. High quality metadata makes for high quality synthetic data ... but what if it's taking too long to update your metadata? Use our new bulk update features to make changes faster, and get your synthetic data sooner.
Even more anonymization options. Control the amount of anonymized PII data you want to create in synthetic data, and whether it should repeat. Supply a cardinality rule to let SDV know whether to fake unique values or repeated anonymized data.
Accurately model GPS coordinates. Our newest update allows you to more accurately synthesize data for latitude/longitude pairs. To get started, add a column relationship to your metadata to identify the pair of columns. SDV synthesizers will handle the rest! For more information, see the GPS transformers.
[In Beta!] Create metadata from a DDL file. Is your original data in a SQL database? We're beta testing a feature to automatically convert your DDL file to SDV metadata. Currently, we're supporting the IBM DB2 database, with more coming soon.
Improved Data Cardinality. With our new preprocessing methods, you'll now see improved data quality for parent/child cardinality in multi-table datasets. For example, if 40% of the real parent rows have 1 child while 60% have 2+ children, then the same will be true in the synthetic data.
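If you'd like to check a cardinality pattern like this yourself, the following standard-library sketch counts how many children each parent has and reports the share of parents at each count. The helper names and the sample IDs are hypothetical. This is a way to inspect data, not a description of how SDV models cardinality.

```python
from collections import Counter


def children_per_parent(parent_ids, child_foreign_keys):
    """Count the children of every parent, including parents with none."""
    counts = Counter(child_foreign_keys)
    return [counts.get(parent_id, 0) for parent_id in parent_ids]


def cardinality_distribution(parent_ids, child_foreign_keys):
    """Return the fraction of parents that have each number of children."""
    per_parent = children_per_parent(parent_ids, child_foreign_keys)
    histogram = Counter(per_parent)
    total = len(per_parent)
    return {count: n / total for count, n in sorted(histogram.items())}


parent_ids = ["A", "B", "C", "D", "E"]
# A and B have 1 child each; C, D, and E have 2 children each.
child_foreign_keys = ["A", "B", "C", "C", "D", "D", "E", "E"]

distribution = cardinality_distribution(parent_ids, child_foreign_keys)
print(distribution)  # 40% of parents have 1 child, 60% have 2
```

Running the same function on the real and the synthetic tables gives two distributions you can compare side by side.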
Specify Column Relationships. Do you have multiple columns that encode the same concept? Now, you can specify these column relationships directly in the metadata. In this release, you can annotate when multiple columns together represent a single physical mailing address. More concepts are coming soon!
Foundational Codebase Changes. Behind-the-scenes, we've been making a few changes to the way our codebase is compiled. There are no changes to the Python API — this is just to help us make optimizations in the future. Please feel free to retry our existing features and let us know if you notice any issues!
[Beta!] Auto-Detect PII Columns. As part of our continued improvements for Metadata Auto-Detection, we've started to detect some basic PII concepts. If your column names contain certain keywords such as email or phone_number, we'll match them up to the correct PII types.
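Keyword matching on column names can be pictured with the short sketch below. It is a simplified illustration, not SDV's detection logic. The `PII_KEYWORDS` table and the `guess_pii_type` helper are hypothetical, and the table only contains the two keywords mentioned above.

```python
# Hypothetical keyword table: keyword found in a column name -> PII type.
PII_KEYWORDS = {
    "email": "email",
    "phone_number": "phone_number",
}


def guess_pii_type(column_name):
    """Return a PII type if the column name contains a known keyword."""
    normalized = column_name.lower()
    for keyword, pii_type in PII_KEYWORDS.items():
        if keyword in normalized:
            return pii_type
    return None  # no keyword matched, so the column needs a manual check


for name in ["customer_email", "Home_Phone_Number", "age"]:
    print(name, "->", guess_pii_type(name))
```

Because a simple match like this can miss columns or flag the wrong ones, it's still worth reviewing the detected types by hand.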
Improved Data Diagnostic. The all-new and improved diagnostic is more targeted at diagnosing issues with your synthetic data. Based on common application requirements, we've incorporated validity checks that are important to you: Referential integrity, min/max values, and more. For more information, see the Diagnostic Report API.
[Beta!] Generate realistic addresses worldwide. You can now identify a group of columns that, together, identify a physical/mailing address. The SDV ensures that the address data is consistent throughout the columns, pointing to realistic locations. For more information, see the address API.
Compute intertable trends. For multi-table datasets, the quality report includes correlations between different tables, providing more insight into the quality of your primary/foreign key connections. For more information, see the Quality Report API.
Detect primary and foreign keys. With the new updates, the SDV looks for primary keys in your tables, as well as potential foreign keys that may connect two tables. This will help you discover the overall structure of your schema.
Detect datetime formats. The SDV can now infer datetime values from your columns and determine the format string.
Discover categorical columns. Do you have discrete categories that are encoded using numbers? The SDV will detect these columns to improve your synthetic data quality.
Calling attention to unknown columns. The metadata detection is not perfect. The SDV will call out unknown columns for you to verify. By default, we'll treat unknown columns as PII, ensuring that you do not accidentally leak any PII information.
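One simple way to picture datetime format detection is to try a list of candidate format strings and keep the first one that parses every value. The sketch below does this with the standard library. The candidate list and the `detect_datetime_format` helper are hypothetical, it assumes the values are strings, and SDV's actual detection may work differently.

```python
from datetime import datetime

# Hypothetical candidates, tried in order. More specific formats can be added.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S"]


def detect_datetime_format(values, candidate_formats=CANDIDATE_FORMATS):
    """Return the first format that parses every value, or None."""
    if not values:
        return None
    for fmt in candidate_formats:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value did not fit, try the next format
        return fmt
    return None  # unknown column: nothing matched, so verify it manually


print(detect_datetime_format(["2024-01-31", "2023-12-01"]))
print(detect_datetime_format(["01/31/2024"]))
print(detect_datetime_format(["not a date"]))
```

A column where no format matches is a good candidate for the manual verification step mentioned above.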
Comprehensive, robust reporting. Use the Quality Report to get insights into how your synthetic data compares to the real data. You'll now see an updated progress bar, clearer error messages and support for a wider variety of schemas.
Additional support for schemas. The IndependentSynthesizer now supports more complex multi-table schemas, such as multiple connections between tables.
You can now apply custom constraints to PII columns (such as addresses, phone numbers, etc.) or even to a combination of PII with other column types.
You can now use the IDGenerator to generate an index-based key, such as a primary key
A new, faster multi-table synthesizer: The IndependentSynthesizer is our fastest synthesizer yet! Use it for modeling an unlimited number of tables in complex configurations. (For a full list of features and tradeoffs, see the SDV Synthesizer Guide.)
Contextual anonymization for phone numbers and emails. Understand the deeper meaning behind phone number and email data to anonymize the PII in a hyper-realistic way. For example, match general geographical regions while obfuscating the precise, sensitive information.
Evaluate outliers in your synthetic data. The previous release allowed you to model rare events. In this release, you can apply metrics to quantify the results and guard against common failure modes. See metrics for OutlierCoverage and SmoothnessSimilarity.
Improve synthetic data cardinality. The HSASynthesizer will now create synthetic data that conforms to the real multi-table patterns. For example, you can synthesize data with an exact 1-1 relationship between entities. Another common case is when a parent entity must have at least 1 child.
Model and recreate rare events. You now have more options to synthesize rare events that closely resemble the real data. Use the UniformEncoder to capture imbalanced categories and the OutlierEncoder to identify outlier data points.
Anonymize PII more realistically. In your real data, some records may have empty or missing PII attributes. Your synthetic data will now include missing values in the correct proportions, leading to more realistic anonymization.
Introducing the Chained Inequality constraint. Use the new Chained Inequality constraint when multiple columns follow an order. For example, purchase_date < start_date < end_date < expiration_date. This constraint is only available to licensed users. (We've also fixed some bugs in the publicly available Inequality and Range constraints.)
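The ordering rule in the example above can be expressed as a small validity check. The sketch below uses only the standard library and shows what it means for a row to satisfy a chained inequality. It is not the constraint's implementation, and the helper name and sample rows are hypothetical.

```python
from datetime import date


def satisfies_chained_inequality(row, ordered_columns):
    """Check that each column's value is strictly less than the next one's."""
    values = [row[name] for name in ordered_columns]
    return all(earlier < later for earlier, later in zip(values, values[1:]))


ordered_columns = ["purchase_date", "start_date", "end_date", "expiration_date"]

valid_row = {
    "purchase_date": date(2024, 1, 1),
    "start_date": date(2024, 1, 5),
    "end_date": date(2024, 6, 1),
    "expiration_date": date(2025, 1, 1),
}
# Same row, but the end date now falls before the start date.
invalid_row = dict(valid_row, end_date=date(2024, 1, 2))

print(satisfies_chained_inequality(valid_row, ordered_columns))
print(satisfies_chained_inequality(invalid_row, ordered_columns))
```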
Generate times from scratch. You can now use the DayZSynthesizer to generate times without any attached dates. (For example, you can generate 12:06 PM without a specific day.)
Bug fixes in the Diagnostic Report. Some of you reported a ValueError when running the Diagnostic report with your data. We've fixed the related bugs in the Synthesis property.
Easy, automatic metadata creation and validation. Use the new Python API to write a metadata JSON description. Automatically detect it from real data and validate it to make sure it's accurate.
Pause, inspect and customize your synthetic data workflow. Fine-tune your synthesizer to optimize data quality. Customize and view the data preprocessing to make your synthetic data project successful.
Generate synthetic data from scratch. Are you still waiting for access to the real data? No problem. Use the Day Z Synthesizer to supply your metadata and create realistic synthetic data on day zero — no real data or machine learning required!