Release Notes

This page provides detailed updates about each release, including the latest release.

Version 0.31.0 (July 15, 2025)

🏹 Conditional sampling for data with DayZSynthesizer. Use the DayZSynthesizer to create data from scratch. Now, for single-table use cases, you can sample synthetic data by providing fixed conditions. Use this feature to enforce specific values and ratios within your data.

🌐 Automatic timezone extraction. SDV Enterprise users will automatically see the highest quality data when working with timezones. Synthesizers will, by default, extract the timezone information from your data, learn its patterns, and faithfully recreate it when sampling synthetic data.

🐞 Bug fix when using constraints on multi-table data. We've now fixed a bug that caused a synthesizer to crash when applying multiple, overlapping constraints to multi-table data. For example, the crash could occur with many different Inequality constraints that overlapped on the same set of columns.

Version 0.30.0 (July 1, 2025)

🔐 Verify differential privacy using SDVerified. Our new SDVerified feature allows you to measure the differential privacy of your synthesizer, and record the fact that you've verified it. Use this for any differential privacy synthesizer (although you can run the evaluation on its own for any other synthesizer too).

🌐 Extract timezones from your data. SDV synthesizers use the UnixTimestampEncoder to convert datetimes into numerical values. Now, you can also extract the timezone so that your model can learn it as a separate feature. This is especially useful for learning and creating synthetic data that has a mix of different timezones.
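
To make the idea concrete, here is a minimal sketch of splitting a datetime into a numerical timestamp plus a separate timezone feature. It uses only the Python standard library and is not the UnixTimestampEncoder itself; the helper names split_timezone and join_timezone are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def split_timezone(value):
    """Split an ISO 8601 string into (unix_seconds, utc_offset_minutes)."""
    parsed = datetime.fromisoformat(value)
    offset_minutes = int(parsed.utcoffset().total_seconds() // 60)
    return parsed.timestamp(), offset_minutes

def join_timezone(unix_seconds, offset_minutes):
    """Rebuild a timezone-aware datetime from the two numeric features."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return datetime.fromtimestamp(unix_seconds, tz=tz)

# Two rows with the same wall-clock time but different timezones
rows = ["2025-07-01T09:30:00+02:00", "2025-07-01T09:30:00-05:00"]
features = [split_timezone(r) for r in rows]
restored = [join_timezone(ts, off).isoformat() for ts, off in features]
```

Keeping the offset as its own feature means a model can learn the mix of timezones separately from the point in time, and both pieces are enough to rebuild the original value.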

✅ Apply constraints when creating data from scratch. You can now apply any single-table constraint — including your own programmable constraint — to the DayZSynthesizer. This allows you to create random data from scratch, while still adhering to your business logic.

Version 0.29.0 (June 17, 2025)

🔑 Support for composite keys in complex schemas. In this release, we've updated the API for the CompositeKeys constraint. Now, you can supply all the composite primary keys and foreign keys that are in your schema in one go. The constraint is now able to handle complex schemas, where composite primary and foreign keys may overlap across multiple columns. Give it a try and let us know what you think!

🌳 Visualize hierarchical structures in your data. If your data has a SelfReferentialHierarchy constraint, you can now visualize the hierarchy in a tree-like dependency structure. This allows you to better explore your real data and verify that your synthetic data is following the constraint.

🐞 Improved data quality for Copulas. We've fixed a rare edge case in GaussianCopulaSynthesizer that used to result in a poor estimation of a column's shape. Starting from this release, SDV will proactively catch the edge case and make a fix for higher data quality.

Version 0.28.0 (June 3, 2025)

🌟 A simpler, streamlined constraint experience for single and multi-table. Starting from this release, all single-table and multi-table constraints use our constraint-augmented generation framework (CAG). This means:

You can create and input single and multi-table constraints all at once to your synthesizer, using the same API
You can now program your own single and multi-table constraints for your synthesizer. The programmability is more flexible than before, allowing you to learn parameters from the real data, and fix any incorrect data.

👢 Bootstrap your data with few-shot learning. If you only have a few rows of training data — or if your data is "short and wide" (more columns than rows) — use the BootstrapSynthesizer to optimize the training experience. This synthesizer bootstraps your training data before learning patterns, and it's compatible with any SDV single-table synthesizer. It is available in the XSynthesizers Bundle.

✅ Verify the differential privacy of a synthesizer. With our new DP verification tool, you can now empirically verify the differential privacy guarantees that your synthesizer provides. This tool is available in the Differential Privacy Bundle and is compatible with any SDV single-table synthesizer.

Version 0.27.0 (May 20, 2025)

🕓 Synthesize datetimes with timezone information. All SDV synthesizers are now compatible with datetime columns that have a consistent timezone. Your synthetic data will include the same timezone as the original.

🔢 Scale up Regexes and IDs. You can now create new Regexes and ID values that grow proportionally with the size of your synthetic dataset. For example, 25 unique ID values per 100 rows of data. For more information, see the cardinality_rule parameter in the RegexGenerator and AnonymizedFaker.
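
As a rough illustration of proportional scaling (not the RegexGenerator or AnonymizedFaker implementation), the sketch below keeps the ratio of 25 unique values per 100 rows for any dataset size. The function scaled_ids and its parameters are hypothetical.

```python
def scaled_ids(num_rows, unique_per_100=25, prefix="ID_"):
    """Return num_rows ID values drawn from a pool that grows with num_rows."""
    # Assumption: the pool size is simply proportional to the row count
    num_unique = max(1, round(num_rows * unique_per_100 / 100))
    pool = [f"{prefix}{i:04d}" for i in range(num_unique)]
    # Cycle through the pool so every unique value is used
    return [pool[i % num_unique] for i in range(num_rows)]

small = scaled_ids(100)   # 25 unique values
large = scaled_ids(400)   # 100 unique values
```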

📖 Read & write CSVs with more options. The CSVHandler now accepts more parameters when reading your real CSV data, and writing your synthetic data back into a CSV. You can specify settings such as the character encoding, escape characters, and more.

Additional updates: We've added more functions to the Metadata API that allow you to programmatically modify your metadata. We've also enabled SDV to be installed and used within a readonly filesystem.

Version 0.26.0 (May 6, 2025)

🎉 Announcing Differential Privacy in SDV. The new differential privacy bundle includes synthesizers that you can use to fit and sample any amount of synthetic data — all with guarantees of differential privacy. It also includes many new transformers for noising and normalizing columns of data in a differentially private way.

🔑 Connect foreign keys that describe the same concept. The ForeignToForeignKey constraint is now available in the CAG bundle. This constraint allows you to connect foreign keys from multiple tables. You may run into this if you have data from multiple domains that is linked together by the same concept.

Additional updates:

We've also fixed some bugs in the metadata auto-detection feature for multi-table data. The metadata now checks that a foreign key is not reused for multiple relationships.
You'll now be able to see the expiry date for your currently-installed SDV Enterprise package. For details, see the IT Security page.

Version 0.25.0 (April 15, 2025)

✨ Use SDV Enterprise with Python 3.13. You can now use all SDV Enterprise software with the latest Python 3.13. We recommend upgrading your Python version for the most up-to-date security and performance enhancements. For more information, see the SDV Enterprise Technical Requirements.

🔐 Measure the overall privacy of a synthetic data table. We've introduced two new privacy metrics based on the distance-to-closest-record (DCR). These metrics — DCRBaselineProtection and DCROverfittingProtection — give you a holistic privacy score for an entire table of data.
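
For intuition, the distance-to-closest-record idea can be sketched in a few lines. This is only the raw distance on numeric data, assuming Euclidean distance; it is not how DCRBaselineProtection or DCROverfittingProtection compute their scores, which also involve baselines and other data types.

```python
import math

def distance_to_closest_record(row, reference_rows):
    """Smallest Euclidean distance from one row to any reference row."""
    return min(math.dist(row, ref) for ref in reference_rows)

real = [(1.0, 1.0), (5.0, 5.0)]
synthetic = [(1.0, 2.0), (4.0, 1.0)]

# One DCR value per synthetic row; very small values suggest the
# synthetic row sits close to a real record
dcr = [distance_to_closest_record(s, real) for s in synthetic]
```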

🏷️ Improved metadata auto-detection. We've enhanced the metadata detection algorithm to better auto-detect columns that represent IDs, whether they are primary keys, foreign keys, or just a regular column. The new algorithm also includes better defaults, so you won't see any unknown or generic PII values showing up in your metadata or synthetic data.

Version 0.24.0 (March 18, 2025)

This release continues to add features for higher quality synthetic data generation and ease-of-use.

🌟 Model hierarchical relationships in a table. Use the SelfReferentialHierarchy CAG pattern when you have a column in a table that references the same table. This represents a hierarchical relationship between the rows.
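
To illustrate what a self-referential hierarchy looks like, the sketch below takes a table where one column points at another row of the same table and renders it as an indented tree. It is a standalone example, not the SelfReferentialHierarchy API; the column names and helper functions are hypothetical.

```python
def build_children(rows, id_column, parent_column):
    """Group row IDs by the ID of the row they reference."""
    children = {}
    for row in rows:
        children.setdefault(row[parent_column], []).append(row[id_column])
    return children

def render_tree(children, root=None, depth=0):
    """Return indented lines, starting from rows that have no parent."""
    lines = []
    for node in children.get(root, []):
        lines.append("  " * depth + str(node))
        lines.extend(render_tree(children, node, depth + 1))
    return lines

employees = [
    {"employee_id": "ceo", "manager_id": None},
    {"employee_id": "vp1", "manager_id": "ceo"},
    {"employee_id": "vp2", "manager_id": "ceo"},
    {"employee_id": "eng", "manager_id": "vp1"},
]
tree = render_tree(build_children(employees, "employee_id", "manager_id"))
print("\n".join(tree))
```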

📦 Program your synthesizers with bulk updates. Update the data preprocessing for many columns at once using our bulk update function. This is compatible with any of the preprocessing transformers in the RDT library.

♻️ Rewrite your metadata file as you update it. Accurate metadata is crucial for high quality synthetic data. As you inspect and update your metadata, you can now rewrite it back to the same file so it's ready-to-use in your synthesizer.

Additional updates:

Starting from this release, the X Synthesis bundle has now been renamed to X Synthesizers. This bundle contains the same great features that allow you to take your synthetic data modeling to the next level. Look out for additional features soon!

Version 0.23.0 (February 16, 2025)

This release enhances your ability to customize your synthesizer — whether it's through multi-table CAG patterns, single-table constraints, or pre-processing techniques that transform your data.

🏆 Improved constraints in CAG. Use CarryOverColumns to supply a column that is repeated across many tables with different relationships. The PrimaryToPrimaryKeySubset constraint now works with missing values.

🐞 Bug fixes for single table constraints. Both the FixedNullCombinations and the MixedScales constraints can now be overlapped with other constraints.

💡 Experiment with new transformers. Try applying the new LogScaler and LogitScaler on data that exhibits exponential properties. You may find improvement in your synthetic data quality.
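
As a sketch of the underlying math only (the actual LogScaler and LogitScaler parameters are not described here, so the function names and arguments below are assumptions), a log transform compresses exponential growth into even steps, and a logit transform stretches a bounded range onto the whole number line. Both are reversible.

```python
import math

def log_scale(value, constant=0.0):
    """Map a value greater than `constant` onto the log scale."""
    return math.log(value - constant)

def inverse_log_scale(scaled, constant=0.0):
    return math.exp(scaled) + constant

def logit_scale(value, low, high):
    """Map a value strictly between low and high onto the logit scale."""
    proportion = (value - low) / (high - low)
    return math.log(proportion / (1.0 - proportion))

def inverse_logit_scale(scaled, low, high):
    proportion = 1.0 / (1.0 + math.exp(-scaled))
    return low + proportion * (high - low)

# Values that grow by a factor of 10 become evenly spaced after scaling
revenue = [10.0, 100.0, 1000.0]
scaled_revenue = [log_scale(v) for v in revenue]
```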

For more information, see the docs on customizing your synthesizer.

Version 0.22.0 (January 21, 2025)

This release enhances existing SDV Enterprise features and fixes some bugs.

🌟 Enhanced segmentation in X Synthesis. You now have more control over modeling highly segmented data. When using the SegmentSynthesizer, you can now supply the exact columns to use to compute segments.

👀 Explore your real data visually. Before modeling synthetic data, explore your real data by visualizing 1D and 2D plots. For more information, see our visualization options for single and multi-table data.

🐞 Bug fixes for AI Connectors. Thanks to your feedback, we've fixed some bugs in our AI Connectors bundle (in Beta) and clarified instructions for integrating to your database with user-based vs. service accounts.

Version 0.21.0 (December 17, 2024)

In this release, we're continuing to add new features to SDV bundles. We also have a new-and-improved privacy metric.

🔐 Improved privacy measurements. Use the new DisclosureProtection metric to measure the privacy risk associated with disclosing (aka broadly sharing) your synthetic data. This metric comes with support for all statistical data types, baselines to help you interpret the score, and a performance optimization for large datasets.

📊 Brand new constraints and CAG enhancements. We've added two new constraints to our CAG bundle — ForeignToPrimaryKeySubset and UniqueBridgeTable — bringing our total to 6 constraints. (More are coming soon!) You are also able to add any CAG constraint to the DayZSynthesizer, allowing you to create 100% valid data from scratch.

🔢 Additional AI Connectors. If you've purchased our AI Connectors bundle, you'll be able to connect to your Google Spanner database for importing and exporting your data. This connector now supports both Google Cloud SQL and PostgreSQL backends.

Additional updates: Are you looking to upgrade your version of SDV Enterprise? You'll now see updated instructions for installing the basic SDV Enterprise library, as well as any bundles you've purchased. And if your plan includes all bundles, you can use a single command to download them all. For more information, see the Installation Instructions.

Our installation servers will continue to undergo maintenance in the coming months. Please bear with us as we update our systems and provide you new instructions. If you are running into any issues with installation, please reach out to us.

Version 0.20.0 (November 19, 2024)

With this release, we are introducing SDV Bundles for the first time. SDV Enterprise users will have the option to purchase one or more bundles that take your synthetic data to the next level — whether it's more powerful modeling features, or adding complex, multi-table business logic. We've got you covered!

🌟 Constraint Augmented Generation (CAG). Take constraints to the next level. CAG is a new system that allows you to input business logic into your complex, multi-table schemas. For example, CAG allows you to add CompositeKeys (and related relationships), and define specific rules about when data is allowed to be connected. For more information, visit the CAG Bundle page.

🌟 XSynthesis. Go the eXtra mile with your synthesis. This bundle includes enhanced synthesizers and transformers to improve your synthetic data quality and performance. For example, the XGCSynthesizer allows you to choose from 100+ column shapes, and the SegmentSynthesizer is optimized for highly segmented datasets. For more information, visit the XSynthesis Bundle page.

SDV Bundles are currently in Limited Availability. At this time, select SDV Enterprise users are able to use bundles and provide feedback. We are continuing to add features to the bundles and expand availability over time. If you'd like access to any of the bundles, please reach out to us.

Additional updates for SDV Enterprise users

Simulate performance of multi-table synthesizers on large datasets using only your metadata. Get detailed breakdowns over how different synthesizers will be able to preprocess, train, and sample synthetic data.
Evaluate the performance of multi-sequence data using SequenceLengthSimilarity and StatisticMSAS metrics.

Version 0.19.0 (October 15, 2024)

This release addresses some bugs and adds a usability feature for accessing code scan results.

✅ Check your package security. Starting from this version, you can easily access the IT security code scan results for your SDV Enterprise package. We follow the OWASP Top Ten industry standard, ensuring we identify and address any high severity issues before the release. For more information, see our page on IT Security.

If you are on an older version of SDV Enterprise, or you'd like more information about the results, please Contact Us.

Version 0.18.0 (October 3, 2024)

This release offers an improved metadata experience. Before, you would have to create separate objects based on your data (single table or multi table). Now, we offer a single, streamlined Metadata object for you to use anywhere in SDV.

Metadata is a crucial part of using SDV.

🤔 What is metadata? Metadata is a description of your dataset. This includes the names of your tables and columns, the type of data in each column, and relationships between tables. You can start by auto-detecting metadata based on your data or database.

⭐ Why is metadata important? All SDV synthesizers use metadata as the source-of-truth, especially if there are any ambiguities. (For example, should SDV treat a value like "94117" as a number, a category, or a postal code?) We strongly encourage you to inspect and update your metadata to be accurate – a little investment in metadata can go a long way in creating high quality synthetic data!

Version 0.17.0 (September 17, 2024)

This release includes feature updates in all parts of your synthetic data workflow.

🎯 Improved metadata auto-detection. We've now improved the way we detect foreign keys from your data, leading to an improvement of 23 percentage points. We remain committed to improving metadata auto-detection. In the meantime, please continue to verify your metadata and update it to accurately represent your dataset.

🔑 Modeling more types of data. The HSASynthesizer and IndependentSynthesizer now support modeling nullable foreign keys. This may designate that a particular row has no parent, or that a parent does not exist. The synthesizers now learn these patterns and faithfully recreate synthetic data with the same properties of null values.

🌎 Create richer data from scratch. Using the DayZSynthesizer, you can now create realistic data that spans multiple columns. For example, latitude/longitude pairs representing GPS coordinates, and street address/city/country groups representing addresses. (Use column relationships to specify these columns in your metadata.)

Version 0.16.1 (September 3, 2024)

This patch release fixed some bugs you may have run into when applying constraints to null values, and corrected a warning that appeared when importing any module of the sdv library.

Version 0.16.0 (August 16, 2024)

In this release, we're adding new pre-defined constraint logic that you can apply to your synthesizers.

🔀 Fix the combination of null values. Prevent the synthetic data from permuting where the null values should occur. For example, you may require a set of columns to be null all together or not at all. The FixedNullCombinations constraint learns the possibilities from your real data to ensure the synthetic data is valid.
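
The logic can be pictured as follows: record which null/non-null patterns occur in the real data, then accept only synthetic rows that match one of them. This is a simplified sketch with hypothetical helper names and example columns, not the FixedNullCombinations implementation.

```python
def null_pattern(row, columns):
    """Describe which of the given columns are null in this row."""
    return tuple(row[c] is None for c in columns)

def learn_null_patterns(rows, columns):
    """Collect every null pattern observed in the real data."""
    return {null_pattern(r, columns) for r in rows}

columns = ["card_type", "card_number"]
real_rows = [
    {"card_type": "visa", "card_number": "4111"},
    {"card_type": None, "card_number": None},
]
allowed = learn_null_patterns(real_rows, columns)

def is_valid(row):
    return null_pattern(row, columns) in allowed

both_present = {"card_type": "amex", "card_number": "3782"}
only_one_null = {"card_type": None, "card_number": "3782"}
```

Here the real data only ever has both columns filled in or both null, so a row with just one of them null is rejected.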

⚖️ Learn different scales for different types of rows. Allow SDV to learn that different scales are possible based on different segments of rows. For example, a segment that represents a patient's height (in inches) should have different results than a segment that represents a patient's blood pressure. The MixedScales constraint learns the possibilities from your real data to ensure the synthetic data is valid.

Additional updates: We're continuing to make performance improvements in synthesizers like DayZ. We've also optimized our SDV Enterprise delivery process to support aarch64/arm64 on Linux.

Version 0.15.0 (July 16, 2024)

This release continues to add features that enhance your experience with synthetic data creation.

📊 Even more visualization options. You can now use the SDMetrics visualization features to explore just the real data — before any synthetic data creation even happens. This can allow you to identify important patterns in your real data ahead of time.

🧹 Prepare sequential data for modeling. Are you exploring a sequential dataset for synthetic data generation? We've now added utility functions to clean and simplify your data.

🔌 Enhanced database connectors. We're continuing to improve the import and export functionality from databases. We've fixed some bugs and added the ability to import from SQL views (virtual tables).

Version 0.14.1 (July 11, 2024)

In this patch release, we've fixed some bugs related to constraints and library dependencies. We've also added some utility functions for early-stage, alpha testing.

Version 0.14.0 (June 18, 2024)

This release expands the types of multi-table schemas that you're able to model, and continues to improve the quality of generated synthetic data.

🎉 Synthesize more types of multi-table schemas. SDV Enterprise now supports multi-table schemas that are not fully connected — for example, you may have a table or two that are separate from other tables. Use a multi-table synthesizer, such as HSASynthesizer, to model it all.

🌎 GPS anonymization improvements. We've made updates to the way we synthesize latitude/longitude coordinate pairs. You'll now see more accurate GPS coordinates based on your provided noise (in kilometers). For more information, see the GPSNoiser.

🐞 Bug fixes and validation. We've additionally fixed some bugs that resulted in confusing print-outs when SDV Enterprise encountered an error or warning.

Version 0.13.1 (June 5, 2024)

In this release, we're continuing to add new features to directly connect your database. Use them to import real data and export synthetic data.

🌟 Connect to your Microsoft SQL Server! You can now directly import/export data from your Microsoft SQL Server, which may be hosted on AWS or your own servers. More connectors are coming soon. For all options, see the docs.

🗂️ Import multi-table data with referential integrity. When importing tables from a database, you can now pass in metadata. As a result, we'll ensure your imported data is fully valid with referential integrity across all your tables.

✅ Select only the database connectors you need. To keep the SDV Enterprise package to a manageable size, we've now made the database connectors optional. You can specify which ones you'd like to download during installation.

These features are still in Beta, so please be sure to try them out and provide feedback!

Version 0.13.0 (May 21, 2024)

This month's release allows for a better experience getting started with SDV and creating more realistic data. We've also prioritized bug fixes that affected our SDV Enterprise customers.

↔️ Subset multi-table data while maintaining connections. If your multi-table dataset is too large, you can now use utility functions to sample a smaller subset for use with SDV. This feature ensures that the subsetting will maintain referential integrity -- aka valid connections between your tables.

📂 [Beta!] Read & write from Excel sheets. If your data is available in a local Excel file, you can now import it directly, and later export your synthetic data back into an Excel spreadsheet using the ExcelHandler.

🔀 Random ID generation. SDV Enterprise users now have access to a premium feature to completely randomize ID generation for primary or foreign keys. This makes your synthetic data look even more realistic than before.

Additional updates

Importing and exporting CSV data is also now more streamlined with the new CSVHandler
You can now use SDV with data where your column names are integers (0, 1, 2, ...)
We've fixed issues in the HSASynthesizer that caused it to print out many warnings during sampling, or a crash if you had too few rows.

Version 0.12.1 (Apr 19, 2024)

In this patch release, we've added some new features in Beta for your testing. Try them out and let us know what you think!

🤝 [Beta!] Database connectors. SDV Enterprise users can now directly make a connection to their database to import real data into the SDV, and later export synthetic data back out to a new database. We're currently testing Google's BigQuery database, with more options coming soon!

🔃 Scrambling IDs. By popular demand, we now scramble any structured primary and foreign key IDs instead of generating them sequentially. This will help your data look more realistic. For more information, see the RegexGenerator.
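
A simple way to picture the difference between sequential and scrambled IDs is shown below. This sketch just shuffles a structured sequence with a fixed seed; it is an assumption about the general idea, not how the RegexGenerator produces values, and scrambled_ids is a hypothetical helper.

```python
import random

def scrambled_ids(count, prefix="ID_", seed=0):
    """Generate structured IDs, then return them in a shuffled order."""
    ids = [f"{prefix}{i:03d}" for i in range(count)]
    random.Random(seed).shuffle(ids)  # same values, non-sequential order
    return ids

ids = scrambled_ids(5)
```

The values remain unique and keep their structure, so they still work as keys.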

Version 0.12.0 (Apr 16, 2024)

This release continues to provide you with the latest and greatest of synthetic data -- from Python compatibility, to data creation, to export.

🌟 Python 3.12 is here! You can now use SDV Enterprise (and any of the related SDV libraries) with Python 3.12. Upgrading your Python version to 3.12 will get you the latest security fixes and features from Python.

Our policy is to support the active Python versions to the best of our ability. Active versions are pre-determined by the Python organization. For more information, see the official status of Python versions.

📊 Set the min/max cardinality. If you are creating multi-table data from scratch using the DayZSynthesizer, you can now specify the min/max number of children each parent is allowed to have. Use this to get realistic data for logical relationships such as 1-to-1 or 1-to-many.
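
To show what min/max cardinality means in practice, the sketch below gives every parent a random number of children within fixed bounds. The function and parameter names are hypothetical and do not reflect the DayZSynthesizer API.

```python
import random
from collections import Counter

def assign_children(parent_ids, min_children, max_children, seed=0):
    """Return one foreign key per child row, respecting the bounds."""
    rng = random.Random(seed)
    foreign_keys = []
    for parent_id in parent_ids:
        num_children = rng.randint(min_children, max_children)
        foreign_keys.extend([parent_id] * num_children)
    return foreign_keys

parents = ["A", "B", "C"]
one_to_one = assign_children(parents, 1, 1)    # exactly 1 child each
one_to_many = assign_children(parents, 1, 4)   # between 1 and 4 each
counts = Counter(one_to_many)
```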

💾 Save your synthetic data as CSVs. You can now export multi-table synthetic data to local CSV files using a single command. More import and export integrations are coming soon!

Additional updates

We've enhanced many of our synthesizers and customizations to proactively alert you of potential issues. For example, when loading a pre-trained synthesizer into a different SDV version, adding unknown columns in a relationship, or specifying a datetime column without a format.
We've fixed a number of bugs in the HMASynthesizer that led to missing values, and a diagnostic score under 1.0
We've deprecated the SingleTablePreset (FAST ML) in favor of the GaussianCopula. The GaussianCopula synthesizer is just as fast while offering higher statistical quality.

Version 0.11.0 (March 26, 2024)

This release covers a range of features that allow you to more easily get started with SDV and customize it for your needs.

🧹 Clean your data to create referential integrity. If your real multi-table data contains missing or unknown references, you can now use SDV's utility functions to clean it up. SDV expects and guarantees referential integrity in the data -- real and synthetic.
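
Referential integrity means every foreign key value in a child table points to an existing primary key in the parent table. The sketch below shows one possible cleanup, dropping child rows with missing or unknown references. It is a standalone illustration with a hypothetical helper name and example tables, not SDV's utility function.

```python
def keep_valid_children(parent_rows, child_rows, primary_key, foreign_key):
    """Keep only child rows whose foreign key matches a known parent."""
    known_parents = {p[primary_key] for p in parent_rows}
    return [c for c in child_rows if c[foreign_key] in known_parents]

users = [{"user_id": 1}, {"user_id": 2}]
orders = [
    {"order_id": "a", "user_id": 1},
    {"order_id": "b", "user_id": 3},     # unknown reference
    {"order_id": "c", "user_id": None},  # missing reference
]
clean_orders = keep_valid_children(users, orders, "user_id", "user_id")
```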

📦 Update metadata in bulk. High quality metadata makes for high quality synthetic data ... but what if it's taking too long to update your metadata? Use our new bulk update features to make changes faster, and get your synthetic data sooner.

🎭 Even more anonymization options. Control the amount of anonymized PII data you want to create in synthetic data, and whether it should repeat. Supply a cardinality rule to let SDV know whether to fake unique values or repeated anonymized data.

Additional updates

We've improved our error messaging around invalid foreign keys, column relationships, and constraints to better help you understand the issues and debug them.
We've consolidated the way you can retrieve parameters from your synthesizers.

Version 0.10.0 (February 20, 2024)

This release includes some highly requested new features, out for your experimentation and beta testing. Please let us know if you have any feedback!

🌎 Accurately model GPS coordinates. Our newest update allows you to more accurately synthesize data for latitude/longitude pairs. To get started, add a column relationship to your metadata to identify the pair of columns. SDV synthesizers will handle the rest! For more information, see the GPS transformers.

🔎 [In Beta!] Create metadata from a DDL file. Is your original data in a SQL database? We're beta testing a feature to automatically convert your DDL file to SDV metadata. Currently, we're supporting the IBM DB2 database, with more coming soon.

Additional updates

We've improved the metadata auto-detection to carefully parse column names — and more accurately determine if something is PII.
We've fixed some bugs you may encounter when combining constraints with other types of sampling features.

Version 0.9.0 (January 16, 2024)

This release includes some quality improvements and bug fixes, and sets up some foundations for future optimizations.

⚡ Improved Data Cardinality. With our new preprocessing methods, you'll now see improved data quality for parent/child cardinality in multi-table datasets. For example, if 40% of the real parent rows have 1 child while 60% have 2+ children, then the same will be true in the synthetic data.
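
If you'd like to check this kind of cardinality yourself, one way is to count the children per parent and compare the proportions between real and synthetic data. The sketch below uses made-up example values and hypothetical helper names.

```python
from collections import Counter

def children_per_parent(parent_ids, child_foreign_keys):
    """Count how many child rows reference each parent (0 if none)."""
    counts = Counter(child_foreign_keys)
    return {pid: counts.get(pid, 0) for pid in parent_ids}

def share_with_one_child(cardinality):
    """Fraction of parents that have exactly 1 child."""
    single = sum(1 for n in cardinality.values() if n == 1)
    return single / len(cardinality)

parents = ["A", "B", "C", "D", "E"]
children = ["A", "B", "C", "C", "D", "D", "D", "E", "E"]

cardinality = children_per_parent(parents, children)
share = share_with_one_child(cardinality)  # 2 of 5 parents have 1 child
```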

💕 Specify Column Relationships. Do you have multiple columns that encode the same concept? Now, you can specify these column relationships directly in the metadata. In this release, you can annotate when multiple columns together represent a single physical mailing address. More concepts are coming soon!

🧱 Foundational Codebase Changes. Behind-the-scenes, we've been making a few changes to the way our codebase is compiled. There are no changes to the Python API — this is just to help us make optimizations in the future. Please feel free to retry our existing features and let us know if you notice any issues!

Additional updates

You can now opt to create random, anonymized data from any of the options in the Faker library. Even complex concepts such as currency.
The Inequality constraint now works correctly for datetime columns, especially with complex date formats.
The CTGANSynthesizer now works with the FixedCombinations constraint.

Version 0.8.0 (December 19, 2023)

In this release, we're continuing to make improvements for metadata auto-detection and synthetic data evaluation.

🎭 [Beta!] Auto-Detect PII Columns. As part of our continued improvements for Metadata Auto-Detection, we've started to detect some basic, PII concepts. If your column names contain certain keywords such as email or phone_number, we'll match them up to the correct PII types.

📐 Improved Data Diagnostic. The all-new and improved diagnostic is more targeted at diagnosing issues with your synthetic data. Based on common application requirements, we've incorporated validity checks that are important to you: Referential integrity, min/max values, and more. For more information, see the Diagnostic Report API.

Additional updates

The DayZSynthesizer can now create missing values within numerical and categorical columns
The DayZSynthesizer can now create discrete, category values from scratch for any type of data (not just strings)

Version 0.7.0 (November 21, 2023)

This release provides features that make it easier for you to create highly realistic data and evaluate the synthetic data.

🌎 [Beta!] Generate realistic addresses worldwide. You can now identify a group of columns that, together, identify a physical/mailing address. The SDV ensures that the address data is consistent throughout the columns, pointing to realistic locations. For more information, see the address API.

📈 Compute intertable trends. For multi-table datasets, the quality report includes correlations between different tables, providing more insight into the quality of your primary/foreign key connections. For more information, see the Quality Report API.

Additional updates

When plotting your data, you can choose from a variety of visualization options: displots, bar graphs, scatterplots, heatmaps and more.
We fixed a bug that caused incorrect datetimes (by +/- 1 day) in rare cases

Version 0.6.0 (October 17, 2023)

In this release, we've made it even easier for you to get started. New features allow the SDV to auto-detect important elements from your real data when creating your metadata file.

🔑 Detect primary and foreign keys. With the new updates, the SDV looks for primary keys in your tables, as well as potential foreign keys that may connect two tables. This will help you discover the overall structure of your schema.
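
The kind of reasoning involved can be sketched with two simple heuristics: a primary key candidate has unique, non-null values, and a foreign key candidate only contains values found in a parent key. These heuristics are assumptions for illustration, not the SDV detection algorithm.

```python
def primary_key_candidates(rows):
    """Columns whose values are all present and all distinct."""
    columns = rows[0].keys()
    return [
        c for c in columns
        if all(r[c] is not None for r in rows)
        and len({r[c] for r in rows}) == len(rows)
    ]

def could_be_foreign_key(child_values, parent_key_values):
    """True if every non-null child value appears in the parent key."""
    present = {v for v in child_values if v is not None}
    return present <= set(parent_key_values)

users = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "US"},
    {"user_id": 3, "country": "FR"},
]
candidates = primary_key_candidates(users)
order_user_ids = [1, 1, 3, None]
linked = could_be_foreign_key(order_user_ids, [u["user_id"] for u in users])
```

Heuristics like these can produce false positives, which is one reason to verify the detected metadata.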

📆 Detect datetime formats. The SDV can now infer datetime values from your columns and determine the format string.
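
One simple way to infer a format string, shown here as a sketch rather than the SDV's actual approach, is to try a short list of candidate formats and keep the first one that parses every value. The candidate list and function name are assumptions.

```python
from datetime import datetime

# Assumption: a small, fixed list of formats to try, in order
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M:%S"]

def infer_datetime_format(values):
    """Return the first candidate format that parses all values, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value did not match this format
        return fmt
    return None

iso_format = infer_datetime_format(["2023-10-17", "2023-11-21"])
us_format = infer_datetime_format(["10/17/2023", "11/21/2023"])
no_format = infer_datetime_format(["hello"])
```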

📊 Discover categorical columns. Do you have discrete categories that are encoded using numbers? The SDV will detect these columns to improve your synthetic data quality.

❓ Calling attention to unknown columns. The metadata detection is not perfect. The SDV will call out unknown columns for you to verify. By default, we'll treat unknown columns as PII, ensuring that you do not accidentally leak any PII.

An accurate metadata file is the key to high quality synthetic data. To see these features in action, check out the Metadata Creation Demo.

Additional Updates

Enterprise users can anonymize their metadata to obfuscate column and table names. This makes it easier to share your metadata if this information is sensitive.
You can now visualize the cardinality of a parent-child relationship. This allows you to see if the synthetic data is correctly capturing the number of children that each parent row has.

Version 0.5.0 (September 19, 2023)

This release expands the synthesizer options available to you, and improves your experience when evaluating your synthetic data.

💯 Comprehensive, robust reporting. Use the Quality Report to get insights into how your synthetic data compares to the real data. You'll now see an updated progress bar, clearer error messages and support for a wider variety of schemas.

⭐ Additional support for schemas. The IndependentSynthesizer now supports more complex multi-table schemas, such as multiple connections between tables.

Additional Updates

If you try to evaluate outlier coverage on data without any outliers, you'll now get a friendly error message

Version 0.4.1 (August 29, 2023)

This smaller patch release allows additional custom constraint functionality that may impact your projects.

🎭 You can now apply custom constraints to PII columns (such as addresses, phone numbers, etc.) or even to a combination of PII with other column types.

🔑 You can now use the IDGenerator to generate an index-based key, such as a primary key

Version 0.4.0 (August 15, 2023)

This release improves the quality of your synthetic data and provides you with additional options for modeling your datasets.

⭐ A new, faster multi-table synthesizer: The IndependentSynthesizer is our fastest synthesizer yet! Use it for modeling an unlimited number of tables in complex configurations. (For a full list of features and tradeoffs, see the SDV Synthesizer Guide.)

🌎 Contextual anonymization for phone numbers and emails. Understand the deeper meaning behind phone number and email data to anonymize the PII in a hyper-realistic way. For example, match general geographical regions while obfuscating the precise, sensitive information.

📊 Evaluate outliers in your synthetic data. The previous release allowed you to model rare events. In this release, you can apply metrics to quantify the results and guard against common failure modes. See metrics for OutlierCoverage and SmoothnessSimilarity.

Additional Updates

Support for Python 3.11. You can now use SDV Enterprise with any of the currently active versions of Python (3.8-3.11).
Create and apply custom constraints for ID columns (such as primary keys) as well as PII columns (such as phone numbers).
Improved performance and progress tracking during quality evaluation. This release fixed bugs that led to incorrect scores, crashes and repeated warning messages.

Version 0.3.0 (July 18, 2023)

In this release, we're adding features that improve synthetic data quality along with bug fixes and security recommendations.

👉 Improve synthetic data cardinality. The HSASynthesizer will now create synthetic data that conforms to the real multi-table patterns. For example, you can synthesize data with an exact 1-1 relationship between entities. Another common case is when a parent entity must have at least 1 child.
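The two cardinality patterns mentioned here (an exact 1-1 relationship, and at least 1 child per parent) can be expressed as bounds on the number of children per parent. The sketch below is a hypothetical checker in plain Python, not part of the HSASynthesizer; `find_cardinality_violations` and its parameters are illustrative names.

```python
from collections import Counter

def find_cardinality_violations(parent_ids, child_foreign_keys,
                                min_children=0, max_children=None):
    """Return the parent IDs whose child count is outside the allowed bounds.

    max_children=None means there is no upper bound.
    """
    counts = Counter(child_foreign_keys)
    violations = []
    for pid in parent_ids:
        n_children = counts.get(pid, 0)
        too_few = n_children < min_children
        too_many = max_children is not None and n_children > max_children
        if too_few or too_many:
            violations.append(pid)
    return violations

# Exact 1-1 relationship: every parent has exactly one child
one_to_one = find_cardinality_violations(
    ["a", "b"], ["a", "b"], min_children=1, max_children=1)

# At least 1 child: parents "b" and "c" have none, so they are reported
at_least_one = find_cardinality_violations(
    ["a", "b", "c"], ["a", "a"], min_children=1)
```

An empty result means the data conforms to the pattern; running the same check on real and synthetic tables shows whether the pattern was preserved.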

🌠 Model and recreate rare events. You now have more options to synthesize rare events that closely resemble the real data. Use the UniformEncoder to capture imbalanced categories and the OutlierEncoder to identify outlier data points.

🎭 Anonymize PII more realistically. In your real data, some records may have empty or missing PII attributes. Your synthetic data will now include missing values in the correct proportions, leading to more realistic anonymization.

Additional Updates

Dropping support for Python 3.7. This version has officially reached its end of life and no longer receives security updates. For your safety, we recommend upgrading to Python 3.8 or above.
Bug fixes for datetime columns. We fixed issues that you may have encountered if you stored datetime information as integers.
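Integer-encoded datetimes are easy to misread because the same column could hold Unix timestamps or calendar-style numbers. The sketch below illustrates two common encodings using only the Python standard library. Both interpretations are assumptions for illustration; the release note does not say which integer formats were affected, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

def from_epoch(value, unit="s"):
    """Read an integer as a Unix timestamp (seconds or milliseconds) in UTC."""
    divisor = {"s": 1, "ms": 1000}[unit]
    return datetime.fromtimestamp(value / divisor, tz=timezone.utc)

def from_yyyymmdd(value):
    """Read an integer such as 20230718 as a calendar date (no timezone)."""
    return datetime.strptime(str(value), "%Y%m%d")

# Two different integers that encode the same day under different conventions
as_epoch = from_epoch(1689638400)
as_calendar = from_yyyymmdd(20230718)
```

Knowing which convention your data uses, and recording it in your metadata, helps avoid the two being confused.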

Version 0.2.0 (June 20, 2023)

In this release, we're continuing to add new, enterprise features and prioritize bug fixes that affected you the most.

⛓️ Introducing the Chained Inequality constraint. Use the new Chained Inequality constraint when multiple columns follow an order. For example, purchase_date < start_date < end_date < expiration_date. This constraint is only available to licensed users. (We've also fixed some bugs in the publicly available Inequality and Range constraints.)
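As an illustration of the rule that a chained inequality expresses, the following sketch checks a single row in plain Python. It is not the licensed constraint's API. The helper name is hypothetical, and the strict `<` comparison is taken from the example above; missing values are not handled.

```python
from datetime import date

def satisfies_chain(row, columns):
    """Return True if the row's values are strictly increasing across the columns."""
    values = [row[column] for column in columns]
    return all(earlier < later for earlier, later in zip(values, values[1:]))

CHAIN = ["purchase_date", "start_date", "end_date", "expiration_date"]

valid_row = {
    "purchase_date": date(2023, 1, 1),
    "start_date": date(2023, 1, 15),
    "end_date": date(2023, 6, 30),
    "expiration_date": date(2023, 12, 31),
}

# Same row, but the end date now falls before the start date
invalid_row = dict(valid_row, end_date=date(2023, 1, 10))

valid_ok = satisfies_chain(valid_row, CHAIN)
invalid_ok = satisfies_chain(invalid_row, CHAIN)
```

Checking adjacent pairs is enough because the ordering is transitive: if every neighbor is in order, the whole chain is.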

🕑 Generate times from scratch. You can now use the DayZSynthesizer to generate times without any attached dates. (For example, you can generate 12:06 PM without a specific day.)
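A time with no attached date can be represented in Python with `datetime.time`. The sketch below draws random times of day at minute resolution. It only illustrates the data type and is not how the DayZSynthesizer generates values; the uniform sampling and the `random_time` helper are assumptions.

```python
import random
from datetime import time

def random_time(rng):
    """Draw a uniformly random time of day at minute resolution."""
    minute_of_day = rng.randrange(24 * 60)
    return time(hour=minute_of_day // 60, minute=minute_of_day % 60)

rng = random.Random(0)  # seeded so repeated runs give the same samples
samples = [random_time(rng) for _ in range(5)]

# Format as 12-hour clock strings, in the style of "12:06 PM"
formatted = [t.strftime("%I:%M %p") for t in samples]
```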

🐞 Bug fixes in the Diagnostic Report. Some of you reported a ValueError when running the Diagnostic Report with your data. We've fixed the related bugs in the Synthesis property.

Additional Updates

For security and performance, we strive to keep up with the latest software releases. We've now upgraded the SDV to use the latest data science libraries, such as pandas 2.0 and torch 2.0. For more information, see What's in the package?.

Version 0.1.0 (May 16, 2023)

This release marks a major milestone for the overall API usage and synthetic data workflows. New features include:

✅ Easy, automatic metadata creation and validation. Use the new Python API to write a metadata JSON description. Automatically detect it from real data and validate it to make sure it's accurate.

🔍 Pause, inspect and customize your synthetic data workflow. Fine-tune your synthesizer to optimize data quality. Customize and view the data preprocessing to make your synthetic data project successful.

✨ Generate synthetic data from scratch. Are you still waiting for access to the real data? No problem. Use the Day Z Synthesizer to supply your metadata and create realistic synthetic data on day zero — no real data or machine learning required!

Additional Updates

We've also addressed a number of smaller issues, including the ability to load multiple CSVs, modeling columns that are completely blank (null), and generating IDs with large regexes.

Version 0.0.5 (Apr 18, 2023)

This release improves performance and memory consumption in the HSA synthesizer.

Version 0.0.4 (March 21, 2023)

This release includes the new, proprietary HSA synthesizer, a multi-table model with improved performance that is available only to SDV Enterprise customers. It also includes a bug fix to improve performance when handling sensitive data (PII).

Version 0.0.3 (Feb 21, 2023)

This release adds support for Python 3.10 across all the supported platforms.

Version 0.0.2 (Jan 17, 2023)

In this release, we are continuing to test our SDV Enterprise software on a variety of enterprise IT requirements to ensure that it can be installed correctly.

Version 0.0.1 (Dec 20, 2022)

This is our first full SDV Enterprise release! In this release, our goal is to support easy, offline installation for a variety of enterprise IT requirements. Included features:

Package support for Linux and Windows, with fully offline installation
Synthetic data creation and evaluation features for demo datasets that are available offline
A dedicated SDV Enterprise site (this one!)

