❖ SelfReferentialHierarchy
Use the SelfReferentialHierarchy constraint when a column in a table references the primary key column of the same table (a self-reference) and the self-references are not allowed to form any cycles.
This functionality is in Beta. At this time, select SDV Enterprise users have been invited to use this feature.
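To make the no-cycles requirement concrete, here is a minimal sketch in plain Python that checks a set of (primary key, foreign key) pairs for cycles. It is not part of SDV: the helper name find_cycle is hypothetical, and it assumes that a row without a parent stores None in the foreign key.

```python
from typing import Hashable, Iterable, Optional, Tuple

Pair = Tuple[Hashable, Optional[Hashable]]


def find_cycle(pairs: Iterable[Pair]) -> Optional[Hashable]:
    """Return a key that lies on a cycle, or None if there is no cycle.

    Assumption: a row with no parent stores None in the foreign key.
    """
    parent = dict(pairs)
    for start in parent:
        seen = set()
        node = start
        # Walk up the chain until it ends or revisits a key.
        while node is not None and node in parent:
            if node in seen:
                return node
            seen.add(node)
            node = parent[node]
    return None


# E1 has no manager; E2 reports to E1; E3 reports to E2.
valid = [("E1", None), ("E2", "E1"), ("E3", "E2")]
# E1 -> E3 -> E2 -> E1 forms a cycle.
invalid = [("E1", "E3"), ("E2", "E1"), ("E3", "E2")]

print(find_cycle(valid))    # None
print(find_cycle(invalid))  # E1
```

For a DataFrame, the pairs could be built with something like zip(data['Employee ID'], data['Manager ID']), using the column names from the example on this page.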
Create a SelfReferentialHierarchy constraint.
Parameters:
(required) table_name: A string with the name of the table that contains the self-reference
(required) primary_key: A string with the name of the primary key column in the table
(required) foreign_key: A string with the name of the foreign key column in the same table that references the primary key
scaling_method: A string with the name of the method to use when scaling up the synthetic data
    (default) 'branch': Keep the original depth of the hierarchy but add more branches to it. In an employee-manager table, this would add more reports for a given manager.
    'depth': Add to the depth of the self-references. In an employee-manager table, this would create new levels of managers, increasing the length of the reporting chain to the CEO.
    'multiply': Keep the original branching factor and depth of the hierarchy, but create more trees. In an employee-manager table, this would create additional companies with new CEOs and reporting chains.
from sdv.cag import SelfReferentialHierarchy

my_constraint = SelfReferentialHierarchy(
    table_name='Employees',
    primary_key='Employee ID',
    foreign_key='Manager ID')
Make sure that the table and all the columns you provide are in your Metadata, and that the table has a primary key associated with it. Note that you cannot supply a self-reference relationship in the metadata right now, so the relationships section of your Metadata can be blank.
Apply the constraint to any SDV synthesizer. Then fit and sample as usual.
from sdv.multi_table import HSASynthesizer

synthesizer = HSASynthesizer(metadata)
synthesizer.add_constraints([my_constraint])
synthesizer.fit(data)
synthetic_data = synthesizer.sample()
A self-referential hierarchy can be visualized as a tree-like dependency structure. For example, each row of the data (an employee) can be visualized as a node in an overall tree, pointing to its manager. The topmost node represents the CEO, followed by managers, employees, etc. In this way, it's possible to see the overall branching factor and depth of the tree too.
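The tree view described above can also be summarized numerically. The following is a rough sketch in plain Python, not an SDV function: the helper name hierarchy_stats is hypothetical, it assumes that root rows store None in the foreign key, and it counts the root as depth 1.

```python
from collections import defaultdict


def hierarchy_stats(pairs):
    """Summarize a hierarchy given (primary_key, foreign_key) pairs.

    Assumptions: roots have a foreign key of None, the data has no
    cycles, and the root level counts as depth 1.
    """
    children = defaultdict(list)
    roots = []
    for pk, fk in pairs:
        if fk is None:
            roots.append(pk)
        else:
            children[fk].append(pk)

    # Walk the trees level by level to find the deepest level.
    max_depth = 0
    frontier = list(roots)
    while frontier:
        max_depth += 1
        frontier = [c for node in frontier for c in children.get(node, [])]

    # The largest number of direct children under any single node.
    max_branching = max((len(c) for c in children.values()), default=0)
    return {"trees": len(roots), "max_depth": max_depth,
            "max_branching": max_branching}


# E1 is the CEO; E2 and E3 report to E1; E4 reports to E2.
employees = [("E1", None), ("E2", "E1"), ("E3", "E1"), ("E4", "E2")]
stats = hierarchy_stats(employees)
print(stats)  # {'trees': 1, 'max_depth': 3, 'max_branching': 2}
```

The number of trees, the depth, and the branching factor are the three quantities that the 'multiply', 'depth', and 'branch' scaling methods are described as changing.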
Create a graphic that corresponds to your data using the visualize function.
Parameters:
(required) data: A pd.DataFrame object containing the data you want to visualize. The data should match the constraint.
show_primary_keys: Toggle whether the primary key IDs should be displayed in the visualization
    (default) True: Show the primary key ID for each row
    False: Do not show the primary key. The visualization will include blank circles representing the nodes.
max_trees: If your hierarchy has multiple, separate trees, use this parameter to control the maximum number of individual trees you'd like to visualize
    (default) None: Visualize all the data
    <integer>: Only visualize the given number of trees. The remaining rows will not be visualized.
max_depth: If the trees have a very high depth, use this parameter to control the maximum depth you'd like to visualize
    (default) None: Visualize all the data regardless of the depth
    <integer>: Only visualize up to the given depth. Any nodes deeper than this level will not be visualized.
output_filepath: If provided, save the image at the given location in the given format.
    (default) None: Do not save the visualization
    <filepath>: A string with the filepath. This must end with the filetype that you want to save as. Popular examples are png, jpg or pdf.
Output: A graphviz.graphs.Digraph object containing the visualization
from sdv.cag import SelfReferentialHierarchy

my_constraint = SelfReferentialHierarchy(
    table_name='Employees',
    primary_key='Employee ID',
    foreign_key='Manager ID')

# my_dataframe is a pd.DataFrame with the Employees table
graph = my_constraint.visualize(
    my_dataframe,
    show_primary_keys=True,
    max_trees=2,
    max_depth=3,
    output_filepath='visualizations/my_graph.png'
)
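To build intuition for what max_trees and max_depth leave out, here is a small sketch in plain Python of one way such limits could be applied. It is an illustration only and not SDV's implementation: the helper name limit_hierarchy is hypothetical, and it assumes that roots store None in the foreign key, that trees are taken in order of appearance, and that the root counts as depth 1. The source does not specify which trees SDV selects or how it numbers depth.

```python
from collections import defaultdict


def limit_hierarchy(pairs, max_trees=None, max_depth=None):
    """Return the primary keys that remain after applying the limits.

    Assumptions (not confirmed by SDV): trees are kept in order of
    appearance and the root level counts as depth 1.
    """
    children = defaultdict(list)
    roots = []
    for pk, fk in pairs:
        if fk is None:
            roots.append(pk)
        else:
            children[fk].append(pk)

    # Keep only the first max_trees roots, if a limit is given.
    if max_trees is not None:
        roots = roots[:max_trees]

    kept = []
    frontier = roots
    depth = 1
    # Collect nodes level by level until the depth limit is reached.
    while frontier and (max_depth is None or depth <= max_depth):
        kept.extend(frontier)
        frontier = [c for node in frontier for c in children.get(node, [])]
        depth += 1
    return kept


# Two separate trees: A -> B -> C and X -> Y.
rows = [("A", None), ("B", "A"), ("C", "B"), ("X", None), ("Y", "X")]

print(limit_hierarchy(rows))                            # ['A', 'X', 'B', 'Y', 'C']
print(limit_hierarchy(rows, max_trees=1, max_depth=2))  # ['A', 'B']
```

With both limits set, the second tree and the deepest node of the first tree are left out, which mirrors the parameter descriptions above: remaining trees and deeper nodes are not visualized.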
For more information about using predefined constraints, please see the Constraint-Augmented Generation tutorial.