
# Data Preparation

Multi table data is spread across multiple tables, each with its own rows and columns. The tables are connected to each other through primary and foreign key references.

<figure><img src="/files/m1Iuq2Jqjk4o6YoMDk3T" alt=""><figcaption><p>This example of a multi table dataset has a table for hotels and a table for their guests. Each guest can have multiple stays at multiple hotels.</p></figcaption></figure>

Before you begin creating synthetic data, it's important to have your data ready in the right format:

1. **Data**, a dictionary that maps every table name to a pandas DataFrame object containing the actual data
2. **Metadata**, a MultiTableMetadata object that describes your tables. It includes the data type of each column, the keys, and the connections between tables.
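
As a minimal sketch of the first item, the data can be assembled as a plain Python dictionary keyed by table name. The rows below are made-up sample values for illustration only; the column names follow the hotels and guests example.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the example dataset.
hotels = pd.DataFrame({
    "hotel_id": ["HID_000", "HID_001"],
    "city": ["Boston", "San Francisco"],
    "state": ["Massachusetts", "California"],
    "rating": [4.8, 4.1],
    "classification": ["RESORT", "MOTEL"],
})

guests = pd.DataFrame({
    "guest_email": ["a@example.com", "b@example.com", "c@example.com"],
    "hotel_id": ["HID_000", "HID_000", "HID_001"],
    "has_rewards": [False, True, True],
    "room_type": ["BASIC", "DELUXE", "BASIC"],
    "checkin_date": ["27 Dec 2020", "30 Dec 2020", "17 Sep 2020"],
    "checkout_date": ["29 Dec 2020", "02 Jan 2021", "18 Sep 2020"],
})

# The data object: one DataFrame per table, keyed by table name.
data = {"hotels": hotels, "guests": guests}

for table_name, table in data.items():
    print(table_name, table.shape)
```

The table names used as dictionary keys should match the table names used in the metadata.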

<details>

<summary>Click to see the metadata</summary>

```python
{
    "METADATA_SPEC_VERSION": "MULTI_TABLE_V1",
    "tables": {
        "guests": {
            "primary_key": "guest_email",
            "alternate_keys": ["credit_card_number"],
            "columns": {
                "guest_email": { "sdtype": "email", "pii": True },
                "hotel_id": { "sdtype": "id", "regex_format": "HID_[0-9]{3}" },
                "has_rewards": { "sdtype": "boolean" },
                "room_type": { "sdtype": "categorical" },
                "amenities_fee": { "sdtype": "numerical" },
                "checkin_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
                "checkout_date": { "sdtype": "datetime", "datetime_format": "%d %b %Y" },
                "room_rate": { "sdtype": "numerical" },
                "billing_address": { "sdtype": "address", "pii": True },
                "credit_card_number": { "sdtype": "credit_card_number", "pii": True }
            }
        },
        "hotels": {
            "primary_key": "hotel_id",
            "columns": {
                "hotel_id": { "sdtype": "id", "regex_format": "HID_[0-9]{3}" },
                "city": { "sdtype": "categorical" },
                "state": { "sdtype": "categorical" },
                "rating": { "sdtype": "numerical" },
                "classification": { "sdtype": "categorical" }
            }
        }
    },
    "relationships": [{
        "parent_table_name": "hotels",
        "parent_primary_key": "hotel_id",
        "child_table_name": "guests",
        "child_foreign_key": "hotel_id"
    }]
}
```

</details>
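
To make the structure of the `relationships` section concrete, here is a rough sketch that walks a metadata dictionary of the shape shown above and reports relationships that point at undeclared tables or columns. The helper `check_relationships` is hypothetical and written for illustration; it is not part of the SDV API, and the metadata is a trimmed copy of the example.

```python
def check_relationships(metadata):
    """Return a list of problems found in the metadata's relationships.

    Hypothetical helper for illustration; not part of the SDV.
    """
    problems = []
    tables = metadata["tables"]
    for rel in metadata["relationships"]:
        parent = rel["parent_table_name"]
        child = rel["child_table_name"]
        missing = [name for name in (parent, child) if name not in tables]
        if missing:
            problems.extend(f"unknown table: {name}" for name in missing)
            continue
        # The parent side of a relationship should be the parent's primary key.
        if tables[parent].get("primary_key") != rel["parent_primary_key"]:
            problems.append(
                f"{rel['parent_primary_key']} is not the primary key of {parent}"
            )
        # The foreign key should be a declared column of the child table.
        if rel["child_foreign_key"] not in tables[child]["columns"]:
            problems.append(
                f"{rel['child_foreign_key']} is not a column of {child}"
            )
    return problems


# Trimmed copy of the example metadata (only the key columns are kept).
metadata = {
    "tables": {
        "guests": {
            "primary_key": "guest_email",
            "columns": {
                "guest_email": {"sdtype": "email", "pii": True},
                "hotel_id": {"sdtype": "id", "regex_format": "HID_[0-9]{3}"},
            },
        },
        "hotels": {
            "primary_key": "hotel_id",
            "columns": {
                "hotel_id": {"sdtype": "id", "regex_format": "HID_[0-9]{3}"},
                "city": {"sdtype": "categorical"},
            },
        },
    },
    "relationships": [{
        "parent_table_name": "hotels",
        "parent_primary_key": "hotel_id",
        "child_table_name": "guests",
        "child_foreign_key": "hotel_id",
    }],
}

print(check_relationships(metadata))
```

An empty list means every relationship refers to a declared table, the parent's primary key, and a declared child column.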

### Learn More

<table data-card-size="large" data-view="cards"><thead><tr><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><a href="/pages/hSfxZiScNZTAFoZpM6ph"><strong>Loading Data</strong></a></td><td>Get started with a demo dataset or load your own data.</td><td><a href="/pages/hSfxZiScNZTAFoZpM6ph">/pages/hSfxZiScNZTAFoZpM6ph</a></td></tr><tr><td><a href="/pages/70dzL3ZTDgj8HBQH5gTc"><strong>Creating Metadata</strong></a></td><td>Create an object to describe the different columns in your data. Save it for future use.</td><td><a href="/pages/70dzL3ZTDgj8HBQH5gTc">/pages/70dzL3ZTDgj8HBQH5gTc</a></td></tr></tbody></table>

## Multi Table Schemas

{% hint style="info" %}
**What kinds of multi table schemas are compatible with the SDV?** The SDV can be used to model many different types of multi table dataset schemas as long as they meet the criteria below.

1. **All the tables should be connected in some way.** If you have disjoint sets of tables, you can model each set separately.
2. **There should be no cyclical dependencies.** For example, a table cannot refer to itself, and if table A refers to table B, then table B cannot refer back to table A.
3. **There should be no missing references** (also known as orphan rows). If table A refers to table B, then every reference in table A must match a row in table B. Note that it is fine for a parent row to have no children.
   {% endhint %}
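
The three criteria above can be sketched as simple checks on a list of relationships and a dictionary of DataFrames. This is an illustrative sketch, not the SDV's own validation: the helpers `is_connected`, `has_cycle`, and `find_orphans` are hypothetical names, the sample rows are made up, and the orphan check here also flags rows whose foreign key is missing.

```python
import pandas as pd


def is_connected(table_names, relationships):
    """Criterion 1: every table is reachable from every other table."""
    neighbors = {name: set() for name in table_names}
    for rel in relationships:
        parent, child = rel["parent_table_name"], rel["child_table_name"]
        neighbors[parent].add(child)
        neighbors[child].add(parent)
    names = list(table_names)
    if not names:
        return True
    seen, stack = {names[0]}, [names[0]]
    while stack:
        for nxt in neighbors[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == set(names)


def has_cycle(table_names, relationships):
    """Criterion 2: repeatedly remove tables that refer to no remaining table."""
    parents = {name: set() for name in table_names}
    for rel in relationships:
        parents[rel["child_table_name"]].add(rel["parent_table_name"])
    while parents:
        roots = [name for name, refs in parents.items() if not refs]
        if not roots:
            return True  # every remaining table still refers to another one
        for name in roots:
            del parents[name]
        for refs in parents.values():
            refs.difference_update(roots)
    return False


def find_orphans(data, rel):
    """Criterion 3: child rows whose foreign key matches no parent row."""
    parent_keys = data[rel["parent_table_name"]][rel["parent_primary_key"]]
    child = data[rel["child_table_name"]]
    return child[~child[rel["child_foreign_key"]].isin(parent_keys)]


relationships = [{
    "parent_table_name": "hotels",
    "parent_primary_key": "hotel_id",
    "child_table_name": "guests",
    "child_foreign_key": "hotel_id",
}]

# Made-up rows; the last guest refers to a hotel that does not exist.
data = {
    "hotels": pd.DataFrame({"hotel_id": ["HID_000", "HID_001"]}),
    "guests": pd.DataFrame({
        "guest_email": ["a@example.com", "b@example.com", "c@example.com"],
        "hotel_id": ["HID_000", "HID_001", "HID_999"],
    }),
}

print(is_connected(data.keys(), relationships))
print(has_cycle(data.keys(), relationships))
orphans = find_orphans(data, relationships[0])
print(orphans)
```

If orphan rows are found, they need to be resolved before modeling, for instance by dropping them or by adding the missing parent rows; which option is appropriate depends on your data.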

