# Databricks (Unity Catalog)

## Overview

The Veza Databricks integration with Unity Catalog discovers access and permissions across your Databricks deployment. Veza integrates directly with Unity Catalog-enabled and workspace-level Databricks configurations to:

* Discover access relationships between users, groups, service principals, and resources
* Visualize access across catalogs, clusters, notebooks, schemas, and more with Veza's Access Graph
* Identify excessive group assignments, admin overreach, and service principal sprawl

For organizations using Unity Catalog to govern access to Databricks, Databricks discovery is enabled in your cloud provider integration configuration for [AWS](/4yItIzMvkpAvMVFAamTf/integrations/integrations/aws.md), [GCP](/4yItIzMvkpAvMVFAamTf/integrations/integrations/google.md), or [Microsoft Azure](/4yItIzMvkpAvMVFAamTf/integrations/integrations/azure.md). Veza connects to your Databricks account to discover authorization metadata for all workspaces and resources the service principal can access, including:

* Account-level users, service principals, and groups
* Unity Catalog metastores and catalogs shared with workspaces
* All workspace resources (clusters, notebooks, directories, jobs, pipelines)
* OAuth applications and federation policies
* Service principal secrets and OIDC configurations

Use Unity Catalog integration for Databricks deployments with centralized governance across multiple workspaces. For standalone Databricks workspaces without Unity Catalog, see [Databricks (Workspace Mode)](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-single-workspace.md).

For details on supported entities and attributes, see [Databricks Entities and Permissions Reference](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-unity-catalog/databricks-info.md).

#### Requirements

To enable the integration, you will need:

* A Databricks account on the Premium plan, with SSO (Unified Login) enabled.
* **Microsoft Azure, Google Cloud**: The Veza service principal used for the cloud provider integration must be able to access the account via SSO as an account admin.
* **AWS**: Veza uses OAuth credentials for a Databricks service principal with the Account admin role.
* The Veza service principal must be assigned as an admin on all workspaces to fully discover their sub-resources.
* [Single Sign-On (Unified Login)](https://docs.databricks.com/en/administration-guide/users-groups/single-sign-on/index.html) enabled for the workspaces to discover. Unified Login is always enabled on Google Cloud and Microsoft Azure, but is optional for AWS deployments.
* A SQL warehouse (preferably serverless) for running extraction queries (see below).
* Administrator access to Databricks to create the Veza service principal and SQL warehouse.

{% hint style="info" %}
**Permission Requirements for Full Discovery:**

* **Personal Access Tokens**: Requires workspace admin role. May not be available on all Databricks pricing tiers. If unavailable, extraction continues without PAT discovery.
* **Jobs, Pipelines, and Permissions**: Workspace admin enables complete discovery of all jobs, pipelines, and their permissions across the workspace.
* **ACLs**: Premium tier or higher required for full access control list discovery.
{% endhint %}

#### Configure a Veza service principal

The integration requires a Databricks service principal with account admin privileges (required to list all workspace entities and permissions):

* **Databricks on AWS**: OAuth2 token for a Databricks service account (M2M access).
* **Databricks on Google Cloud Platform**: Google Service Account configured for the [Google Cloud Platform](/4yItIzMvkpAvMVFAamTf/integrations/integrations/google.md) integration.
* **Databricks on Microsoft Azure**: Azure App Integration configured for [Azure](/4yItIzMvkpAvMVFAamTf/integrations/integrations/azure.md) discovery.

**OAuth M2M for AWS**

To create a service principal for Databricks on AWS, log in to the Databricks account console as an administrator:

1. Go to **User management**.
2. Under **Service principals**, click **Add service principal**.
3. Enter a name and click **Add**.
4. On the **Roles** tab, enable *Account admin* to allow account-level API calls.

Next, assign the service principal to identity-federated workspaces:

1. Open **Workspaces** and click your workspace name.
2. Go to **Permissions** > **Add permissions**.
3. Search for the service principal, assign the `Admin` permission level, and save the changes.

To create an OAuth client secret for the service principal using the account console:

1. In the Databricks account console, open **User management**.
2. On the **Service principals** tab, find the service principal.
3. Under **OAuth secrets**, click **Generate secret**.
4. Copy the Secret and Client ID, and then click **Done**.

For more details see [OAuth M2M](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) in the Databricks documentation.
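As a sanity check for the credentials created above, the client-credentials exchange can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the Veza configuration: the account ID, client ID, and secret below are placeholders, and the request is built but not sent.

```python
import base64
import urllib.parse
import urllib.request

ACCOUNT_ID = "11111111-2222-3333-4444-555555555555"  # placeholder Databricks account ID
CLIENT_ID = "example-client-id"          # placeholder: client ID from the account console
CLIENT_SECRET = "example-client-secret"  # placeholder: the generated OAuth secret


def build_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth M2M client-credentials token request."""
    url = f"https://accounts.cloud.databricks.com/oidc/accounts/{account_id}/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # Client ID and secret are passed as HTTP Basic credentials
    auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = build_token_request(ACCOUNT_ID, CLIENT_ID, CLIENT_SECRET)
# Sending the request with urllib.request.urlopen(req) returns JSON
# containing an `access_token` field usable as a bearer token.
```

If the exchange fails with a 401, re-check that the secret was copied correctly and that the service principal still exists in the account console.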

#### Configure a SQL Warehouse for metadata extraction

Veza runs SQL queries to collect Unity Catalog metadata. For Unity Catalog deployments, you must use a SQL Warehouse (preferably serverless) instead of a compute cluster.

{% hint style="warning" %}
**Important:** Serverless SQL Warehouses are required to access certain Unity Catalog features, including managed catalogs and Online Tables. Using a compute cluster endpoint will result in `PERMISSION_DENIED` errors when Veza attempts to query these resources.
{% endhint %}

**To configure a SQL Warehouse:**

1. In the Databricks workspace, go to **SQL Warehouses** (under SQL section in the sidebar)
2. Select an existing serverless SQL warehouse, or create a new one:
   * Click **Create SQL warehouse**
   * Choose **Serverless** as the warehouse type
   * Configure auto-stop after inactivity (\~10 minutes)
3. Open the warehouse and go to **Connection details**
4. Copy the **HTTP path** (format: `/sql/1.0/warehouses/[warehouse-id]`)
5. Ensure the Veza service principal has appropriate permissions on the warehouse

**Identifying the endpoint type:**

| Endpoint Type   | Format                                    | Use Case                                                                 |
| --------------- | ----------------------------------------- | ------------------------------------------------------------------------ |
| SQL Warehouse   | `/sql/1.0/warehouses/[warehouse-id]`      | Unity Catalog deployments (required for managed catalogs, Online Tables) |
| Compute Cluster | `sql/protocolv1/o/[numbers]/[cluster-id]` | Legacy workspace mode only                                               |

For more details, see [Connect to a SQL warehouse](https://docs.databricks.com/aws/en/compute/sql-warehouse/) in the Databricks documentation.
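The endpoint-type distinction in the table above can be checked programmatically. The following is a minimal sketch, assuming the two HTTP path formats shown in the table; it is a convenience for validating configuration values, not part of the Veza product.

```python
import re


def endpoint_type(http_path: str) -> str:
    """Classify a Databricks HTTP path as a SQL warehouse or compute cluster endpoint."""
    # SQL warehouse paths look like /sql/1.0/warehouses/[warehouse-id]
    if re.fullmatch(r"/sql/1\.0/warehouses/\w+", http_path):
        return "sql-warehouse"
    # Compute cluster paths look like sql/protocolv1/o/[numbers]/[cluster-id]
    if re.fullmatch(r"/?sql/protocolv1/o/\d+/[\w-]+", http_path):
        return "compute-cluster"
    return "unknown"
```

For Unity Catalog deployments, a path classified as `compute-cluster` should be replaced with a SQL warehouse path to avoid the `PERMISSION_DENIED` errors described above.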

#### Assign the Veza user to Databricks workspaces

The Veza service principal must be a workspace-level administrator to discover workspace sub-resources such as notebooks and clusters. Without admin permissions, the integration will not be able to gather metadata for the workspace.

To add the Veza service principal to a workspace with the admin role:

**Using the Databricks account admin console** (for workspaces with identity federation):

1. Open **Workspaces** and click your workspace name.
2. Go to **Permissions** > **Add permissions**.
3. Search for the service principal, assign the `Admin` permission level, and save the changes.

**Using the Workspace admin console**:

1. Click your username in the top bar of the Databricks workspace and select **Admin Settings**.
2. Open **Identity and access**.
3. Go to **Users** > **Manage** > **Add User**.
4. Click **Add new** to create a new user and enter the email of the Veza service account.
5. Click **Add**.
6. On the list of users, click the user.
7. Click the **Entitlements** tab.
8. Click the toggle next to **Admin access**.

See [Manage Users](https://docs.databricks.com/en/administration-guide/users-groups/users.html) for more detail.
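To confirm the assignment took effect, workspace users can be listed through the Databricks SCIM API. The sketch below builds (but does not send) such a request; the workspace URL and token are placeholders, and inspecting the response for the Veza principal is left as a comment.

```python
import urllib.request

WORKSPACE_URL = "https://dbc-example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "example-bearer-token"                              # placeholder access token


def build_list_users_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a SCIM request listing workspace users."""
    return urllib.request.Request(
        f"{workspace_url}/api/2.0/preview/scim/v2/Users",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/scim+json",
        },
    )


req = build_list_users_request(WORKSPACE_URL, TOKEN)
# json.load(urllib.request.urlopen(req))["Resources"] would return the
# workspace users; check that the Veza principal appears with the
# expected entitlements and group memberships.
```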

#### Enable Databricks extraction

Databricks extraction is disabled by default. To enable the service, edit the AWS, Google Cloud Platform, or Microsoft Azure integration:

1. Go to the Veza **Integrations** page.
2. Find the integration on the list and click **Edit**.
3. In the **Limit Services** section, tick **Limit {Integration} Services**.
4. Click **Select All** to enable all services, or tick the boxes for services your company uses. Tick the box next to **Databricks**.
5. Go to the **Details** section.
6. Enter the additional fields:
   * *Databricks account ID*: Databricks [account id](https://docs.databricks.com/en/administration-guide/account-settings/index.html)
   * *Databricks collector cluster tag*: Cluster tag for running queries. If empty, Veza will use the first available cluster.
   * *AWS: Databricks OAuth M2M client ID*: Client ID for OAuth M2M authentication
   * *AWS: Databricks OAuth M2M client secret*: Veza service principal client secret
7. Click **Save Integration** to enable the connection.
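Before saving, it can help to sanity-check the values you are about to enter. The sketch below is purely illustrative (the field names are made up for this example, and it assumes the Databricks account ID is the UUID shown in the account console); it is not Veza's own validation logic.

```python
import uuid


def validate_databricks_fields(fields: dict) -> list:
    """Return a list of problems with the extraction settings (illustrative checks only)."""
    problems = []
    try:
        # Databricks account IDs are UUIDs (an assumption worth re-checking
        # against the account console if validation fails)
        uuid.UUID(fields.get("account_id", ""))
    except ValueError:
        problems.append("account_id should be the UUID from the Databricks account console")
    # For AWS, the OAuth client ID and secret must be supplied together
    if fields.get("oauth_client_id") and not fields.get("oauth_client_secret"):
        problems.append("oauth_client_secret is required when oauth_client_id is set")
    return problems
```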

## Notes and Supported Entities

Veza discovers Unity Catalog entities at two levels:

**Account-level entities:**

* Account
* Account Users, Service Principals, and Groups (with role assignments)
* Catalogs (Unity Catalog-managed)
* Metastores
* Service Principal Secrets
* Federation Policies (account-level and service principal-specific)
* OAuth App Integrations (published and custom)

**Workspace-level entities** (all entities from workspace mode):

* Users, Service Principals, Groups, and Personal Access Tokens
* Schemas, Tables, and Views
* Directories and Notebooks
* Clusters, Jobs, and Pipelines

For more information about each entity type see [Databricks Entities and Permissions Reference](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-unity-catalog/databricks-info.md).

