# Databricks (Single Workspace)

## Overview

The Veza Databricks integration provides visibility into access and permissions for your Databricks machine learning platform. Veza integrates directly with workspace-level Databricks configurations to:

* Discover access relationships between users, groups, service principals, and resources
* Visualize access across catalogs, clusters, notebooks, schemas, and more with Veza's Access Graph
* Identify excessive group assignments, admin overreach, and service principal sprawl

This integration connects to individual Databricks workspaces using personal access token (PAT) authentication. Veza discovers entity and authorization metadata using the native Databricks REST API, including workspace resources, the `hive_metastore` catalog, schemas, tables, and permissions.
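As an illustrative sketch of this REST-based discovery (not Veza's actual implementation), the snippet below builds a request that lists workspace users through the SCIM 2.0 API. The workspace URL and token values are placeholders:

```python
import urllib.request

WORKSPACE_URL = "https://example.cloud.databricks.com"  # placeholder workspace
ACCESS_TOKEN = "dapi-REDACTED"  # personal access token (placeholder)

def scim_users_request() -> urllib.request.Request:
    """Build a GET request for the workspace-level SCIM Users endpoint."""
    return urllib.request.Request(
        f"{WORKSPACE_URL}/api/2.0/preview/scim/v2/Users",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )

req = scim_users_request()
# urllib.request.urlopen(req) would return a SCIM ListResponse whose
# "Resources" array contains one entry per workspace user.
```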

For organizations that use single sign-on (SSO) for federated access to Databricks, the integration discovers authorization and effective permissions that Azure AD, Okta, and AWS Identity Center identities have on Databricks resources.

**When to use this integration:** Use this integration for standalone Databricks workspaces or legacy deployments without Unity Catalog. For organizations using Unity Catalog to govern access to Databricks across multiple workspaces on Microsoft Azure, AWS, or Google Cloud, see [Databricks Unity Catalog](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-unity-catalog.md) for account-level discovery configured through cloud provider integrations.

For details on supported entities and attributes, see [Notes and Supported Entities](#notes-and-supported-entities).

#### Requirements

To connect to [Azure Databricks](https://azure.microsoft.com/en-us/products/databricks/), the workspace must enable both **Workspace Access Control** and **Cluster, Pool, and Jobs Access Control**. These features require a Premium Azure Databricks plan. To enable these settings:

1. As a Databricks administrator, click your username in the top bar of the Azure Databricks workspace and click **Admin Settings**.
2. Open **Workspace Settings**.
3. Toggle **Workspace Access Control** and **Cluster, Pool and Jobs Access Control**.
4. Click **Confirm**.

See [Enable Access Control](https://learn.microsoft.com/en-us/azure/databricks/security/auth-authz/access-control/enable-access-control) for details.

#### Authentication

The integration requires a Databricks user with account admin rights (required to list all workspace entities and permissions). Using a non-admin user token will result in an incomplete discovery.

1. [Create a new Databricks Admin user](https://docs.databricks.com/administration-guide/users-groups/users.html#add-users-to-your-databricks-account) that Veza can connect as.
   1. As an account admin, log in to the account console.
   2. Click *Account Console > User management*.
   3. On the **Users** tab, click *Add User*.
   4. Provide a name and email address for the user.
   5. Click *Send invite*.
2. Generate a [personal access token](https://docs.databricks.com/dev-tools/api/latest/authentication.html#generate-a-personal-access-token).
   1. Click *Settings* in the lower left corner of your Databricks workspace.
   2. Click *User Settings*.
   3. Go to the **Access Tokens** tab.
   4. Click the *Generate New Token* button. Optionally enter a description (comment) and expiration period.
   5. Click *Generate*. Copy the generated token, and store it securely.
3. [Assign the admin role](https://docs.databricks.com/administration-guide/users-groups/users.html#account-admin) to the user.
   1. As an account admin, log in to the account console.
   2. Click *Account Console > User management*.
   3. Find and click the user you created.
   4. On the **Roles** tab, turn on *Account admin*.

See [Authentication using Databricks personal access tokens](https://docs.databricks.com/dev-tools/api/latest/authentication.html) for more information.
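Token creation can also be scripted against the Token API (`POST /api/2.0/token/create`). The sketch below builds such a request, assuming you already hold an admin token for the workspace; the workspace URL and admin token shown are placeholders, and `lifetime_seconds` controls the token's expiration:

```python
import json
import urllib.request

WORKSPACE_URL = "https://example.cloud.databricks.com"  # placeholder workspace
ADMIN_TOKEN = "dapi-REDACTED"  # existing admin token (placeholder)

def token_create_request(comment: str, lifetime_days: int) -> urllib.request.Request:
    """Build the POST request that mints a new PAT for the Veza user."""
    body = json.dumps({
        "comment": comment,
        "lifetime_seconds": lifetime_days * 24 * 60 * 60,
    }).encode()
    return urllib.request.Request(
        f"{WORKSPACE_URL}/api/2.0/token/create",
        data=body,
        headers={
            "Authorization": f"Bearer {ADMIN_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = token_create_request("veza-integration", 90)
# urllib.request.urlopen(req) would return the new token in "token_value";
# copy it into the Veza configuration and store it securely.
```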

{% hint style="info" %}
**Note**: The account admin role ensures complete discovery of workspace resources. Personal Access Token discovery requires workspace admin permissions and may not be available on all Databricks tiers. If unavailable, the integration continues without PAT metadata.
{% endhint %}

#### Creating a cluster

To extract metadata for the Databricks storage layer, Veza runs SQL queries on one of the clusters in the workspace. Create a dedicated cluster for this purpose: it starts automatically only when Veza is conducting an extraction, and stops automatically after a configured period of inactivity.

To create a cluster from the Databricks UI, click **Create** > *Cluster*:

* The cluster can be a small single-node cluster
* You should enable termination after an inactivity period (\~10 minutes)
* Add `spark.databricks.acl.sqlOnly true` to *Advanced Options > Spark > Spark config*
* Ensure the user created for the Veza integration has `CAN_MANAGE` permission on the cluster (*More > Permissions*)

Once the cluster is running, copy the cluster's HTTP endpoint from *Advanced Options > JDBC/ODBC > HTTP path*.

For more details on creating Databricks clusters see [Compute configuration reference](https://docs.databricks.com/clusters/create-cluster.html).
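The bullet points above map onto a Clusters API payload (`POST /api/2.0/clusters/create`). The sketch below builds such a payload for a small single-node cluster; the `spark_version` and `node_type_id` arguments are assumptions you must replace with values available in your workspace:

```python
def veza_extraction_cluster(spark_version: str, node_type_id: str) -> dict:
    """Payload for a single-node cluster that auto-terminates after ~10 minutes idle."""
    return {
        "cluster_name": "veza-metadata-extraction",
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": 0,  # single-node: driver only
        "autotermination_minutes": 10,  # stop after inactivity
        "spark_conf": {
            # Restrict the cluster to SQL-only workloads, per the step above
            "spark.databricks.acl.sqlOnly": "true",
            # Single-node cluster profile
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }
```

After creating the cluster, you would still grant the Veza user `CAN_MANAGE` (for example via the Permissions API or *More > Permissions* in the UI) and copy the JDBC/ODBC HTTP path as described above.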

{% hint style="info" %}
**Note:** Compute cluster endpoints are appropriate for workspace mode (legacy deployments without Unity Catalog). For Unity Catalog deployments, use a SQL Warehouse endpoint instead. See [Databricks Unity Catalog](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-unity-catalog.md#configure-a-sql-warehouse-for-metadata-extraction) for details.
{% endhint %}

#### Veza Configuration

From the Veza **Configuration** panel, navigate to the *Apps & Data Sources* tab. Scroll down to the *Standalone Databases* section and click *Add New*. Choose "Databricks" and provide the required information:

| Field              | Details                                                                               |
| ------------------ | ------------------------------------------------------------------------------------- |
| `Name`             | Display name for the integration                                                      |
| `Workspace URL`    | Web address of the Databricks workspace (without the `https://`)                      |
| `Access Token`     | Databricks user personal token                                                        |
| `Cluster Endpoint` | JDBC/ODBC endpoint for the [cluster](#creating-a-cluster) configured for Veza use     |
| `SSO Type`         | Identity Provider used for single sign-on (optional)                                  |
| `SSO ID`           | Data Source ID of the identity provider used for [single sign-on](#identity-mappings) |

> The Azure, AWS Identity Center, or Okta Identity Provider used for SSO must be integrated with Veza as a data source.

#### Identity Mappings

To enable email-based mapping between Identity Provider identities and Databricks users:

1. Use Access Graph to search for the Azure AD Domain, Okta Domain, or AWS Identity Center service. Open the *Entity Details* and copy the `Datasource ID`.
2. Open *Data Catalog* > *Apps and Data Sources* and find your Databricks provider under *Standalone Databases*. Click *Edit*. If you haven't configured the provider yet, click *Add New* and choose *Databricks*.
3. Select your SSO provider as the *SSO type*. For *SSO ID*, use the `Datasource ID` of the Azure AD, AWS Identity Center, or Okta IdP.

| Provider            | Datasource ID                                                                  |
| ------------------- | ------------------------------------------------------------------------------ |
| Azure AD            | Azure Tenant ID (`ff57cf71-ac1c-43b8-8111-43b1be101dab`)                       |
| Okta                | Okta Domain (`<domain>.okta.com`)                                              |
| AWS Identity Center | AWS Identity Center identity store ID (`ff57cf71-ac1c-43b8-8111-43b1be101dab`) |

For other identity providers, or for more complex mapping rules, use [Custom Identity Mappings](/4yItIzMvkpAvMVFAamTf/integrations/configuration/custom-identity-mappings.md).

## Notes and Supported Entities

Every Databricks workspace has a central Hive metastore accessible by all clusters to store table metadata. This metastore (`hive_metastore`) is the sole catalog entity discovered by this integration, along with schemas, tables, and permissions on those entities. Hive metastores do not support assigning permissions to catalogs.

For Databricks entity types, attributes, permissions, and effective access calculations, see the [Databricks Entities and Permissions Reference](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-unity-catalog/databricks-info.md).

### Workspace Mode

* **Identity Entities**: Users, Service Principals, Groups, and Personal Access Tokens at the workspace level
* **Data Entities**: Workspace, `hive_metastore` Catalog, Schemas, Tables, and Views
* **Workspace Objects**: Directories and Notebooks with permission inheritance
* **Compute Resources**: Clusters, Jobs, and Pipelines with access controls

**Important:** Permissions on data tables are only enforced when both workspace settings enable table ACLs **and** the cluster supports table ACLs (High Concurrency clusters only). If table ACLs are not enabled for a cluster, all users with cluster access can query all tables accessible from that cluster.

## Effective Permissions

From Access Graph, select an **EP** node and click *Explain Effective Permissions* to view the raw Databricks ACLs that result in a set of effective permissions. Effective Permissions can account for the following scenarios:

* In Databricks, Access Control Lists (ACLs) regulate an identity's permissions on data tables, clusters, pools, and jobs, as well as workspace objects such as notebooks, experiments, and folders. By default, these controls are disabled, and all users have unrestricted access.
* Permissions can be inherited from the parent entity.
* Permissions on data tables are only enforced when enabled both in workspace settings and on the cluster (High Concurrency clusters only).
* If table permissions aren't enabled for a cluster, all users with permissions on that cluster can also access all tables reachable from it.

