Databricks (Unity Catalog)

Configuring the Veza integration for Databricks with Unity Catalog enabled.

If your organization uses Unity Catalog to federate access to Databricks workspaces, you can enable Databricks integration when configuring an AWS, GCP, or Microsoft Azure provider. Veza connects to your Databricks account to discover authorization metadata for all workspaces and resources the service principal can access. Veza also discovers account-level users and groups in Unity Catalog, and account-level Metastores shared with workspaces. Supported entities include:

Account-level:

  • Databricks Account

  • Databricks Account User

  • Databricks Account Service Principal

  • Databricks Account Group

  • Databricks Metastore

Workspace-level:

  • Databricks Catalog

  • Databricks Cluster

  • Databricks Notebook

  • Databricks Directory

  • Databricks Schema

  • Databricks Table

  • Databricks User

  • Databricks Group

For discovering single workspaces without Unity Catalog enabled, see Databricks.

Requirements

To enable the integration, you will need:

  • A Databricks account on the Premium plan, with SSO (Unified Login) enabled.

  • Microsoft Azure, Google Cloud: The Veza service principal for the cloud provider integration must be able to access the Databricks account via SSO as an account admin.

  • AWS: Veza uses OAuth credentials for a Databricks service principal with the Account admin role.

  • The Veza service principal must be assigned as an admin on all workspaces to fully discover all sub-resources.

  • Single Sign-On (Unified Login) enabled for the workspaces to discover. Unified Login is always enabled on Google Cloud and Microsoft Azure, but is optional for AWS deployments.

  • A dedicated cluster for running extraction queries (see below).

  • Administrator access to Databricks to create a Veza service principal and cluster.

Configure a Veza service principal

The integration requires a Databricks service principal with account admin privileges (required to list all workspace entities and permissions):

  • Databricks on AWS: OAuth2 credentials (client ID and secret) for a Databricks service principal (M2M access).

  • Databricks on Google Cloud Platform: Google Service Account configured for the Google Cloud Platform integration.

  • Databricks on Microsoft Azure: Azure App Integration configured for Azure discovery.

OAuth M2M for AWS

To create a service principal for Databricks on AWS, log in to the Databricks account console as an administrator:

  1. Go to User management.

  2. Under Service principals, click Add service principal.

  3. Enter a name and click Add.

  4. On the Roles tab, enable Account admin to enable account-level API calls.

Assign your service principal to identity-federated workspaces:

  1. Open Workspaces and click your workspace name.

  2. Go to Permissions > Add permissions.

  3. Search for the service principal, assign the Admin permission level, and save the changes.

Get an OAuth client secret

To create an OAuth secret for a service principal using the account console:

  1. In the Databricks account console, open User management.

  2. On the Service principals tab, find the service principal.

  3. Under OAuth secrets, click Generate secret.

  4. Copy the Secret and Client ID, and then click Done.

For more details, see OAuth M2M in the Databricks documentation.
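To check that the client ID and secret work before entering them in Veza, you can request a token yourself. The following is a minimal sketch, assuming the standard Databricks account-level token endpoint on AWS (`accounts.cloud.databricks.com`) and the `all-apis` scope. The function name `build_token_request` is hypothetical, and the request is only built here, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Assumed account-level token endpoint for Databricks on AWS.
    url = f"https://accounts.cloud.databricks.com/oidc/accounts/{account_id}/v1/token"
    # Form-encoded body for the client-credentials grant.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # The client ID and secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` should yield a JSON response containing an `access_token` field if the credentials are valid and the service principal has the Account admin role.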

Create a Databricks cluster

Veza will run SQL queries on a Databricks cluster to collect metadata. Veza recommends a dedicated cluster for this purpose.

  • You will identify the cluster by tag when configuring the integration.

To create a cluster from the Databricks UI, pick Create > Cluster:

  • The cluster can be a small single-node cluster.

  • Enable automatic termination after a period of inactivity (~10 minutes). The cluster starts automatically for extractions and stops automatically when inactive.

  • Set spark.databricks.acl.sqlOnly true under Advanced Options > Spark > Spark config.

  • Ensure the Veza service principal has CAN_MANAGE permission on the cluster (More > Permissions).

For more details on creating Databricks clusters, see the Databricks documentation.
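If you prefer to create the cluster through the Databricks Clusters API rather than the UI, the settings above map roughly to the request body sketched below. This is an illustration under assumptions: the cluster name, the tag key and value, and both function names are hypothetical, and the single-node settings (`num_workers` of 0 plus the `singleNode` profile) should be checked against the Clusters API reference for your platform. The second helper builds a Permissions API body granting CAN_MANAGE to the service principal, identified by its application (client) ID.

```python
import json


def build_cluster_spec(spark_version: str, node_type_id: str, tag_key: str, tag_value: str) -> dict:
    """Return a Clusters API request body for a small single-node extraction cluster."""
    return {
        "cluster_name": "veza-extraction",  # hypothetical name
        "spark_version": spark_version,      # a runtime version valid in your workspace
        "node_type_id": node_type_id,        # a node type valid for your cloud provider
        "num_workers": 0,                    # single node: driver only
        "autotermination_minutes": 10,       # stop after ~10 minutes of inactivity
        "spark_conf": {
            "spark.databricks.acl.sqlOnly": "true",
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        # The custom tag is what you would enter as the collector cluster tag.
        "custom_tags": {"ResourceClass": "SingleNode", tag_key: tag_value},
    }


def build_cluster_acl(application_id: str) -> dict:
    """Return a Permissions API body granting CAN_MANAGE on a cluster."""
    return {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_MANAGE"}
        ]
    }


if __name__ == "__main__":
    # Placeholder arguments; substitute values valid in your workspace.
    print(json.dumps(build_cluster_spec("<spark-version>", "<node-type>", "purpose", "veza"), indent=2))
```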

Assign the Veza user to Databricks workspaces

The Veza service principal must be a workspace-level administrator to discover workspace sub-resources such as notebooks and clusters. Without admin permissions, the integration cannot gather metadata for the workspace.

To add the Veza service principal to a workspace with the admin role:

A) Using the Databricks account admin console (for workspaces with identity federation):

  • Open Workspaces and click your workspace name.

  • Go to Permissions > Add permissions.

  • Search for the service principal, assign the Admin permission level, and save the changes.

B) Using the Workspace admin console:

  1. Click your username in the top bar of the Databricks workspace and select Admin Settings.

  2. Open Identity and access.

  3. Go to Users > Manage > Add User.

  4. Click Add new to create a new user and enter the email of the Veza service account.

  5. Click Add.

  6. On the list of users, click the user.

  7. Click the Entitlements tab.

  8. Click the toggle next to Admin access.

See Manage Users for more detail.
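For accounts with many identity-federated workspaces, option A can also be scripted. The sketch below assumes the Databricks account-level Workspace Assignment API and builds, without sending, a request that assigns the ADMIN permission to a principal on one workspace. The function name is hypothetical, `principal_id` is assumed to be the numeric ID of the service principal (not its application ID), and `token` is an account-level OAuth access token. Verify the endpoint against the Databricks API reference before relying on it.

```python
import json
import urllib.request


def build_admin_assignment_request(
    account_id: str, workspace_id: str, principal_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a request assigning workspace admin to a principal."""
    # Assumed Workspace Assignment API path on the AWS accounts host.
    url = (
        f"https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}"
        f"/workspaces/{workspace_id}/permissionassignments/principals/{principal_id}"
    )
    body = json.dumps({"permissions": ["ADMIN"]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Calling this once per workspace ID and sending each request with `urllib.request.urlopen` would repeat the Permissions > Add permissions step for every workspace.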

Enable Databricks extraction

Databricks extraction is disabled by default. To enable the service, edit the AWS, Google Cloud Platform, or Microsoft Azure integration:

  1. Go to the Veza Integrations page.

  2. Find the integration on the list and click Edit.

  3. In the third section, Limit Services, tick Limit {Integration} Services.

  4. Click Select All to enable all services, or tick the boxes for services your company uses. Tick the box next to Databricks.

  5. Go to the Details section.

  6. Enter the additional fields:

    • Databricks account ID: The ID of your Databricks account.

    • Databricks collector cluster tag: Cluster tag for running queries. If empty, Veza will use the first available cluster.

    • AWS: Databricks OAuth M2M client ID: Client ID for OAuth M2M

    • AWS: Databricks OAuth M2M client secret: Veza service principal client secret

  7. Click Save Integration to enable the connection.
