Databricks (Unity Catalog)
Configuring the Veza integration for Databricks with Unity Catalog enabled.
If your organization uses Unity Catalog to federate access to Databricks workspaces, you can enable the Databricks integration when configuring an AWS, GCP, or Microsoft Azure provider. Veza connects to your Databricks account to discover authorization metadata for all workspaces and resources the service principal can access. Veza also discovers account-level users and groups in Unity Catalog, and account-level Metastores shared with workspaces. Supported entities include:
Account-level:
Databricks Account
Databricks Account User
Databricks Account Service Principal
Databricks Account Service Group
Databricks Metastore
Workspace-level:
Databricks Catalog
Databricks Cluster
Databricks Notebook
Databricks Directory
Databricks Schema
Databricks Table
Databricks User
Databricks Group
For discovering single workspaces without Unity Catalog enabled, see Databricks.
Requirements
To enable the integration, you will need:
A Databricks account on the Premium plan, with SSO (Unified Login) enabled.
Microsoft Azure, Google Cloud: The Veza service principal for the cloud provider integration must be assigned to access the account via SSO as an account admin.
AWS: Veza uses OAuth credentials for a Databricks service principal with the Account admin role.
The Veza service principal must be assigned as an admin on all workspaces to fully discover all sub-resources.
Single Sign-On (Unified Login) enabled for the workspaces to discover. Unified Login is always enabled on Google Cloud and Microsoft Azure, but is optional for AWS deployments.
A dedicated cluster for running extraction queries (see below).
Administrator access to Databricks to create a Veza service principal and cluster.
Configure a Veza service principal
The integration requires a Databricks service principal with account admin privileges (required to list all workspace entities and permissions):
Databricks on AWS: OAuth M2M credentials (client ID and secret) for a Databricks service principal.
Databricks on Google Cloud Platform: Google Service Account configured for the Google Cloud Platform integration.
Databricks on Microsoft Azure: Azure App Integration configured for Azure discovery.
OAuth M2M for AWS
To create a service principal for Databricks on AWS, log in to the Databricks account console as an administrator:
Go to User management.
Under Service principals, click Add service principal.
Enter a name and click Add.
On the Roles tab, turn on Account admin to allow account-level API calls.
Assign your service principal to identity-federated workspaces:
Open Workspaces and click your workspace name.
Go to Permissions > Add permissions.
Search for the service principal, assign the Admin permission level, and save the changes.
Get an OAuth client secret:
To create an OAuth secret for a service principal using the account console:
In the Databricks account console, open User management.
On the Service principals tab, find the service principal.
Under OAuth secrets, click Generate secret.
Copy the Secret and Client ID, and then click Done.
For more details see OAuth M2M in the Databricks documentation.
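The client ID and secret are exchanged for a short-lived access token using an OAuth client-credentials request. The sketch below assumes the account-level token endpoint described in the Databricks OAuth M2M documentation for AWS (`https://accounts.cloud.databricks.com/oidc/accounts/<account-id>/v1/token`, scope `all-apis`). The helper name `build_token_request` is hypothetical, and the function only builds the request without sending it, which can be useful for checking credentials before entering them in Veza.

```python
import base64
import urllib.parse
import urllib.request

# Account console host for Databricks on AWS (assumption: AWS commercial cloud).
ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def build_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth M2M client-credentials token request.

    Hypothetical helper for illustration; the endpoint path and form fields
    follow the Databricks OAuth M2M documentation.
    """
    url = f"{ACCOUNTS_HOST}/oidc/accounts/{account_id}/v1/token"
    # The service principal's client ID and secret are sent as HTTP Basic credentials.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    req = build_token_request("<account-id>", "<client-id>", "<client-secret>")
    print(req.get_method(), req.full_url)
    # Sending it with urllib.request.urlopen(req) would return JSON containing "access_token".
```

If the request succeeds, the secret and client ID are valid for account-level API calls.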
Create a Databricks cluster
Veza will run SQL queries on a Databricks cluster to collect metadata. Veza recommends a dedicated cluster for this purpose.
You will identify the cluster by tag when configuring the integration.
To create a cluster from the Databricks UI, pick Create > Cluster:
The cluster can be a small single-node cluster.
Enable automatic termination after a period of inactivity (about 10 minutes). The cluster starts automatically for extractions and stops automatically when inactive.
Under Advanced Options > Spark > Spark config, add spark.databricks.acl.sqlOnly true.
Ensure the Veza service principal has CAN_MANAGE permission on the cluster (More > Permissions).
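The same cluster settings can be expressed as request bodies for the Databricks Clusters API (`POST /api/2.0/clusters/create`) and Permissions API (`PATCH /api/2.0/permissions/clusters/{cluster_id}`). This is a sketch under stated assumptions: the single-node keys (`spark.databricks.cluster.profile`, `spark.master`, the `ResourceClass` tag) follow the Databricks single-node cluster convention, while the cluster name, the tag key `veza-collector`, and the `spark_version` / `node_type_id` values are hypothetical placeholders for your environment.

```python
import json

# Hypothetical tag key that the "Databricks collector cluster tag" field would reference.
COLLECTOR_TAG_KEY = "veza-collector"


def cluster_create_payload(spark_version: str, node_type_id: str) -> dict:
    """Body for POST /api/2.0/clusters/create describing a small collector cluster."""
    return {
        "cluster_name": "veza-collector",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Single-node cluster: no workers, the driver runs Spark locally.
        "num_workers": 0,
        # Stop the cluster after about 10 minutes of inactivity.
        "autotermination_minutes": 10,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
            # Setting required by this guide (Advanced Options > Spark > Spark config).
            "spark.databricks.acl.sqlOnly": "true",
        },
        "custom_tags": {"ResourceClass": "SingleNode", COLLECTOR_TAG_KEY: "true"},
    }


def cluster_permissions_payload(application_id: str) -> dict:
    """Body for PATCH /api/2.0/permissions/clusters/{cluster_id} granting CAN_MANAGE.

    For service principals, the Permissions API identifies the principal by its application ID.
    """
    return {
        "access_control_list": [
            {"service_principal_name": application_id, "permission_level": "CAN_MANAGE"}
        ]
    }


if __name__ == "__main__":
    print(json.dumps(cluster_create_payload("<spark-version>", "<node-type-id>"), indent=2))
    print(json.dumps(cluster_permissions_payload("<application-id>"), indent=2))
```

Using a dedicated tag key makes the cluster easy to reference from the integration settings without depending on its name.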
For more details on creating Databricks clusters, see the Databricks documentation.
Assign the Veza service principal to Databricks workspaces
The Veza service principal must be a workspace-level administrator to discover workspace sub-resources such as notebooks and clusters. Without admin permissions, the integration cannot gather metadata for the workspace.
To add the Veza service principal to a workspace with the admin role:
A) Using the Databricks account admin console (for workspaces with identity federation):
Open Workspaces and click your workspace name.
Go to Permissions > Add permissions.
Search for the service principal, assign the Admin permission level, and save the changes.
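Option A can also be scripted, which helps when the service principal must be an admin on many workspaces. The sketch below assumes the Databricks Account API workspace assignment endpoint (`PUT /api/2.0/accounts/{account_id}/workspaces/{workspace_id}/permissionassignments/principals/{principal_id}` with a `permissions` list) and an account-level access token such as the one obtained through OAuth M2M. The helper name is hypothetical, `principal_id` is assumed to be the service principal's numeric account-level ID, and the function builds the request without sending it.

```python
import json
import urllib.request

ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"  # assumption: Databricks on AWS


def build_workspace_admin_assignment(
    account_id: str, workspace_id: int, principal_id: int, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a request granting ADMIN on one workspace to a principal."""
    url = (
        f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces/{workspace_id}"
        f"/permissionassignments/principals/{principal_id}"
    )
    body = json.dumps({"permissions": ["ADMIN"]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # One request per workspace the service principal must administer.
    for ws_id in [1111, 2222]:  # hypothetical workspace IDs
        req = build_workspace_admin_assignment("<account-id>", ws_id, 3333, "<token>")
        print(req.get_method(), req.full_url)
```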
B) Using the Workspace admin console:
Click your username in the top bar of the Databricks workspace and select Admin Settings.
Open Identity and access.
Go to Users > Manage > Add User.
Click Add new to create a new user and enter the email of the Veza service account.
Click Add.
On the list of users, click the user.
Click the Entitlements tab.
Click the toggle next to Admin access.
See Manage Users for more detail.
Enable Databricks extraction
Databricks extraction is disabled by default. To enable the service, edit the AWS, Google Cloud Platform, or Microsoft Azure integration:
Go to the Veza Integrations page.
Find the integration on the list and click Edit.
In the third section, Limit Services, tick Limit {Integration} Services.
Click Select All to enable all services, or tick the boxes for services your company uses. Tick the box next to Databricks.
Go to the Details section.
Enter the additional fields:
Databricks account ID: The ID of your Databricks account.
Databricks collector cluster tag: Cluster tag for running queries. If empty, Veza will use the first available cluster.
AWS: Databricks OAuth M2M client ID: Client ID for OAuth M2M
AWS: Databricks OAuth M2M client secret: Veza service principal client secret
Click Save Integration to enable the connection.
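The cluster selection rule for the Databricks collector cluster tag field can be illustrated with a short sketch. It is hypothetical and not Veza's implementation: it assumes clusters are read from a Clusters API list response (`{"clusters": [...]}`, each entry with `cluster_id` and `custom_tags`) and that the configured tag is matched as a custom tag key. An empty tag falls back to the first cluster, mirroring the documented behavior.

```python
from typing import Optional


def pick_collector_cluster(clusters_list_response: dict, tag: str = "") -> Optional[str]:
    """Return the cluster_id chosen for extraction queries, or None.

    Hypothetical illustration: an empty tag falls back to the first cluster;
    otherwise the first cluster carrying the tag as a custom tag key is used.
    """
    clusters = clusters_list_response.get("clusters", [])
    if not tag:
        # No tag configured: use the first available cluster, if any.
        return clusters[0]["cluster_id"] if clusters else None
    for cluster in clusters:
        if tag in cluster.get("custom_tags", {}):
            return cluster["cluster_id"]
    # No cluster carries the configured tag.
    return None
```

Because an empty tag selects whatever cluster happens to be listed first, setting a tag is the more predictable option when a workspace has several clusters.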