# Databricks: Entities and Permissions Reference

This document provides reference information about the entity types, attributes, and permissions that Veza discovers from Databricks integrations. This information applies to both [workspace-level](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-single-workspace.md) and [Unity Catalog](/4yItIzMvkpAvMVFAamTf/integrations/integrations/databricks-unity-catalog.md) integrations.

## Integration Architecture

Within Databricks, Access Control Lists (ACLs) govern access to entities such as catalogs, schemas, tables, clusters, directories, and notebooks.

Veza discovers entity and authorization metadata using the native Databricks REST API and executes SQL queries on designated clusters to extract table-level metadata and permissions.
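As a hedged illustration of that discovery surface, the sketch below builds (but does not send) requests against real workspace-level Databricks REST endpoints; the host and token are placeholders, and this is not Veza's actual collector code.

```python
import urllib.request

HOST = "https://example.cloud.databricks.com"                  # placeholder workspace host
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder PAT

# Workspace-level Databricks REST endpoints a discovery pass would read:
endpoints = {
    "users":     f"{HOST}/api/2.0/preview/scim/v2/Users",   # identities (SCIM)
    "groups":    f"{HOST}/api/2.0/preview/scim/v2/Groups",
    "clusters":  f"{HOST}/api/2.1/clusters/list",
    "workspace": f"{HOST}/api/2.0/workspace/list?path=/",   # directories and notebooks
}

# Building a Request performs no I/O; hand it to urlopen() to execute it.
req = urllib.request.Request(endpoints["users"], headers=HEADERS)
```

Table-level metadata and grants are then read by running SQL (for example, `SHOW GRANTS`) on a designated cluster.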

**Workspace Mode:** Connects directly to individual workspaces using personal access token authentication. This is configured as a standalone Databricks integration. The integration supports workspace resources and the `hive_metastore` catalog.

**Unity Catalog Mode:** Connects at the account level using OAuth or SSO. This is configured as part of your cloud provider integration ([AWS](/4yItIzMvkpAvMVFAamTf/integrations/integrations/aws.md), [Azure](/4yItIzMvkpAvMVFAamTf/integrations/integrations/azure.md), or [GCP](/4yItIzMvkpAvMVFAamTf/integrations/integrations/google.md)), rather than as a standalone integration. When enabled, Veza discovers account-level resources, multiple workspaces, Unity Catalog metastores, and all associated resources.

In Databricks Unity Catalog, Users, Service Principals, Groups, and Catalogs exist at both account and workspace levels, with account-level identities linked to their workspace-level counterparts.
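As a toy illustration of that linking, account-level identities can be matched to their workspace-level counterparts by user name; the record shapes and IDs below are fabricated, not Veza's data model.

```python
# Hypothetical SCIM-style records; in Unity Catalog, the same user exists at
# the account level and again in each workspace it is assigned to.
account_users = [{"id": "a1", "userName": "alice@example.com"}]
workspace_users = {"ws-1": [{"id": "w9", "userName": "alice@example.com"}]}

# Link each workspace-level user back to its account-level counterpart.
links = {
    (ws, wu["id"]): au["id"]
    for au in account_users
    for ws, users in workspace_users.items()
    for wu in users
    if wu["userName"] == au["userName"]
}
```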

## Supported Entities

### Identity Entities

#### User

Local workspace user account. Users can be members of groups, own resources, and have permissions granted directly or through group membership.

**Attributes:**

* `email`
* `emails`
* `system_id`
* `is_disabled`
* `identity_unique_id`
* `identity_type`
* `is_active`
* `roles` (Unity Catalog)
* `indirect_roles` (Unity Catalog)

#### Service Principal

Non-human identity used for programmatic access and automation. Service principals can authenticate to Databricks and have permissions managed similarly to users.

**Attributes:**

* `application_id`
* `system_id`
* `is_disabled`
* `identity_unique_id`
* `identity_type`
* `is_active`
* `roles` (Unity Catalog)
* `indirect_roles` (Unity Catalog)

#### Group

Group for organizing users and service principals. Groups can contain users, service principals, and other groups (nested membership).

**Attributes:**

* `identity_unique_id`
* `roles` (Unity Catalog)
* `indirect_roles` (Unity Catalog)

#### Personal Access Token

Authentication token owned by a user or service principal. PATs provide programmatic access to Databricks with the same permissions as the owning identity.

**Attributes:**

* `token_id`
* `created_by_id`
* `created_by_username`
* `created_at`
* `expires_at`
* `owner_id`

**Notes:**

* Personal Access Tokens automatically inherit all permissions from their owning identity
* Token discovery requires workspace admin permissions and may not be available on all Databricks pricing tiers
* Expired tokens are excluded from discovery
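For illustration, the expiry rule above can be sketched against the shape of the Databricks Token Management API response (`GET /api/2.0/token-management/tokens`), where `expiry_time` is epoch milliseconds and `-1` means the token never expires; the records here are fabricated.

```python
import time

# Fabricated token records in the Token Management API's response shape.
tokens = [
    {"token_id": "t1", "owner_id": 101, "expiry_time": -1},         # never expires
    {"token_id": "t2", "owner_id": 102, "expiry_time": 1_000_000},  # long expired
]

now_ms = int(time.time() * 1000)

# Expired tokens are excluded from discovery.
active = [t for t in tokens if t["expiry_time"] == -1 or t["expiry_time"] > now_ms]
```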

### Container Entities

#### Account (Unity Catalog)

Top-level container for Unity Catalog resources including metastores, workspaces, account-level identities, and OAuth applications.

**Attributes:**

* `account_id`

#### Workspace

Container for workspace resources including users, groups, clusters, directories, notebooks, and catalogs.

**Attributes:**

* `sso_id`
* `sso_type`
* `metastore_id`

#### Metastore (Unity Catalog)

Unity Catalog metastore that can be shared across multiple workspaces and contains catalogs.

**Attributes:**

* `metastore_id`
* `region`

### Data Entities

#### Catalog

Database catalog containing schemas.

**Attributes:**

* `catalog_type`
* `metastore_id`
* `isolation_mode`
* `workspace_hosts`

**Notes:**

* Workspace mode discovers only the `hive_metastore` catalog
* Hive metastores do not support catalog-level permission grants
* Unity Catalog mode discovers all catalogs within attached metastores

#### Schema

Database schema within a catalog, containing tables and views.

**Attributes:**

* `catalog_name`
* `workspace_host`
* `metastore_id`

#### Table

Data table within a schema.

**Attributes:**

* `Name`
* `ID`

#### View

Database view within a schema. Views are virtual tables based on queries, with independent permissions that control who can query them.

**Attributes:**

* `Name`
* `ID`

### Workspace Objects

#### Directory

Workspace folder that can contain subdirectories and notebooks.

**Attributes:**

* `depth`
* `path`
* `top_datasource_level`
* `workspace_host`

#### Notebook

Executable code document that can be attached to a cluster. Notebooks contain code, visualizations, and documentation.

**Attributes:**

* `depth`
* `path`
* `workspace_host`

### Compute Resources

#### Cluster

Compute cluster (set of Spark computation resources) used to run notebooks and jobs.

**Attributes:**

* `local_disk_encryption_enabled`
* `autotermination_minutes`
* `creator_user_name`
* `num_workers`
* `spark_version`
* `runtime_engine`

#### Job

Automated workflow for running data engineering, data science, and analytics workloads.

**Attributes:**

* `job_id`
* `creator_user_name`
* `created_time`
* `effective_budget_policy_id`
* `run_as_user`
* `run_as_service_principal`
* `schedule_status`
* `schedule_expression`
* `max_concurrent_runs`
* `format`
* `tags`

#### Pipeline

Delta Live Tables pipeline for building reliable data processing workflows with automatic testing and monitoring.

**Attributes:**

* `pipeline_id`
* `cluster_id`
* `creator_user_name`
* `run_as_user_name`
* `state`
* `health`

### Security Entities (Unity Catalog)

#### Account Service Principal Secret (Unity Catalog)

OAuth secret for account-level service principals.

**Attributes:**

* `service_principal_app_id`
* `secret_status`
* `secret_name`
* `created_at`
* `updated_at`
* `expires_at`

#### Account Service Principal Federation Policy (Unity Catalog)

OIDC federation policy for a specific account service principal, enabling workload identity federation.

**Attributes:**

* `service_principal_app_id`
* `policy_name`
* `policy_description`
* `policy_uid`
* `created_at`
* `updated_at`
* `oidc_policy_issuer`
* `oidc_policy_audiences`
* `oidc_policy_subject`
* `oidc_policy_subject_claim`

#### Account Federation Policy (Unity Catalog)

Account-level OIDC federation policy that can be referenced by multiple identities.

**Attributes:**

* `policy_name`
* `policy_description`
* `policy_uid`
* `created_at`
* `updated_at`
* `oidc_policy_issuer`
* `oidc_policy_audiences`
* `oidc_policy_subject`
* `oidc_policy_subject_claim`

### OAuth Applications (Unity Catalog)

#### Published OAuth App Integration (Unity Catalog)

Pre-configured Databricks-provided OAuth application.

**Attributes:**

* `app_id`
* `integration_id`
* `app_name`
* `created_at`
* `created_by_id`
* `access_token_ttl_minutes`
* `refresh_token_ttl_minutes`

#### Custom OAuth App Integration (Unity Catalog)

User-created OAuth application with custom configuration.

**Attributes:**

* `client_id`
* `integration_id`
* `app_name`
* `created_at`
* `created_by_id`
* `creator_username`
* `confidential`
* `redirect_urls`
* `scopes`
* `user_authorized_scopes`
* `access_token_ttl_minutes`
* `refresh_token_ttl_minutes`

## Permissions and Effective Access

Databricks uses Access Control Lists (ACLs) to control access to different types of resources. Veza models these native privileges and generates **Effective Permissions**, showing the cumulative access granted through:

* Direct privilege assignments to users or service principals
* Group membership (including nested groups)
* Personal Access Token inheritance from owning identity
* Directory and notebook permission inheritance
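The bullets above can be sketched as a simple expansion over nested group membership; this is a toy model for intuition, not Veza's implementation, and all names are illustrative.

```python
# Toy membership graph: groups may nest other groups.
group_members = {
    "admins": ["alice", "eng"],
    "eng": ["bob", "svc-etl"],
}
# Direct privilege assignments on some single resource.
direct_grants = {
    "admins": {"CAN_MANAGE"},
    "bob": {"CAN_VIEW"},
}

def effective(identity: str) -> set:
    """Union of direct grants and grants inherited from every enclosing group."""
    perms = set(direct_grants.get(identity, set()))
    for group, members in group_members.items():
        if identity in members:
            perms |= effective(group)  # recurse upward through nested groups
    return perms
```

A Personal Access Token would then simply report `effective(owner)` for its owning identity.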

### Permission Types by Resource

#### Data Object Permissions

Permissions that apply to Catalogs, Schemas, Tables, and Views:

| Permission              | Description                                            |
| ----------------------- | ------------------------------------------------------ |
| `CREATE`                | Create new objects within the container                |
| `CREATE_NAMED_FUNCTION` | Create named functions                                 |
| `CREATE TABLE`          | Create tables within a schema                          |
| `MODIFY`                | Modify existing object definitions                     |
| `MODIFY_CLASSPATH`      | Modify classpath settings                              |
| `READ_METADATA`         | Read object metadata and definitions                   |
| `READ FILES`            | Read data files                                        |
| `SELECT`                | Query data from tables and views                       |
| `USAGE`                 | Use the object in queries (required for schema access) |
| `WRITE FILES`           | Write data files                                       |
| `OWN`                   | Full ownership with all permissions                    |
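As a toy model of the `USAGE` rule noted in the table above, a `SELECT` grant on a table only takes effect when the principal also holds `USAGE` on the enclosing schema; the grant maps and names below are illustrative.

```python
# Illustrative grant maps: schema -> principal -> privileges, and likewise for tables.
schema_grants = {"analytics": {"data-readers": {"USAGE"}}}
table_grants = {"analytics.orders": {"data-readers": {"SELECT"}}}

def can_select(principal: str, table: str) -> bool:
    """SELECT is effective only alongside USAGE on the enclosing schema."""
    schema = table.split(".")[0]
    has_usage = "USAGE" in schema_grants.get(schema, {}).get(principal, set())
    has_select = "SELECT" in table_grants.get(table, {}).get(principal, set())
    return has_usage and has_select
```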

#### Directory and Notebook Permissions

| Permission   | Description                                         |
| ------------ | --------------------------------------------------- |
| `CAN_READ`   | View the directory or notebook contents             |
| `CAN_RUN`    | Execute the notebook                                |
| `CAN_EDIT`   | Modify the directory or notebook                    |
| `CAN_MANAGE` | Full management rights including permission changes |
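These workspace-object levels are cumulative in Databricks: each level includes the capabilities of the ones before it, which a permission check can model as a simple ordering (a sketch, not an official API).

```python
# Workspace-object permission levels from weakest to strongest.
LEVELS = ["CAN_READ", "CAN_RUN", "CAN_EDIT", "CAN_MANAGE"]

def implies(granted: str, required: str) -> bool:
    """True when the granted level includes the required capability."""
    return LEVELS.index(granted) >= LEVELS.index(required)
```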

#### Cluster Permissions

| Permission      | Description                                                     |
| --------------- | --------------------------------------------------------------- |
| `CAN_ATTACH_TO` | Attach notebooks to the cluster and use it for execution        |
| `CAN_RESTART`   | Restart the cluster                                             |
| `CAN_MANAGE`    | Full cluster management including configuration and permissions |

#### Job Permissions

| Permission       | Description                            |
| ---------------- | -------------------------------------- |
| `CAN_VIEW`       | View job configuration and run history |
| `CAN_MANAGE_RUN` | Trigger and cancel job runs            |
| `CAN_MANAGE`     | Modify job configuration               |
| `IS_OWNER`       | Full ownership with all rights         |

#### Pipeline Permissions

| Permission   | Description                            |
| ------------ | -------------------------------------- |
| `CAN_VIEW`   | View pipeline configuration and status |
| `CAN_RUN`    | Trigger pipeline runs                  |
| `CAN_MANAGE` | Modify pipeline configuration          |
| `IS_OWNER`   | Full ownership with all rights         |

## Unsupported Entities

The following Databricks entity types are **not** currently supported:

* SQL Warehouse
* Experiment
* Cluster Pool
* Query
* Dashboard

