# Limiting Extractions

When connecting to a configured identity or data provider, Veza will attempt to discover all supported resources by default. There are two methods to limit the services and resources discovered:

* Toggle discovery of select services (skipping services such as AWS KMS or Azure SQL entirely)
* Set allow and deny lists to limit data sources by name (only parsing individual resources)

{% hint style="success" %}
Limiting services or resources can help you:

* Omit unnecessary data sources following a naming pattern (such as `test-db-*`)
* Prevent connection errors (for example if you haven't yet created a required local database user)
* Improve performance by limiting the total number of graph [Entities](/4yItIzMvkpAvMVFAamTf/features/search/graph.md)
* Ingest services one at a time during initial setup, updating the graph incrementally instead of running a single long extraction
{% endhint %}

You can enable these preferences when adding a new provider, or change them for an existing integration by finding the provider in the **Configuration** menu and clicking the "Edit" button.

To toggle which services are discovered, choose *Select services to enable* in the provider configuration. When you save your changes, only the selected services will be scanned and added to the data catalog.

## Limit services

When adding or editing an integration, click **Limit Services** and select which services to enable. Only the selected services will be extracted.

Some services require additional configuration, API enablement, or expanded integration permissions before they can be extracted. The integration guide for each provider notes any prerequisites. If a service requires extra setup that hasn't been completed, disabling it here prevents extraction errors.

When limiting services, you can also restrict the integration's credentials to only the permissions needed for the selected services. This achieves a least-privilege configuration and reduces the policy's blast radius.

### Amazon Web Services (AWS)

The following services can be enabled or disabled when configuring an AWS integration. Core services (IAM, STS, and SSO/Identity Center) are always extracted and cannot be disabled.

| Service                    | Entities                                               |
| -------------------------- | ------------------------------------------------------ |
| Amazon Bedrock             | Agents, foundation models, knowledge bases, guardrails |
| Amazon Certificate Manager | Certificates                                           |
| Amazon Cognito             | Identity pools                                         |
| Amazon DocumentDB          | Clusters, databases, users, roles                      |
| Amazon DynamoDB            | Tables, streams, secondary indexes                     |
| Amazon EC2                 | Instances, VPCs, security groups                       |
| Amazon ECR                 | Private and public repositories                        |
| Amazon EKS                 | Clusters                                               |
| Amazon EMR                 | Clusters, studios, notebook executions                 |
| Amazon Neptune             | Clusters, instances                                    |
| Amazon Organizations       | Organizational units, accounts, SCPs, RCPs             |
| Amazon RDS                 | Clusters, instances                                    |
| Amazon RDS MySQL           | Databases, tables, users, roles                        |
| Amazon RDS Oracle          | Instances, tenant databases                            |
| Amazon RDS PostgreSQL      | Databases, schemas, tables, users, groups              |
| Amazon Redshift            | Clusters, databases, users, groups                     |
| Amazon Redshift Cluster    | Databases, schemas, tables, users                      |
| Amazon S3                  | Buckets, bucket policies                               |
| Amazon Secrets Manager     | Secrets                                                |
| Amazon Systems Manager     | Parameters                                             |
| AWS Databricks             | Workspaces, clusters, users, groups                    |
| AWS KMS                    | Customer-managed keys                                  |
| AWS Lambda                 | Functions                                              |

### Google Cloud Platform (GCP)

The following services can be enabled or disabled when configuring a GCP integration. IAM and Google Workspace are always extracted and cannot be disabled.

| Service                            | Entities                               |
| ---------------------------------- | -------------------------------------- |
| Artifact Registry                  | Repositories, packages                 |
| BigQuery                           | Datasets, tables                       |
| Cloud Key Management Service (KMS) | Key rings, crypto keys                 |
| Cloud Run                          | Services, instances                    |
| Cloud SQL                          | Database instances, databases, users   |
| Cloud Storage                      | Buckets, objects, folders              |
| Compute Engine                     | VMs, VPCs, subnets, network interfaces |
| GCP Databricks                     | Workspaces, accounts, schemas          |
| Google Kubernetes Engine           | Clusters                               |
| Secret Manager                     | Secrets, secret versions               |
| Vertex AI                          | Models, endpoints, reasoning engines   |
| Workload Identity Federation       | Identity pools, providers              |

### Microsoft Azure

The following services can be enabled or disabled when configuring an Azure integration.

| Service                  | Entities                                      |
| ------------------------ | --------------------------------------------- |
| Azure AI Foundry         | Accounts, projects, agents, model deployments |
| Azure AI Services        | Accounts                                      |
| Azure AKS                | Kubernetes clusters                           |
| Azure Blob Storage       | Containers, blobs, immutability policies      |
| Azure Cosmos DB          | Accounts, databases, SQL roles                |
| Azure Data Lake          | Filesystems, directories, ACL permissions     |
| Azure Database           | MySQL, PostgreSQL, and MariaDB instances      |
| Azure Databricks         | Workspaces                                    |
| Azure Key Vault          | Keys, secrets, certificates                   |
| Azure PostgreSQL         | Flexible server instances                     |
| Azure Private Link       | Services, private endpoints                   |
| Azure SQL Server         | Servers, databases, failover groups           |
| Azure Storage            | Storage accounts, file shares, access keys    |
| Azure Virtual Machines   | VMs, virtual networks, security groups        |
| Exchange Online          | Mailboxes, distribution groups, role groups   |
| Microsoft Copilot Studio | Bots, topics, AI models, actions              |
| Azure Dynamics 365 CRM   | Environments, users, security roles           |
| Azure Dynamics 365 ERP   | Environments, users, security roles           |
| Microsoft Intune         | Managed devices, roles                        |
| Microsoft Teams          | Teams, channels, users                        |
| SharePoint               | Sites, libraries, lists, folders              |

{% hint style="info" %}
Enabling **Microsoft Copilot Studio** also requires enabling **Azure Dynamics 365 CRM** in the same Azure integration. "Azure Dynamics 365 CRM" is Microsoft's label for the Dataverse API. No Dynamics 365 license or CRM deployment is required.
{% endhint %}

## Allow or deny data sources

You can set allow and deny lists to limit extraction by resource name (including wildcards). Allow/deny lists are available for most data sources, including Google Cloud projects/domains, AWS Redshift/RDS databases, S3 buckets, and Snowflake databases.

When an allow list is saved, only resources with a matching name are parsed and added to the Identity Data Entities catalog. If a deny list is configured, any data sources with a matching name will be ignored during discovery.

> **Note:** When adding resource names to allow list or deny list fields, enclose names containing spaces or special characters in double quotes. For example: `"My Database (Production)", "Test Environment #1"`. Names without spaces or special characters do not require quotes.
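
The quoting rule in the note above can be sketched as a small helper that builds a field value from a list of names. This is an illustration only: `format_list_value` is a hypothetical name, the comma separator follows the example in the note, and the set of characters treated as "plain" is an assumption, since the exact set of special characters is not listed here. Names that themselves contain double quotes are not handled.

```python
import re

# Assumption: names made only of these characters do not need quotes.
# The documentation does not enumerate which characters count as "special".
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_.*-]+")


def format_list_value(names):
    """Join resource names into one field value, quoting names that need it."""
    parts = []
    for name in names:
        if _PLAIN_NAME.fullmatch(name):
            parts.append(name)           # plain name: leave as-is
        else:
            parts.append(f'"{name}"')    # spaces or special characters: quote
    return ", ".join(parts)


print(format_list_value(["prod-db", "My Database (Production)", "Test Environment #1"]))
# prints: prod-db, "My Database (Production)", "Test Environment #1"
```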

The following rules apply:

* If no values are provided, all data sources are extracted
* If a resource name matches the allow list, it will be extracted
* If a resource name matches the deny list, it will be ignored
* When both allow and deny lists are configured, a resource is extracted only if it matches the allow list and does not match the deny list

{% hint style="success" %}
List entries can contain any number of wildcards (`*`). Each wildcard matches any number of characters.
{% endhint %}
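
To preview how the rules above combine, the following sketch evaluates a resource name against an allow list and a deny list. It is not Veza's implementation: the function names are hypothetical, matching is assumed to be case-sensitive and anchored to the whole name, and `*` is the only character treated as a wildcard.

```python
import re


def wildcard_match(pattern, name):
    """Return True if name matches pattern, where * matches any run of characters."""
    # Escape everything except *, then join the literal pieces with ".*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def is_extracted(name, allow=(), deny=()):
    """Apply the documented rules: extract if allowed (or no allow list) and not denied."""
    allowed = not allow or any(wildcard_match(p, name) for p in allow)
    denied = any(wildcard_match(p, name) for p in deny)
    return allowed and not denied


# Example: allow anything containing "-db-", but skip test databases.
for candidate in ["sales", "test-db-1", "prod-db-1"]:
    print(candidate, is_extracted(candidate, allow=["*-db-*"], deny=["test-db-*"]))
```

With these lists, only `prod-db-1` would be extracted: `sales` does not match the allow list, and `test-db-1` matches the deny list.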

## Naming conventions

The value to use as the resource name depends on the provider and resource type. The tables below document the exact format expected for each integration's allow and deny list fields.

{% hint style="success" %}
To retrieve these values for an entity that has already been parsed:

1. Search for the entity using the [Access Graph](/4yItIzMvkpAvMVFAamTf/features/search/graph.md).
2. Click the node to open the actions sidebar, and choose "Show Details".
3. The name to use will be one of the entity properties.

You can also see the complete metadata for entities in your data catalog by opening the [Overview](/4yItIzMvkpAvMVFAamTf/features/insights/entities-overview.md) page, selecting an integration type, and clicking an entity type to view results in Query Builder.
{% endhint %}

### Amazon Web Services (AWS)

| Field                             | Value format                                                                                         | Example                                                           |
| --------------------------------- | ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| S3 Bucket Allow/Deny List         | S3 bucket [name](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucketnamingrules.html)       | `my-data-bucket`                                                  |
| RDS Database Allow/Deny List      | RDS database [name](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.DBInstance.html) | `production-db`                                                   |
| Redshift Database Allow/Deny List | Database ARN: `arn:aws:redshift:{region}:{account-id}:dbname:{cluster-name}/{database-name}`         | `arn:aws:redshift:us-east-1:123456789012:dbname:my-cluster/sales` |
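
Because the Redshift field expects a full ARN rather than a bare database name, it can help to assemble the value from its parts. A minimal sketch, assuming the format shown in the table; `redshift_database_arn` is a hypothetical helper, not part of any Veza or AWS SDK.

```python
def redshift_database_arn(region, account_id, cluster_name, database_name):
    """Build a Redshift database ARN in the format the allow/deny list expects."""
    return (
        f"arn:aws:redshift:{region}:{account_id}"
        f":dbname:{cluster_name}/{database_name}"
    )


print(redshift_database_arn("us-east-1", "123456789012", "my-cluster", "sales"))
# prints: arn:aws:redshift:us-east-1:123456789012:dbname:my-cluster/sales
```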

### Microsoft Azure

| Field                                | Value format                                                                                                    | Example                                |
| ------------------------------------ | --------------------------------------------------------------------------------------------------------------- | -------------------------------------- |
| Subscription ID Allow/Deny List      | Azure [subscription ID](https://learn.microsoft.com/en-us/azure/azure-portal/get-subscription-tenant-id) (UUID) | `a1b2c3d4-e5f6-7890-abcd-ef1234567890` |
| Storage Account Name Allow/Deny List | Storage [account name](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview)         | `mystorageaccount`                     |
| Blob Container Name Allow/Deny List  | Container name                                                                                                  | `my-container`                         |
| SQL Server Database Allow/Deny List  | Database name                                                                                                   | `SalesDB`                              |
| SQL Server Schema Allow/Deny List    | Schema name                                                                                                     | `dbo`                                  |
| PostgreSQL Database Allow/Deny List  | Database name                                                                                                   | `postgres`                             |
| PostgreSQL Schema Allow/Deny List    | Schema name                                                                                                     | `public`                               |
| SharePoint Site Allow/Deny List      | Site name or relative URL                                                                                       | `TeamSite` or `/sites/TeamSite`        |

### Google Cloud Platform (GCP)

| Field                            | Value format                                                                                                                                                                | Example               |
| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------- |
| Project Allow/Deny List          | Project [ID](https://support.google.com/googleapi/answer/7014113)                                                                                                           | `my-project-123`      |
| Domain Allow/Deny List           | Google Workspace domain (the portion of email after `@`)                                                                                                                    | `acme.com`            |
| Location Allow/Deny List         | GCP [region or multi-region](https://cloud.google.com/about/locations)                                                                                                      | `us-central1` or `US` |
| BigQuery Dataset Allow/Deny List | Full dataset identifier: `{projectId}:{datasetId}` — enter only the `datasetId` portion (the name after the colon); use a wildcard to match across projects: `*:my_dataset` | `*:sales_data`        |
| BigQuery Table Allow/Deny List   | Table name only (not the full `project.dataset.table` path)                                                                                                                 | `transactions`        |
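
The domain value is the portion of an email address after `@`. As a small illustration, a hypothetical helper such as `email_domain` below could derive it from a known user address; the sample address is made up.

```python
def email_domain(address):
    """Return the part of an email address after the last @."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    return domain


print(email_domain("jane.doe@acme.com"))
# prints: acme.com
```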

### Snowflake

| Field                    | Value format                                                                                                       | Example    |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------ | ---------- |
| Database Allow/Deny List | Snowflake [database name](https://docs.snowflake.com/en/user-guide/snowsql-start.html#d-dbname) (case-insensitive) | `SALES_DB` |

### Okta

| Field                       | Value format                                                                      | Example                |
| --------------------------- | --------------------------------------------------------------------------------- | ---------------------- |
| Application Allow/Deny List | Okta application name (as shown in the Okta Admin Console under **Applications**) | `Salesforce` or `AWS*` |
| Domain Allow/Deny List      | The domain portion of user email addresses to include or exclude                  | `acme.okta.com`        |

### Salesforce

| Field                   | Value format                                                                                                       | Example                      |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------ | ---------------------------- |
| Object Allow/Deny List  | Salesforce [object ID](https://help.salesforce.com/s/articleView?id=000385087) (for custom objects) or object name | `Account`, `CustomObject__c` |
| License Allow/Deny List | Salesforce license name (as shown under Setup → Company Information → User Licenses)                               | `Salesforce`, `Chatter Free` |

### GitHub

| Field                      | Value format                                                | Example                  |
| -------------------------- | ----------------------------------------------------------- | ------------------------ |
| Repository Allow/Deny List | Repository name (not the full `owner/repo` path)            | `my-repo` or `*-prod`    |
| Team Allow/Deny List       | GitHub team name (as shown in the organization's Teams tab) | `engineering` or `ops-*` |

### SQL Server (standalone)

| Field                    | Value format  | Example   |
| ------------------------ | ------------- | --------- |
| Database Allow/Deny List | Database name | `SalesDB` |
| Schema Allow/Deny List   | Schema name   | `dbo`     |

### MySQL

| Field                    | Value format           | Example      |
| ------------------------ | ---------------------- | ------------ |
| Database Allow/Deny List | Database (schema) name | `production` |

### PostgreSQL (standalone)

| Field                    | Value format  | Example  |
| ------------------------ | ------------- | -------- |
| Database Allow/Deny List | Database name | `myapp`  |
| Schema Allow/Deny List   | Schema name   | `public` |

### Trino

| Field                   | Value format       | Example             |
| ----------------------- | ------------------ | ------------------- |
| Catalog Allow/Deny List | Trino catalog name | `hive` or `iceberg` |
| Schema Allow/Deny List  | Schema name        | `default`           |
| Table Allow/Deny List   | Table name         | `transactions`      |

### Workday

| Field                          | Value format                                                  | Example                              |
| ------------------------------ | ------------------------------------------------------------- | ------------------------------------ |
| Worker Country Allow/Deny List | Country name as used in Workday's `workAddress_Country` field | `United States of America`, `Canada` |

## Azure settings

When modifying an Azure tenant configuration, several additional options are available:

| Setting                 | Description                                                         |
| ----------------------- | ------------------------------------------------------------------- |
| `gather_guest_users`    | Whether to parse identity metadata for Azure AD Guest users         |
| `gather_disabled_users` | Whether to include disabled users                                   |
| `domains`               | Comma-separated list of AD domains to discover, ignoring any others |
| `gather_personal_sites` | Whether to include personal SharePoint sites                        |

