Limiting Extractions

Options for restricting data source extractions

When connecting to a configured identity or data provider, Veza will attempt to discover all supported resources by default. There are two methods to limit the services and resources discovered:

  • Toggle discovery of select services (skipping services such as AWS KMS or Azure SQL entirely)

  • Set allow and deny lists to limit data sources by name (only parsing individual resources)

Selecting services or resources to limit can be desirable to:

  • Omit unnecessary data sources following a naming pattern (such as test-db-*)

  • Prevent connection errors (for example, if you haven't yet created a required local database user)

You can enable these preferences when adding a new provider, or change them for an existing integration by finding the provider in the Configuration menu and clicking the "Edit" button.

To toggle which services are discovered, choose Select services to enable in the provider configuration. When you save your changes, only the selected services will be scanned and added to the data catalog.

Allow or deny data sources

You can set allow and deny lists to limit extraction by resource name (including wildcards). Allow/deny lists are available for most data sources, including Google Cloud projects/domains, AWS Redshift/RDS databases, S3 buckets, and Snowflake databases.

When an allow list is saved, only resources with a matching name are parsed and added to the Identity Data Entities catalog. If a deny list is configured, any data sources with a matching name will be ignored during discovery.

Note: When adding resource names to the allow list or deny list fields, enclose names containing spaces or special characters in double quotes. For example: "My Database (Production)", "Test Environment #1". Names without spaces or special characters do not require quotes.
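As a rough illustration of the quoting rule, the sketch below prepares a list of names for pasting into an allow or deny list field. The helper names (`format_list_entry`, `format_list`), the comma separator, and the exact set of characters treated as "plain" are assumptions made for this example; they are not taken from the product. Names that themselves contain double quotes are not handled.

```python
import re

# Assumption: a name made only of letters, digits, "-", "_", "." and the
# "*" wildcard needs no quotes. Anything else (spaces, "#", parentheses,
# and so on) is treated as "special" and gets quoted.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_.*-]+")


def format_list_entry(name: str) -> str:
    """Wrap a resource name in double quotes only when it needs them."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    return f'"{name}"'


def format_list(names) -> str:
    """Join entries with a comma (hypothetical separator for this sketch)."""
    return ", ".join(format_list_entry(n) for n in names)


print(format_list(["analytics", "My Database (Production)", "Test Environment #1"]))
# analytics, "My Database (Production)", "Test Environment #1"
```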

The following rules apply:

  • If no values are provided, all data sources are extracted

  • If a resource name matches the allow list, it will be extracted

  • If a resource name matches the deny list, it will be ignored

  • Resources are only extracted if allowed and not denied (in the case that both allow and deny lists are configured)

List entries can contain any number of wildcards (*). Each wildcard matches any number of characters.
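The rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the product's implementation: the function names `wildcard_match` and `should_extract` are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def wildcard_match(pattern: str, name: str) -> bool:
    """Match a name against a pattern where only "*" is a wildcard."""
    # Escape every literal fragment, then let each "*" match any run of
    # characters (including none). Matching is case-sensitive (assumption).
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def should_extract(name: str, allow=(), deny=()) -> bool:
    """Apply the allow/deny rules to a single resource name."""
    # With no allow list, every name counts as allowed.
    allowed = not allow or any(wildcard_match(p, name) for p in allow)
    # A deny match always excludes the resource.
    denied = any(wildcard_match(p, name) for p in deny)
    return allowed and not denied


names = ["prod-db-1", "test-db-1", "test-db-2", "analytics"]
selected = [
    n for n in names
    if should_extract(n, allow=["*-db-*"], deny=["test-db-*"])
]
print(selected)  # ['prod-db-1']
```

Here `analytics` is skipped because it does not match the allow list, and the two `test-db-*` databases are skipped because the deny list applies even to names that the allow list matches.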

Naming conventions

The value to use as the resource name depends on the provider. See the table below for more information about the format:

| Data source | Resource name to use |
| --- | --- |
| AWS Redshift database | Database ARN, for example: arn:aws:redshift:region:account-id:cluster:cluster-name |
| AWS RDS database | RDS database name |
| AWS S3 bucket | S3 bucket name |
| Google Cloud project | Project ID |
| Google BigQuery | Dataset name, table name |
| SQL Server | Database / Schema name |
| Snowflake | Snowflake database name |

To retrieve these values for an entity that has already been parsed:

  1. Search for the entity using the Access Graph

  2. Click the node to open the actions sidebar, and choose "Show Details"

  3. The name to use will be one of the entity properties

    You can also see the complete metadata for entities in your data catalog by opening the Overview page, selecting an integration type, and clicking an entity type to view results in Query Builder.

Azure settings

When modifying an Azure tenant configuration, several additional options are available:

| Option | Description |
| --- | --- |
| gather_guest_users | Whether to parse identity metadata for Azure AD Guest users |
| gather_disabled_users | Whether to include disabled users |
| domains | Comma-separated list of AD domains to discover, ignoring any others |
| gather_personal_sites | Whether to include personal SharePoint sites |

In addition to the reasons listed earlier, limiting extractions can:

  • Improve overall performance by limiting the number of graph entities

  • Ingest services one-by-one during initial parsing to incrementally update, instead of running a single long extraction