Options for restricting data source extractions
When connecting to a configured identity or data provider, Veza will attempt to discover all supported resources by default. There are two methods to limit the services and resources discovered:
Toggle discovery of select services (skipping services such as AWS KMS or Azure SQL entirely)
Set allow and deny lists to limit data sources by name (only parsing individual resources)
Limiting the services or resources discovered can be useful to:
Omit unnecessary data sources following a naming pattern (such as test-db-*)
Prevent connection errors (for example, if you haven't yet created a required local database user)
You can enable these preferences when adding a new provider, or change them for an existing integration by finding the provider in the Configuration menu and clicking the "Edit" button.
To toggle services discovered, choose Select services to enable in the provider configuration. When you save your changes, only the selected services will be scanned and added to the data catalog.
You can set allow and deny lists to limit extraction by resource name (including wildcards). Allow/deny lists are available for most data sources, including Google Cloud projects/domains, AWS Redshift/RDS databases, S3 buckets, and Snowflake databases.
When an allow list is saved, only resources with a matching name are parsed and added to the Identity Data Entities catalog. If a deny list is configured, any data sources with a matching name will be ignored during discovery.
Note: When adding resource names to allowlist or denylist fields, enclose names containing spaces or special characters in double quotes. For example:
"My Database (Production)", "Test Environment #1". Names without spaces or special characters do not require quotes.
The following rules apply:
If no values are provided, all data sources are extracted
If a resource name matches the allow list, it will be extracted
If a resource name matches the deny list, it will be ignored
If both allow and deny lists are configured, a resource is extracted only if it matches the allow list and does not match the deny list
Lists can have any number of wildcards (*), matching any number of characters.
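The quoting note and the rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Veza's implementation: the function names (split_names, wildcard_match, should_extract) are hypothetical, and the sketch assumes that list fields are comma-separated and that matching is case-sensitive and applies to the whole resource name.

```python
import csv
import re


def split_names(field_value: str) -> list[str]:
    """Split a comma-separated list field, honoring double-quoted names."""
    # skipinitialspace lets a quoted name follow ", " rather than only ",".
    rows = csv.reader([field_value], skipinitialspace=True)
    return [name.strip() for row in rows for name in row if name.strip()]


def wildcard_match(pattern: str, name: str) -> bool:
    """Return True if name matches pattern; * matches any run of characters."""
    # Escape the literal pieces, then join them with ".*" for each "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def should_extract(name: str, allow=(), deny=()) -> bool:
    """Apply the allow/deny rules: extracted only if allowed and not denied."""
    # An empty allow list places no restriction, so everything is allowed.
    allowed = not allow or any(wildcard_match(p, name) for p in allow)
    denied = any(wildcard_match(p, name) for p in deny)
    return allowed and not denied


if __name__ == "__main__":
    allow = split_names('prod-*, "My Database (Production)"')
    deny = split_names("test-db-*, *-scratch")
    for name in ["prod-orders", "prod-scratch", "test-db-1",
                 "My Database (Production)"]:
        print(name, should_extract(name, allow, deny))
```

Evaluating the deny list separately from the allow list means a deny pattern always wins when a name matches both lists, which is what the last rule requires.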
The value to use as the resource name depends on the provider. See the table below for more information about the format:
AWS Redshift database: Database ARN, for example arn:aws:redshift:region:account-id:cluster:cluster-name
AWS RDS database: RDS database name
AWS S3 bucket: S3 bucket name
Google Cloud project: Project ID
Google BigQuery: Dataset name, table name
SQL Server: Database / Schema name
Snowflake: Snowflake database name
To retrieve these values for an entity that has already been parsed:
Search for the entity
Click the node to open the actions sidebar, and choose "Show Details"
The name to use will be one of the entity properties
You can also see the complete metadata for entities in your data catalog by opening the Overview page, selecting an integration type, and clicking an entity type to view results in Query Builder.
When modifying an Azure tenant configuration, several additional options are available:
Ingest services one by one during initial parsing to update incrementally, instead of running a single long extraction
gather_guest_users: Whether to parse identity metadata for Azure AD Guest users
gather_disabled_users: Whether to include disabled users
domains: Comma-separated list of AD domains to discover, ignoring any others
gather_personal_sites: Whether to include personal SharePoint sites