Limiting Extractions
Options for restricting data source extractions
When connecting to a configured identity or data provider, Veza will attempt to discover all supported resources by default. There are two methods to limit the services and resources discovered:
Toggle discovery of select services (skipping services such as AWS KMS or Azure SQL entirely)
Set allow and deny lists to limit data sources by name (only parsing individual resources)
Limiting the services or resources discovered can help to:
Omit unnecessary data sources following a naming pattern (such as test-db-*)
Prevent connection errors (for example, if you haven't yet created a required local database user)
Improve performance by reducing the total number of graph entities
Ingest services one at a time during initial parsing, updating incrementally instead of running a single long extraction
You can enable these preferences when adding a new provider, or change them for an existing integration by finding the provider in the Configuration menu and clicking the "Edit" button.
To toggle services discovered, choose Select services to enable in the provider configuration. When you save your changes, only the selected services will be scanned and added to the data catalog.
Allow or deny data sources
You can set allow and deny lists to limit extraction by resource name (including wildcards). Allow/deny lists are available for most data sources, including Google Cloud projects/domains, AWS Redshift/RDS databases, S3 buckets, and Snowflake databases.
When an allow list is saved, only resources with a matching name are parsed and added to the Identity Data Entities catalog. If a deny list is configured, any data sources with a matching name will be ignored during discovery.
The following rules apply:
If no values are provided, all data sources are extracted
If a resource name matches the allow list, it will be extracted
If a resource name matches the deny list, it will be ignored
Resources are only extracted if allowed and not denied (in the case that both allow and deny lists are configured)
Lists can have any number of wildcards (*), each matching any number of characters.
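The rules above can be sketched as a small filter function. This is only an illustration of the documented behavior, not Veza's implementation: the function names (should_extract, matches_any) are hypothetical, and the sketch assumes matching is case-sensitive and applies to the whole resource name, neither of which is stated above.

```python
import re


def matches_any(name, patterns):
    """Return True if name matches at least one pattern.

    Only '*' is treated as a wildcard (any number of characters);
    every other character is matched literally.
    """
    for pattern in patterns:
        # Escape the literal parts and rejoin them with '.*' for each '*'.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, name):
            return True
    return False


def should_extract(name, allow=(), deny=()):
    """Apply the allow/deny rules to a single resource name (hypothetical helper)."""
    # An empty allow list places no restriction on the name.
    if allow and not matches_any(name, allow):
        return False
    # A name on the deny list is ignored, even if it is also allowed.
    if matches_any(name, deny):
        return False
    return True


# Example: allow production databases, but skip temporary ones.
allow_list = ["prod-*"]
deny_list = ["*-tmp"]
for resource in ["prod-orders", "prod-orders-tmp", "test-db-1"]:
    print(resource, should_extract(resource, allow_list, deny_list))
```

With both lists set, only prod-orders passes: prod-orders-tmp is allowed but also denied, and test-db-1 never matches the allow list.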
Naming conventions
The value to use as the resource name depends on the provider. See the table below for more information about the format:
To retrieve these values for an entity that has already been parsed:
Search for the entity using the Authorization Graph
Click the node to open the actions sidebar, and choose "Show Details"
The name to use will be one of the entity properties
You can also see the complete metadata for entities in your data catalog by opening the Analytics page, selecting a data provider, and opening the results in Query Builder.
Azure settings
When modifying an Azure tenant configuration, several additional options are available:
gather_guest_users: Whether to parse identity metadata for Azure AD guest users
gather_disabled_users: Whether to include disabled users
domains: Comma-separated list of AD domains to discover, ignoring any others
gather_personal_sites: Whether to include personal SharePoint sites
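As a rough illustration of how these options might combine, the sketch below filters a list of users with a settings dictionary. Only the option names come from the list above. The dictionary layout, the user fields (is_guest, enabled, domain), the helper names, and the assumption that an empty domains value means "all domains" are hypothetical and are not taken from Veza's API.

```python
def parse_domains(value):
    """Split a comma-separated domain list, ignoring blanks and letter case."""
    return [d.strip().lower() for d in value.split(",") if d.strip()]


def include_user(user, settings):
    """Decide whether a user record passes the (hypothetical) Azure settings."""
    if user["is_guest"] and not settings["gather_guest_users"]:
        return False
    if not user["enabled"] and not settings["gather_disabled_users"]:
        return False
    domains = parse_domains(settings["domains"])
    # Assumption: an empty domain list places no restriction.
    if domains and user["domain"].lower() not in domains:
        return False
    return True


# Hypothetical settings using the option names described above.
settings = {
    "gather_guest_users": False,
    "gather_disabled_users": True,
    "domains": "example.com, corp.example.com",
    "gather_personal_sites": False,
}

users = [
    {"name": "alice", "is_guest": False, "enabled": True, "domain": "example.com"},
    {"name": "bob", "is_guest": True, "enabled": True, "domain": "example.com"},
    {"name": "carol", "is_guest": False, "enabled": False, "domain": "other.org"},
]
kept = [u["name"] for u in users if include_user(u, settings)]
print(kept)
```

Here bob is skipped as a guest user and carol is skipped because other.org is not in the domain list.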