# Duplicate Resolution Rules

A Duplicate Resolution Rule consolidates multiple records for the same person within a single source of identity into one authoritative record. The rule applies during workflow processing, so policy workflows act on a single resolved identity rather than firing once per duplicate.

This is useful for upstream systems that briefly emit more than one record for the same individual. The most common case is a contractor-to-employee conversion in an HRIS, where the legacy contractor record and the new employee record can coexist for a short period before the contractor record is closed.

Rules can be configured in the Policy Settings UI or via the [policy update API](/4yItIzMvkpAvMVFAamTf/developers/api/lifecycle-management/policies/updatepolicyconfiguration.md).

## When to use a rule

Apply a Duplicate Resolution Rule when a single source of identity can produce more than one identity record for the same person, and you want a policy to act on only one. Typical situations:

* **Contractor-to-employee conversion**: an HRIS keeps the open contractor record while creating a new employee record for the same person.
* **System-generated duplicates**: a source of identity (SOI) emits a placeholder record alongside the real one during onboarding.
* **Merged data sources**: a CSV upload or custom OAA connector merges feeds that share individuals.

A rule resolves duplicates within **one source of identity at a time**. It does not match a person across two different sources. That correlation is handled by the policy's primary and secondary source configuration. See [Identities](/4yItIzMvkpAvMVFAamTf/features/lifecycle-management/identities-overview/identities.md) for cross-source behavior.

## How resolution works

For each extraction of the source of identity:

1. Veza groups candidate identities that share the same values for the configured **deduplication key properties**.
2. For each group, Veza applies the **ranking rules** to choose the authoritative record. If no ranking rules are defined, the record with the smallest entity ID wins.
3. The non-authoritative records in the group are hard-deleted (not soft-deleted) before workflows evaluate the resulting identity. Each deletion emits an `IDENTITY_DELETED_FROM_SOURCES` event, so event consumers and notification templates see one event per dropped record on the first extraction after a rule is enabled or changed. Unlike a record removed by normal LCM offboarding, a dropped duplicate cannot be restored on rehire: it is removed from Veza entirely on the next extraction.
4. Policy workflows process the single authoritative record.

Groups larger than the configured `max_duplicate_group_size` are skipped and logged. A group with many members almost always means the deduplication key points at a non-unique field; review the rule configuration if you see skipped groups.
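
The grouping and skip behavior described above can be sketched in a few lines of Python. This is an illustration only, not Veza's implementation: the function name `resolve_duplicates`, the `entity_id` field, and the in-memory record shape are all assumptions, and the sketch covers only the default case where no ranking rules are defined. It also assumes every record carries every key property and that key values are scalars.

```python
from collections import defaultdict

def resolve_duplicates(identities, dedup_key_properties, max_duplicate_group_size=3):
    """Hypothetical sketch: return (kept, dropped, skipped_groups)."""
    groups = defaultdict(list)
    for identity in identities:
        # Include the value's type in the key so "1" and 1 never share a group.
        key = tuple(
            (type(identity[prop]).__name__, identity[prop])
            for prop in dedup_key_properties
        )
        groups[key].append(identity)

    kept, dropped, skipped = [], [], []
    for members in groups.values():
        if len(members) > max_duplicate_group_size:
            # Oversized groups are left untouched and reported to the caller.
            skipped.append(members)
            continue
        # Default ranking: the smallest entity ID is authoritative.
        winner = min(members, key=lambda m: m["entity_id"])
        kept.append(winner)
        dropped.extend(m for m in members if m is not winner)
    return kept, dropped, skipped

records = [
    {"entity_id": 11, "email": "ana@example.com"},
    {"entity_id": 7, "email": "ana@example.com"},
    {"entity_id": 20, "email": "raj@example.com"},
]
kept, dropped, skipped = resolve_duplicates(records, ["email"])
```

In this sample, the two `ana@example.com` records form one group and the record with `entity_id` 7 is kept. A group of four records sharing one email would be returned in `skipped` under the default limit of 3, which mirrors the "skipped and logged" behavior.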

{% hint style="warning" %}
Enabling a rule on an existing policy, or changing ranking rules so a different record wins, hard-deletes the non-authoritative identities and cascades to their downstream entities. The deleted records cannot be recovered through LCM's rehire flow. Test changes in a draft policy first.
{% endhint %}

## Rule configuration

The rule lives on the policy:

* `primary_duplicate_resolution_rule` on the `Policy` for the primary source of identity.
* `duplicate_resolution_rule` on each entry in `secondary_source_of_identities` for secondary sources.

An empty `{}` is a no-op. Validation rejects rules that set `ranking_rules` or `max_duplicate_group_size` without `dedup_key_properties`.
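
A client-side pre-check for this validation rule might look like the following. It is a minimal sketch, not the server's validator: `validate_rule` is a hypothetical helper, and it assumes that a missing field or an empty list counts as "not set".

```python
def validate_rule(rule):
    """Hypothetical pre-check: return error strings; an empty list means OK."""
    errors = []
    has_key = bool(rule.get("dedup_key_properties"))
    for field in ("ranking_rules", "max_duplicate_group_size"):
        # Assumption: a missing field or an empty list counts as "not set".
        if rule.get(field) and not has_key:
            errors.append(f"{field} requires dedup_key_properties")
    return errors

# An empty rule is a no-op and passes.
empty_result = validate_rule({})

# Ranking rules without a deduplication key are rejected.
orphan_result = validate_rule({"ranking_rules": [{"order_by": "hire_date"}]})
```

Running a check like this before calling the policy update API can surface a misconfigured rule without a round trip, but the API response remains the source of truth.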

### Fields

| Field                      | Type                   | Description                                                                                                                                                                                        |
| -------------------------- | ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dedup_key_properties`     | array of strings       | Property names whose combined values uniquely identify a person. The list is AND-combined (all properties must match for two records to be considered duplicates). List order does not matter.     |
| `ranking_rules`            | array of `RankingRule` | Ordered list of rules used to choose the authoritative record within a group. The first rule is the primary sort, the second is applied to ties, and so on. If empty, the smallest entity ID wins. |
| `max_duplicate_group_size` | integer                | Upper bound on group size that Veza will resolve. Range: 2-20. Default: 3. Groups larger than this are skipped and logged.                                                                         |

#### RankingRule

| Field         | Type   | Description                                                                                                                                                                                       |
| ------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `scim_filter` | string | SCIM filter expression. Candidates matching this filter are ranked higher than those that do not. The filter sees only the candidate identity itself, not related entities.                       |
| `order_by`    | string | Property name used as a secondary sort within the `scim_filter` match group. Supports string, number, bool, and lists of strings. If `scim_filter` is empty, `order_by` becomes the primary sort. |
| `descending`  | bool   | If true, `order_by` sorts descending.                                                                                                                                                             |
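
To make the ordering semantics concrete, the sketch below ranks one group of candidates with a list of rules. It is an assumption-laden illustration rather than Veza's algorithm: `pick_authoritative` is a hypothetical name, each rule's `scim_filter` is replaced by a plain Python predicate under a hypothetical `matches` key (no SCIM parsing is attempted), every candidate is assumed to carry the `order_by` property, and the smallest `entity_id` is assumed to break any ties the rules leave.

```python
def pick_authoritative(candidates, ranking_rules):
    """Hypothetical sketch: return the top-ranked candidate in one group."""
    # Assumed final tie-breaker: smallest entity ID first.
    ranked = sorted(candidates, key=lambda c: c["entity_id"])
    # Apply rules from last to first; stable sorts let earlier rules dominate.
    for rule in reversed(ranking_rules):
        order_by = rule.get("order_by")
        if order_by:
            ranked.sort(key=lambda c: c[order_by], reverse=rule.get("descending", False))
        matches = rule.get("matches")  # stand-in for evaluating scim_filter
        if matches:
            # Candidates that match the filter rank ahead of those that do not.
            ranked.sort(key=lambda c: 0 if matches(c) else 1)
    return ranked[0]

group = [
    {"entity_id": 1, "is_active": False, "worker_type": "Employee", "hire_date": "2024-01-01"},
    {"entity_id": 2, "is_active": True, "worker_type": "Contractor", "hire_date": "2023-05-01"},
    {"entity_id": 3, "is_active": True, "worker_type": "Employee", "hire_date": "2024-03-01"},
]
rules = [
    {"matches": lambda c: c["is_active"] is True},
    {"matches": lambda c: c["worker_type"] == "Employee"},
    {"order_by": "hire_date", "descending": True},
]
winner = pick_authoritative(group, rules)
```

Here the active employee (`entity_id` 3) wins. With an empty rule list, the same function returns the record with `entity_id` 1.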

### Matching behavior

Property matching for `dedup_key_properties` is exact and type-aware:

* Strings are compared byte-for-byte (case- and whitespace-sensitive).
* Type mismatches do not match. The string `"1"` and the number `1` are treated as different values.
* List and struct element order matters. `[1, 2]` does not match `[2, 1]`.
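
These comparison rules can be approximated in Python as follows. The helper `values_match` is hypothetical and only models strings, scalars, and lists; structs are not modeled. It is meant to show why values that look alike can still fail to match, not to reproduce Veza's comparison code.

```python
def values_match(a, b):
    """Hypothetical sketch of exact, type-aware matching for key values."""
    # Different types never match, so "1" != 1 and 1 != True.
    if type(a) is not type(b):
        return False
    if isinstance(a, str):
        # Byte-for-byte: case and whitespace differences are significant.
        return a.encode("utf-8") == b.encode("utf-8")
    if isinstance(a, list):
        # Element order matters, so compare position by position.
        return len(a) == len(b) and all(values_match(x, y) for x, y in zip(a, b))
    return a == b

string_vs_number = values_match("1", 1)
reordered_list = values_match([1, 2], [2, 1])
trailing_space = values_match("ana@example.com ", "ana@example.com")
```

All three sample comparisons evaluate to `False`. If upstream data is inconsistent in case or whitespace, normalizing the property in the source before extraction is one way to make such records group together.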

## Examples

### Match on a single field, prefer active records

Group identities that share an `email`. Within each group, prefer records where `is_active` is `true`; among equally active records, prefer the most recently hired.

```json
{
  "primary_duplicate_resolution_rule": {
    "dedup_key_properties": ["email"],
    "ranking_rules": [
      { "scim_filter": "is_active eq true" },
      { "order_by": "hire_date", "descending": true }
    ],
    "max_duplicate_group_size": 3
  }
}
```

### Stack multiple ranking criteria

Group identities that share an `employee_id`. Within each group, prefer active records; among active records, prefer employees over contractors; among active employees, prefer the most recently hired. Each ranking rule applies only to ties from the previous rule.

```json
{
  "primary_duplicate_resolution_rule": {
    "dedup_key_properties": ["employee_id"],
    "ranking_rules": [
      { "scim_filter": "is_active eq true" },
      { "scim_filter": "worker_type eq \"Employee\"" },
      { "order_by": "hire_date", "descending": true }
    ],
    "max_duplicate_group_size": 3
  }
}
```

### Match on a composite of two fields

Group identities that share both `employee_id` and `country`. Use the default ranking (smallest entity ID wins on ties).

```json
{
  "primary_duplicate_resolution_rule": {
    "dedup_key_properties": ["employee_id", "country"],
    "ranking_rules": [],
    "max_duplicate_group_size": 3
  }
}
```

### Clear an existing rule

PATCH cannot send `null` for this field (a null is treated as "field not in mask"). To clear an existing rule, send an empty object:

```json
{
  "primary_duplicate_resolution_rule": {}
}
```

## API reference

* [Update Policy Configuration](/4yItIzMvkpAvMVFAamTf/developers/api/lifecycle-management/policies/updatepolicyconfiguration.md): PATCH or PUT the policy version, including `primary_duplicate_resolution_rule` and per-secondary `duplicate_resolution_rule`.
* [Policies](/4yItIzMvkpAvMVFAamTf/features/lifecycle-management/policies-workflows/policies.md): policy structure and lifecycle.
* [Identities](/4yItIzMvkpAvMVFAamTf/features/lifecycle-management/identities-overview/identities.md): identity behavior, including primary and secondary source of identity reconciliation.

