How to Plan and Run a Rebuild Match Table in Reltio

Overview

Rebuild Match Table recalculates match tokens and match documents for selected entity types. It is typically required after you introduce or change match rules, or after you clean up token issues. The task supports distributed execution, which you can tune with taskPartsCount, and a quiet mode that skips match events when you don’t want downstream traffic during maintenance.

When should you run it?

  • After enabling or modifying match rules (Organization, Location, Person, etc.).

  • After removing over-collisioned tokens (see below), because that operation requires a follow-up rebuild to repopulate MATCH tables.

 

Pre-flight checklist

  1. Analyze your match rules
    Run Analyze Match Strategy (Console) or use Match Rule Analyzer v2 to detect token explosion, heavy comparators, or high-collision tokens. Tuning before a rebuild often saves days of runtime.

  2. Deal with over-collisioned tokens (if present)
    Run RemoveOvercollisionedTokens first, then plan the rebuild. The task resets the token state and fixes the MATCH tables, but it explicitly requires a subsequent rebuild.

  3. Decide on your event policy
    If this is maintenance and you don’t need re-emitted match events, set maintenanceOptions=skipMatchEvents. This refreshes internal match structures without generating match events to external queues. 

  4. Schedule the run
    Decide when the task will start, and pick a window free of competing workloads that could starve the rebuild job or amplify environment impact.

    • Avoid peak data loads, bulk updates, and performance tests.

    • Avoid concurrent large tasks on other big tenants in the same environment.

 

Capacity & concurrency planning

  • Distributed mode: set distributed=true to split the job into sub-tasks that can be processed in parallel across available API nodes. Control the degree of parallelism via taskPartsCount. Effective throughput depends on the number of available nodes and on other tasks running in the environment.

  • Avoid environmental contention: starting many resource-intensive tasks simultaneously across multiple tenants can reduce throughput or cause tasks to bounce between SCHEDULED and PROCESSING when nodes are under pressure. Use the Tasks API to observe global load.

Practical starting point

  • Large tenants: run distributed=true with a conservative taskPartsCount (well below your tenant’s ceiling). Increase it only if throughput and task stability appear healthy.
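The "start conservatively, then adjust" guidance above can be sketched as a small helper. This is only an illustration: the function name, the step size, and the idea of a single boolean "healthy" signal are assumptions made for the example, not Reltio settings, and the ceiling is whatever limit applies to your tenant.

```python
def next_task_parts_count(current, ceiling, healthy, step=2):
    """Suggest a taskPartsCount for the next run (hypothetical helper).

    current  -- taskPartsCount used for the last run
    ceiling  -- upper bound you do not want to exceed for this tenant
    healthy  -- True if throughput and task stability looked fine
    step     -- how much to move per adjustment (assumed value)
    """
    if current < 1 or ceiling < 1 or step < 1:
        raise ValueError("current, ceiling and step must be positive")
    if healthy:
        # Scale up gradually, never past the ceiling.
        return min(current + step, ceiling)
    # Back off when the last run looked unstable, but keep at least one part.
    return max(current - step, 1)


# Example: last run used 8 parts and looked healthy.
print(next_task_parts_count(current=8, ceiling=16, healthy=True))
```

Moving in small steps keeps each change easy to attribute: if throughput drops after an increase, the previous value is a known-good fallback.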


Runbook

Option A — Console (Tenant Management)

  • Go to Console → Tenant Management → Jobs → Rebuild Match Table. Select entity type(s), choose distributed mode and parts, set skipMatchEvents if desired, and start.

Option B — API (recommended for automation)

Endpoint

POST {ApplicationURL}/api/{tenant_id}/rebuildmatchtable

Common query parameters

  • entityType=configuration/entityTypes/{EntityType}

  • distributed=true|false

  • taskPartsCount={N} (only when distributed=true)

  • maintenanceOptions=skipMatchEvents (optional)
    All parameters and behaviors are defined in the task reference.

Example (quiet maintenance on one entity type)

POST {ApplicationURL}/api/{tenant_id}/rebuildmatchtable\
?entityType=configuration/entityTypes/Organization\
&distributed=true\
&taskPartsCount=8\
&maintenanceOptions=skipMatchEvents

(Replace {tenant_id}, {ApplicationURL}, and Organization as needed.)
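For automation, the request URL above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name and the example host are hypothetical, authentication is deliberately left out, and only the endpoint and query parameters listed in this article are used.

```python
from urllib.parse import urlencode


def build_rebuild_url(application_url, tenant_id, entity_type,
                      distributed=True, task_parts_count=None,
                      skip_match_events=False):
    """Return the POST URL for a Rebuild Match Table request (sketch)."""
    params = [
        ("entityType", f"configuration/entityTypes/{entity_type}"),
        ("distributed", "true" if distributed else "false"),
    ]
    # taskPartsCount only applies when distributed=true.
    if task_parts_count is not None:
        if not distributed:
            raise ValueError("taskPartsCount requires distributed=true")
        params.append(("taskPartsCount", str(task_parts_count)))
    # Quiet maintenance: refresh match structures without match events.
    if skip_match_events:
        params.append(("maintenanceOptions", "skipMatchEvents"))
    base = f"{application_url.rstrip('/')}/api/{tenant_id}/rebuildmatchtable"
    # safe="/" keeps the slashes in the entityType URI unescaped.
    return f"{base}?{urlencode(params, safe='/')}"


# Placeholder host and tenant; substitute your own values.
print(build_rebuild_url("https://example.invalid", "myTenant", "Organization",
                        task_parts_count=8, skip_match_events=True))
```

The resulting string is meant to be sent as a POST with whatever HTTP client and credentials your automation already uses.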

Filtering scope (advanced)
When you only need to rebuild a subset, use the tokenization/rebuild tasks with query filters. Keep in mind that filtered entities can still match entities outside the filter.

 

Monitoring & control

  • Track status via Tasks API: SCHEDULED, PROCESSING, PAUSING, PAUSED, CANCELING, etc. Low objects/sec for long periods or frequent flips back to SCHEDULED suggest environment contention.

  • Get active tasks to see competing work:

    GET {ApplicationURL}/api/{tenant_id}/tasks

    (Returns tasks across tenants with active statuses.)

  • Get a specific rebuild match task:

    GET {ApplicationURL}/api/{tenant_id}/tasks/{task_id}

  • Pause/Stop is supported on rebuild and related tasks; use it if you observe instability or need to de-risk peak hours.
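The "frequent flips back to SCHEDULED" signal can be checked mechanically once you have a history of observed statuses for a task. The sketch below assumes you have already collected those status strings by polling the Tasks API; the function names and the flip threshold are hypothetical, and the shape of the Tasks API response is not assumed here.

```python
def count_scheduled_flips(statuses):
    """Count how often a task went from PROCESSING back to SCHEDULED."""
    flips = 0
    # Compare each observation with the one that follows it.
    for previous, current in zip(statuses, statuses[1:]):
        if previous == "PROCESSING" and current == "SCHEDULED":
            flips += 1
    return flips


def looks_contended(statuses, max_flips=2):
    """Flag possible environment contention (threshold is an assumption)."""
    return count_scheduled_flips(statuses) > max_flips


# Example status history gathered from repeated polls of one task.
observed = ["SCHEDULED", "PROCESSING", "SCHEDULED", "PROCESSING",
            "SCHEDULED", "PROCESSING", "SCHEDULED", "PROCESSING"]
print(count_scheduled_flips(observed), looks_contended(observed))
```

A flag from this check is a prompt to look at global load and consider pausing, not proof of a problem on its own.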

Post-run steps

  • If analysis indicated token collision issues, confirm that MATCH tables look healthy after the rebuild. (Collision cleanup requires rebuild.)

  • Optionally run any post-rebuild validation provided in the task documentation (where applicable) to double-check completeness.

Common pitfalls & how to avoid them

  • Starting many large rebuilds across tenants at once → Stagger start times and keep parts conservative; scale up only if nodes are free and throughput is steady.

  • Inefficient rules make rebuilds run for days → Always run Analyze Match Strategy and fix token-heavy or fuzzy rules before kicking off.

  • Unexpected downstream churn → Use maintenanceOptions=skipMatchEvents for quiet maintenance windows.

FAQ

Q: Can I rebuild multiple entity types at once?
Yes, but for large volumes it’s safer (and easier to monitor) to run per entity type—e.g., Organization first, then Location—especially when using distributed mode.
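Running one entity type at a time can be sketched as a simple loop. In this illustration, start_rebuild and wait_until_done are hypothetical callables you would supply (for example, one that issues the POST request and one that polls the Tasks API); neither is part of any Reltio client.

```python
def run_sequentially(entity_types, start_rebuild, wait_until_done):
    """Rebuild entity types one after another; stop at the first failure.

    start_rebuild(entity_type) -> task id      (caller-supplied, hypothetical)
    wait_until_done(task_id)   -> True/False   (caller-supplied, hypothetical)
    """
    results = {}
    for entity_type in entity_types:
        task_id = start_rebuild(entity_type)
        succeeded = wait_until_done(task_id)
        results[entity_type] = succeeded
        if not succeeded:
            # Do not start the next rebuild until this one is understood.
            break
    return results


# Usage with stand-in callables that only simulate the two steps.
fake_start = lambda entity_type: f"task-{entity_type}"
fake_wait = lambda task_id: task_id != "task-Location"
print(run_sequentially(["Organization", "Location", "Person"],
                       fake_start, fake_wait))
```

Stopping at the first failure keeps the environment quiet while you investigate, rather than stacking another large job on top of a troubled one.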

Q: How do I choose taskPartsCount?
Start conservatively relative to your tenant’s capacity; monitor throughput and task stability, then adjust. Parallelism only helps if API nodes are available.

Q: What about Rebuild Match Table v2 and follow-up processing?
Review the Rebuild Match Table v2 page for version-specific behavior, and see ProcessRebuiltMatchTask if you plan to process v2 results and emit events accordingly.

 
