Patterns and uses related to the DPH merge table

The design of the function is that the merges table is the audit trail of the merge activity. It stores both current and historical loser→winner mappings (including unmerges), so you can reconstruct what happened over time. The complementary links / activeLinks view is the current state only.

Merges Result.png

What the columns mean

  • mergeKey – same as loserId.
  • winnerId – the surviving entity at the time of that merge event.
  • loserId – the entity that lost the merge.
  • timestamp – when the merge/unmerge event occurred (epoch ms).
  • active – whether that loser→winner link is currently in effect. When the platform emits a “deleted” link (e.g., an unmerge), active becomes FALSE, and winnerId is set to NULL in the merges record

Interpreting your row
For loserId = 'MdaR6OK', you see winnerId = NULL and active = FALSE. That means this loser→winner link is no longer active—typically because it was unmerged (or otherwise superseded). The record remains for history in merges. 

 

How to use it

  • To find the current winner for a loser, query activeLinks (or build it as shown in Reltio’s doc) and filter loserId = 'MdaR6OK'. If there’s no row, that loser currently isn’t merged.
  • When fixing downstream data (e.g., relationships that still point to old IDs), join to activeLinks to resolve to current winners. 

 

What you’re seeing (“loserId” showing up inactive in merges) is expected once that loser→winner link is no longer the current state.

 


Why your row is active = FALSE

  • merges is an event journal, not just the current mapping. It stores every merge/unmerge change over time. When a merge is undone (or superseded), the platform emits a “deleted” link for that loser, and the corresponding record in merges is marked active = FALSE and may set winnerId = NULL. That preserves history, but indicates that the mapping is no longer in effect.
  • If you want the current winner→loser mapping only, use the links / activeLinks dataset (as I previously stated). The links table (as you have already summarized) represents the current state only, while merges keeps both current and historical states. 

 


Where to get governance metadata (manual vs auto, which rules, who did it)

There are two “right” sources, depending on what you have enabled:

Data Pipeline “merges” dataset (recommended for analytics)
This version of merges includes governance fields such as:

  • type → MANUAL, AUTO, ON_THE_FLY, GROUP_MERGE
  • matchRules and mergeRulesUris → which rule(s) drove the merge
  • active, directWinner, etc.
    If your table has only mergeKey/winnerId/loserId/active/timestamp (like in your screenshot), you’re likely querying an older egress table and won’t see those governance columns—so you can’t answer “manual vs auto” straight from that table. The Data Pipeline schema does include them.

If your table only has mergeKey, winnerId, loserId, active, timestamp, you’re likely querying an older/minimal egress version. The Data Pipeline merges dataset (above) includes the governance fields (type, matchRules, mergeRulesUris, directWinner). You can (re)sync merges via the pipeline if needed.

APIs for full audit detail (best for investigations)

  • The Entity Merge Tree API returns a chain of merges with timestamps, reasons (e.g., “merge on the fly”), and the user who executed a manual merge. Ideal when you need who/why for a specific record.
  • Events (Match/Merge) API streams merge events for operational auditing/dashboarding.

The observation that “the activity log is the most reliable way to get detail” is true if you’re on the lean egress merges. If you switch to the Data Pipeline merges dataset (or add it alongside), the table itself carries the governance fields you want.

 


Practical patterns that might help

 

Report only the currently effective merges (plus governance):

 

select m.* from activeLinks l
join merges m
  on m.loserId = l.loserId
 and m.active = true
 and m.timestamp = (
     select max(timestamp) from merges
     where loserId = l.loserId and active = true
 );

 

This avoids picking up the final “inactive” record for a loser and pairs your analytics columns with the current state. (If you don’t have activeLinks yet, Reltio documents how to build it from the landing stream.) 

 

Manual vs automatic counts (current state):

 

select m.type, count(*) as merges
from activeLinks l
join merges m
  on m.loserId = l.loserId
 and m.active = true
 and m.timestamp = (
     select max(timestamp) from merges
     where loserId = l.loserId and active = true
 )
group by 1 order by 2 desc;

 

Which match rules are driving merges (current state):

 

select r.value::string as matchRule, count(*) as cnt
from activeLinks l
join merges m
  on m.loserId = l.loserId
 and m.active = true
 and m.timestamp = (
     select max(timestamp) from merges
     where loserId = l.loserId and active = true
 ),
 lateral flatten(input => m.matchRules) r
group by 1 order by 2 desc;

 

Bottom line

  • Your “inactive loserId” row is normal—merges is recording that the link is no longer effective. Use links/activeLinks for the present state.
  • For governance analytics (manual vs auto, which rules), use the Data Pipeline merges dataset with type/matchRules/mergeRulesUris; if you don’t have that schema, rely on Merge Tree or Events for the detail. 

 

 

 

 

Was this article helpful?
0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.