Overview
When profiles are merged in Reltio, the lineage is captured downstream in DPH/Databricks via merges and links tables. When profiles are unmerged (split), the event is captured in the activities table (via an items array containing an event type such as ENTITIES_SPLITTED).
Because DPH streams events (CREATE/UPDATE/MERGE/DELETE, etc.), you don’t typically get a single “this entity is unmerged” flag. Instead, you detect “unmerged/split activity” by querying activities for split events and then cross-checking lineage in merges/links.
What you can and can’t “see” in Databricks
Merged entities
You can identify merges directly from the merges dataset (winner/loser, event timestamp, ingest time, etc.).
“Unmerged entities”
You typically can’t list “unmerged entities” as a direct event state in DPH, because “unmerged” is not a persistent marker—DPH is event-driven. In practice:
An “unmerged entity” is usually interpreted as:
not deleted, and
not currently represented as a loser in the merge lineage, and/or
appears in split/unmerge activity events (if you’re specifically tracking split operations)
So you detect split/unmerge via activities, then validate lineage via links/merges.
Datasets involved
activities: Contains anitemsarray; each item haseventType,objectUri, anddata(payload).merges: Merge lineage and timestamps (winnerId,loserId,timestamp,insertedTime, and possiblydeleted).links: Canonical mapping of loser → winner across time; includesdeletedstate for link records.entities: Entity lifecycle state, includingdeletedandendDate(soft delete time).
Query 1 — Check merge lineage and ingest lag (merged entities)
Use this when you have entity IDs and want to inspect merge events and how quickly they arrived in Databricks.
WITH merged AS (
SELECT
winnerId,
loserId,
to_timestamp(timestamp/1000) AS merge_event_ts,
to_timestamp(insertedTime/1000) AS merge_ingest_ts
FROM <catalog>.<schema>.merges
WHERE winnerId IN ('entityId1', 'entityId2')
OR loserId IN ('entityId1', 'entityId2')
)
SELECT
CASE
WHEN winnerId IN ('entityId1', 'entityId2') THEN winnerId
ELSE loserId
END AS entity_uri,
merge_event_ts,
merge_ingest_ts,
(merge_ingest_ts - merge_event_ts) AS merge_ingest_lag
FROM merged
ORDER BY merge_ingest_ts;
Replace:
<catalog>.<schema>with your Databricks catalog/schemaentityId1,entityId2with real entity IDs
Query 2 — List split/unmerge events directly from Activities
This is the primary way to identify split/unmerge activity.
SELECT
a.activityTimestamp,
a.activityUser,
i.eventType,
i.objectUri AS affectedObjectUri,
i.data AS eventPayload
FROM <catalog>.<schema>.activities a
LATERAL VIEW explode(a.items) AS i
WHERE i.eventType = 'ENTITIES_SPLITTED'
ORDER BY a.activityTimestamp DESC;
What this gives you
Who performed the action (
activityUser)When it occurred (
activityTimestamp)Which object(s) were affected (
objectUri)Payload details (
data)
Query 3 — See which entity URIs were involved in recent splits
Useful when you want a compact list of split events with the raw object references.
WITH splits AS (
SELECT a.activityTimestamp, i.objectUri, i.data
FROM <catalog>.<schema>.activities a
LATERAL VIEW explode(a.items) AS i
WHERE i.eventType = 'ENTITIES_SPLITTED'
)
SELECT *
FROM splits
ORDER BY activityTimestamp DESC;
Query 4 — Cross-check lineage around split timestamps using merges
Once you have split-related objectUris, cross-check whether those entities appear in merge history.
SELECT m.timestamp, m.winnerId, m.loserId, m.type, m.deleted
FROM <catalog>.<schema>.merges m
WHERE m.winnerId IN (
SELECT regexp_extract(i.objectUri, 'entities/([^"]+)', 1)
FROM <catalog>.<schema>.activities a
LATERAL VIEW explode(a.items) AS i
WHERE i.eventType = 'ENTITIES_SPLITTED'
)
OR m.loserId IN (
SELECT regexp_extract(i.objectUri, 'entities/([^"]+)', 1)
FROM <catalog>.<schema>.activities a
LATERAL VIEW explode(a.items) AS i
WHERE i.eventType = 'ENTITIES_SPLITTED'
)
ORDER BY m.timestamp DESC;
Validation queries for confirming “split state” and distinguishing from deletes
Inspect the LINKS history during a suspected unmerge window
What to look for: a prior loser→winner mapping later marked as deleted = true.
SELECT
winnerId, loserId, deleted, version, timestamp, insertedTime
FROM <catalog>.<schema>.links
WHERE (winnerId IN ('<known_winner_id>') OR loserId IN ('<known_loser_id>'))
ORDER BY timestamp;
Inspect MERGES rows around the same entities
What to look for: whether historical merge rows later show deleted = true or whether new rows appear.
SELECT *
FROM <catalog>.<schema>.merges
WHERE winnerId = '<known_winner_id>' OR loserId = '<known_loser_id>'
ORDER BY timestamp;
Confirm entity soft-delete vs unmerge (Entities table)
Rule of thumb: unmerge should not soft-delete entities.
Soft delete shows deleted=true and an endDate.
SELECT uri, deleted, startDate, endDate, commitTime
FROM <catalog>.<schema>.entities
WHERE uri IN ('entities/<id_A>', 'entities/<id_B>');
Troubleshooting checklist
No split events found? Confirm the correct
eventTypevalue used in your pipeline (ENTITIES_SPLITTEDvs other naming variants).Downstream tables missing deletes/loser updates? Ensure your pipeline is configured to propagate removed records (commonly
reindexDeleted=truein relevant connector settings).Conflicting lineage results? Treat
linksas the canonical mapping history for winner resolution, and join the downstream relationship datasets tolinksto resolve the current winners.
Field semantics to know
Entities table
deleted(boolean): indicates if the entity is deletedendDate: time the entity was soft deleted (if deleted)
Links table
deleted(boolean): indicates whether the lineage mapping row is deletedLinks represent the canonical loser → winner mapping history
Merges table
deleted(boolean): indicates whether the entity is deleted (as represented in merge dataset semantics)
Comments
Please sign in to leave a comment.