Will the correct winner be applied to the relationship table in a DPH connector?

Question

I can not find the correct winner for the start or end objects in the relations table. Do you know if this is expected?

 

Answer

 

DPH events are not explicitly created to update the relationship table in the connectors to ensure the correct winner entity ID. However, various events and processes within the Data Pipeline Hub (DPH) and connectors impact data synchronization and entity updates. The relationship table contains information about relationships between entities. However, to accurately identify the current winner entities, the customer must join the relationship table with the links table. This is the primary purpose of the links table. The links table records mappings from all original (loser) entity URIs to their corresponding winner entity URI. It preserves a history of mergers and transformations. To determine the correct winner/loser for a merge operation, you must query the merge table and the relations table.

  • If entity A is related to entity B, the table will record the relationship between A and B.
  • Subsequently, if B is merged with some other entity, C, this won’t update the relation table in the connector, as we don’t have any events from the platform. To gather that information, you must cross-join the LINKS table with the RELATIONS table.
  • So, though B does not exist in the system, the relation between A and B will exist in data bricks, and this needs to be resolved using the merge table.

Relationship Table: Stores details about relationships, including startObject and endObject URIs, which reference the entities involved in the relationship. (See: https://docs.reltio.com/en/applications/data-integrations/data-pipelines-at-a-glance/reltio-data-pipeline-for-snowflake-at-a-glance/reltio-data-pipeline-for-snowflake-set-up/datasets-for-the-snowflake-data-schema/relationship-dataset-for-snowflake?utm_source=chatgpt.com)


Links Table: Maintains mappings between entity URIs and their corresponding winner entity URIs, effectively tracking the latest surviving entities after merges. 

Links Table Schema:

 

Example of crosswalk using LINKS and RELATIONS

 

Please use the example below as a starting point for your investigation.
 

create or replace view RELTIO.ACTIVE_RELATIONS(
"uri",
STARTOBJECT,
ENDOBJECT,
"version",
"timestamp",
"type",
"attributes",
"crosswalks",
"startRefPinned",
"startRefIgnored",
"endRefPinned",
"endRefIgnored",
"createdBy",
"createdTime",
"updatedBy",
"updatedTime",
"commitTime",
"startDate",
"endDate",
"active"
) as select * from ( WITH RelationData AS (
SELECT
r."uri",
SPLIT_PART(r."startObject":objectURI::VARCHAR, '/', 2) AS start_object_id,
SPLIT_PART(r."endObject":objectURI::VARCHAR, '/', 2) AS end_object_id,
r."version",
r."timestamp",
r."type",
r."attributes",
r."crosswalks",
r."startRefPinned",
r."startRefIgnored",
r."endRefPinned",
r."endRefIgnored",
r."createdBy",
r."createdTime",
r."updatedBy",
r."updatedTime",
r."commitTime",
r."startDate",
r."endDate",
r."active"
FROM RELTIO.RELATIONS r
WHERE r."active" = 'TRUE'
)

SELECT
rd."uri",
COALESCE(so."winnerId", rd.start_object_id) AS startobject,
COALESCE(eo."winnerId", rd.end_object_id) AS endobject,
rd."version",
rd."timestamp",
rd."type",
rd."attributes",
rd."crosswalks",
rd."startRefPinned",
rd."startRefIgnored",
rd."endRefPinned",
rd."endRefIgnored",
rd."createdBy",
rd."createdTime",
rd."updatedBy",
rd."updatedTime",
rd."commitTime",
rd."startDate",
rd."endDate",
rd."active"
FROM RelationData rd
LEFT JOIN RELTIO.ACTIVELINKS so
ON rd.start_object_id = so."loserId"
LEFT JOIN RELTIO.ACTIVELINKS eo
ON rd.end_object_id = eo."loserId");

 
Explanation of the query
 
This query extracts active relations from the "RELATIONS" table, processes them, and resolves object references using the "LINKS" table.  Here they are joining the relations stage table to the Links table, where they are only considering the rows from the "LINK" table where active = true, 
 

Step 1: Extract Active Relations

  • The WITH clause (RelationData) selects all records from RELTIO.RELATIONS where "active" = 'TRUE'.
  • It extracts:
    • Uri: The relation's unique identifier.
    • startObject and endObject: Extracts object IDs from a URI field using SPLIT_PART(r."startObject":objectURI::VARCHAR, '/', 2).

Step 2: Resolve Object References Using ACTIVELINKS

  • The RelationData result is joined with the ACTIVELINKS table twice:
    • Once on start_object_id, replace it with winnerId (if available).
    • Once on end_object_id, replace it with winnerId (if available).
  • This is done using LEFT JOIN, meaning if no match is found, the original object ID is used (COALESCE(so."winnerId", rd.start_object_id)).

Step 3: Final Selection

  • The final selection includes:
    • The original URI.
    • The resolved startObject and endObject.
    • The metadata fields from RelationData.

 

 

Was this article helpful?
0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.