What is causing the mismatch where the result of the merge and the endobject URI is not matching between Reltio and Snowflake?

Question

We are executing the following query in Snowflake.

SELECT uri,ENDOBJECT:objectURI ::string FROM  "PSEDW_MDM_TEST"."CUSTOMER_MAIN_UAT2"."RELATIONS" 
  WHERE TYPE = 'configuration/relationTypes/AssetToLocation' 
  and uri ='relations/3r8g6WX';

In the above relation in Reltio, the end object entity URI is 'entities/0zrLNtK', whereas, in Snowflake, it is 'entities/0AUXkzK'.

We also ran the "syncToDataPipeline" job. However, this did not fix the issue.

If you see historical data, you will see the issue, and you can see below if I searched for data 3 days back, you can find the end object URI as '0AUXkzK'.

Answer

The problem is that the Merges table within Snowflake doesn't contain the needed information as the LINKS table does but has yet to be instantiated in the Snowflake environment. Below is a series of steps to create the LINKS table in Snowflake and synchronize it to Reltio.

Merges Table Schema (Current and Historical state of Merges)

Field name

Type

Mode

Description

mergeKey

VARCHAR

NULLABLE

This is the same as the loserId.

winnerId

VARCHAR

NULLABLE

The entity ID of the winner record in the merge.

loserId

VARCHAR

NULLABLE

The entity ID of the loser record in the merge.

matchRules

STRING

REPEATED

List of the match rules that played a role in matching these records before they got merged. This column hold the label of the match rules.

timestamp

NUMBER(13,0)

NULLABLE

The timestamp of the event that occurred in Reltio System.

type

VARCHAR

NULLABLE

The type of event that occurred.

Below are the possible values of type.

  • MANUAL - When two entities are merged manually by an user.

  • AUTO - When two entities are merged Automatically by merge process using an Automatic match rule

  • ON_THE_FLY - When an newly created entity is merged on the fly using some Automatic match rules.

  • GROUP_MERGE - refers to the situation when entities are merged with each other in two different merge processes simultaneously.

mergeRulesUris

VARIANT

REPEATED

The URI of the merge rules. The configuration/entityTypes/Organization/matchGroups/FuzzyNameExactAddress URI corresponds to the match rule defined in your L3 under the Organization entity type.

active

BOOLEAN

NULLABLE

Boolean value indicating whether record is active or not.

directWinner

VARCHAR

NULLABLE

The entity ID of the direct winner record in the merge.

Field name

Type

Mode

Description

winnerId

VARCHAR

NULLABLE

The entity ID of the winner record in the merge.

loserId

VARCHAR

NULLABLE

The entity ID of the loser record in the merge.

timestamp

NUMBER(13,0)

NULLABLE

The timestamp of the event that occurred in Reltio System.

active

BOOLEAN

NULLABLE

Boolean value indicating whether record is active or not.

Steps to create and synchronize the LINKS table in Snowflake.  for more information about each step please select the link.

  1. Create Links Table in Snowflake
CREATE TABLE "<database_name>"."<schema_name>"."links"
("winnerId" VARCHAR, "loserId" VARCHAR, "timestamp" NUMBER(13,0), "active" BOOLEAN)
CLUSTER BY ("active");
  1. Create Links View in Snowflake
CREATE VIEW "<database_name>"."<schema_name>"."activeLinks"
AS SELECT *
FROM "links"
WHERE "active" = TRUE;
  1. Create Links Stream in Snowflake
CREATE STREAM "<database_name>"."<schema_name>"."linksStream"
ON TABLE "<database_name>"."<schema_name>"."<landingtable_name>"
APPEND_ONLY = true;
  1. Create Links Processing Task in Snowflake
CREATE TASK "<database_name>"."<schema_name>"."linksTask"
 WAREHOUSE = <warehouse_name>
 SCHEDULE = '1 minute'
WHEN
 SYSTEM$STREAM_HAS_DATA('"<database_name>"."<schema_name>"."linksStream"')
AS
 MERGE INTO "<database_name>"."<schema_name>"."links" as target USING
      (SELECT staging."json":winnerId as winnerId, staging."json":loserId as loserId, staging."json":timestamp as "merge_timestamp", staging."json", staging."deleted"
      FROM "<database_name>"."<schema_name>"."linksStream" as staging,
      (SELECT "json":loserId as loserId, MAX ("json":timestamp) as "merge_timestamp"
         FROM "<database_name>"."<schema_name>"."linksStream"
         where "objectType" = 'links'
         GROUP BY loserId) as latest
       WHERE staging."json":loserId = latest.loserId AND staging."json":timestamp = latest."merge_timestamp" AND staging."objectType" = 'links'
       QUALIFY ROW_NUMBER() OVER (PARTITION BY staging."json":loserId ORDER BY staging."json":timestamp DESC) = 1) as staging_dedup
   on target."loserId" = staging_dedup.loserId
   WHEN MATCHED AND staging_dedup."deleted" = FALSE AND staging_dedup."merge_timestamp" > target."timestamp" THEN UPDATE SET target."timestamp" = staging_dedup."merge_timestamp", target."winnerId" = staging_dedup.winnerId, target."active" = TRUE
   WHEN MATCHED AND staging_dedup."deleted" = TRUE AND staging_dedup."merge_timestamp" > target."timestamp" THEN UPDATE SET target."timestamp" = staging_dedup."merge_timestamp", target."winnerId" = NULL, target."active" = FALSE
   WHEN NOT MATCHED AND staging_dedup."deleted" = FALSE THEN INSERT ("timestamp", "winnerId", "loserId", "active") values (staging_dedup."merge_timestamp", staging_dedup.winnerId, staging_dedup.loserId, TRUE)
   WHEN NOT MATCHED AND staging_dedup."deleted" = TRUE THEN INSERT ("timestamp", "loserId", "active") values (staging_dedup."merge_timestamp",  staging_dedup.loserId, FALSE);
  1. Start Links Processing Task in Snowflake
ALTER TASK "<database_name>"."<schema_name>"."linksTask" RESUME;
  1. Synchronize Links Data using Tenant Data Synchronization API (or create a Reltio Support ticket to kick off this task for you).

IMPORTANT: Specify the dataTypes query parameter to avoid re-syncing all tenant data and just processing links.

For example: 

POST {{domain}}/reltio/api/{{tenantID}}/syncToDataPipeline?dataTypes=merges
  1. Monitor Task completion using either of the APIs below.
GET {{domain}}/reltio/{{tenantID}}/tasks 
GET {{domain}}/reltio/{{tenantID}}/tasks/history
  1. Monitor Queue processing using.
GET {{dataPipelineHost}}/status/tenant/{{tenantID}}/details
  1. Verify data appears in the Snowflake Links table.

 

 

 

Was this article helpful?
0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.