Question
We are executing the following query in Snowflake.
SELECT uri,ENDOBJECT:objectURI ::string FROM "PSEDW_MDM_TEST"."CUSTOMER_MAIN_UAT2"."RELATIONS"
WHERE TYPE = 'configuration/relationTypes/AssetToLocation'
and uri ='relations/3r8g6WX';
In the above relation in Reltio, the end object entity URI is 'entities/0zrLNtK', whereas, in Snowflake, it is 'entities/0AUXkzK'.
We also ran the "syncToDataPipeline" job. However, this did not fix the issue.
If you see historical data, you will see the issue, and you can see below if I searched for data 3 days back, you can find the end object URI as '0AUXkzK'.
Answer
The problem is that the Merges table within Snowflake doesn't contain the needed information as the LINKS table does but has yet to be instantiated in the Snowflake environment. Below is a series of steps to create the LINKS table in Snowflake and synchronize it to Reltio.
Merges Table Schema (Current and Historical state of Merges)
Field name |
Type |
Mode |
Description |
---|---|---|---|
mergeKey |
VARCHAR |
NULLABLE |
This is the same as the |
winnerId |
VARCHAR |
NULLABLE |
The entity ID of the winner record in the merge. |
loserId |
VARCHAR |
NULLABLE |
The entity ID of the loser record in the merge. |
matchRules |
STRING |
REPEATED |
List of the match rules that played a role in matching these records before they got merged. This column hold the label of the match rules. |
timestamp |
NUMBER(13,0) |
NULLABLE |
The timestamp of the event that occurred in Reltio System. |
type |
VARCHAR |
NULLABLE |
The type of event that occurred. Below are the possible values of type.
|
mergeRulesUris |
VARIANT |
REPEATED |
The URI of the merge rules. The |
active |
BOOLEAN |
NULLABLE |
Boolean value indicating whether record is active or not. |
directWinner |
VARCHAR |
NULLABLE |
The entity ID of the direct winner record in the merge. |
Links Table Schema (current state of merges only)
Field name |
Type |
Mode |
Description |
---|---|---|---|
winnerId |
VARCHAR |
NULLABLE |
The entity ID of the winner record in the merge. |
loserId |
VARCHAR |
NULLABLE |
The entity ID of the loser record in the merge. |
timestamp |
NUMBER(13,0) |
NULLABLE |
The timestamp of the event that occurred in Reltio System. |
active |
BOOLEAN |
NULLABLE |
Boolean value indicating whether record is active or not. |
Steps to create and synchronize the LINKS table in Snowflake. for more information about each step please select the link.
CREATE TABLE "<database_name>"."<schema_name>"."links"
("winnerId" VARCHAR, "loserId" VARCHAR, "timestamp" NUMBER(13,0), "active" BOOLEAN)
CLUSTER BY ("active");
CREATE VIEW "<database_name>"."<schema_name>"."activeLinks"
AS SELECT *
FROM "links"
WHERE "active" = TRUE;
CREATE STREAM "<database_name>"."<schema_name>"."linksStream"
ON TABLE "<database_name>"."<schema_name>"."<landingtable_name>"
APPEND_ONLY = true;
CREATE TASK "<database_name>"."<schema_name>"."linksTask"
WAREHOUSE = <warehouse_name>
SCHEDULE = '1 minute'
WHEN
SYSTEM$STREAM_HAS_DATA('"<database_name>"."<schema_name>"."linksStream"')
AS
MERGE INTO "<database_name>"."<schema_name>"."links" as target USING
(SELECT staging."json":winnerId as winnerId, staging."json":loserId as loserId, staging."json":timestamp as "merge_timestamp", staging."json", staging."deleted"
FROM "<database_name>"."<schema_name>"."linksStream" as staging,
(SELECT "json":loserId as loserId, MAX ("json":timestamp) as "merge_timestamp"
FROM "<database_name>"."<schema_name>"."linksStream"
where "objectType" = 'links'
GROUP BY loserId) as latest
WHERE staging."json":loserId = latest.loserId AND staging."json":timestamp = latest."merge_timestamp" AND staging."objectType" = 'links'
QUALIFY ROW_NUMBER() OVER (PARTITION BY staging."json":loserId ORDER BY staging."json":timestamp DESC) = 1) as staging_dedup
on target."loserId" = staging_dedup.loserId
WHEN MATCHED AND staging_dedup."deleted" = FALSE AND staging_dedup."merge_timestamp" > target."timestamp" THEN UPDATE SET target."timestamp" = staging_dedup."merge_timestamp", target."winnerId" = staging_dedup.winnerId, target."active" = TRUE
WHEN MATCHED AND staging_dedup."deleted" = TRUE AND staging_dedup."merge_timestamp" > target."timestamp" THEN UPDATE SET target."timestamp" = staging_dedup."merge_timestamp", target."winnerId" = NULL, target."active" = FALSE
WHEN NOT MATCHED AND staging_dedup."deleted" = FALSE THEN INSERT ("timestamp", "winnerId", "loserId", "active") values (staging_dedup."merge_timestamp", staging_dedup.winnerId, staging_dedup.loserId, TRUE)
WHEN NOT MATCHED AND staging_dedup."deleted" = TRUE THEN INSERT ("timestamp", "loserId", "active") values (staging_dedup."merge_timestamp", staging_dedup.loserId, FALSE);
ALTER TASK "<database_name>"."<schema_name>"."linksTask" RESUME;
- Synchronize Links Data using Tenant Data Synchronization API (or create a Reltio Support ticket to kick off this task for you).
IMPORTANT: Specify the dataTypes query parameter to avoid re-syncing all tenant data and just processing links.
For example:
POST {{domain}}/reltio/api/{{tenantID}}/syncToDataPipeline?dataTypes=merges
- Monitor Task completion using either of the APIs below.
GET {{domain}}/reltio/{{tenantID}}/tasks
GET {{domain}}/reltio/{{tenantID}}/tasks/history
- Monitor Queue processing using.
GET {{dataPipelineHost}}/status/tenant/{{tenantID}}/details
- Verify data appears in the Snowflake Links table.
Comments
Please sign in to leave a comment.