Why wasn't the merge information passed downstream to my Data Lake?

Question

We are trying to understand why we are missing some Reltio events. In some cases, the Reltio ID is not being stamped onto our returned data sets. There is an example below. Once the question is better understood, we would like to set up a call with Reltio to discuss the issue further and provide more examples if needed. Please let me know if and when Reltio would be available for a call.

For the source record with source ID '0502575900000_002346490_0000_20100106' (ingestion ID 1003), which was ingested into Reltio on 12/11/2021 at 9:45 AM, the Reltio UI shows Reltio ID 'jo9o9we' at 12/11/2021 10:15:51 AM. However, this record did not reach the Data Lake until 12/14/2021 09:30:14 AM, when it was replaced by a new ingestion (1019).

Answer

  • The logs (LogDNA) show that the object for jo9o9we was too large to be passed to the customer Data Lake.

Example of the message in the log:

 dataprocess-bbb7576c6-4lsj2 dataload ERROR Failed to merge winner: 1DVUxMcb with loser(s): CYypr5N

com.reltio.metadata.errors.CommonException: Code: OBJECT_TOO_LARGE_TO_SAVE_IT; Message parameters: [entities/1DVUxMcb]. Object too large to save it entities/1DVUxMcb. Object is too large entities/1DVUxMcb. Attribute values limit exceeded for the whole object. Limit: 400, current value: 407
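To find other records affected the same way, the log lines can be scanned for this error. The sketch below is a minimal example that assumes the exception text keeps exactly the wording shown above (object URI in square brackets, then "Limit: N, current value: M"); the names TooLargeError, parse_too_large and scan are hypothetical helpers, not part of any Reltio or LogDNA API.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumes the exception text keeps the wording shown in the log example above.
_PATTERN = re.compile(
    r"OBJECT_TOO_LARGE_TO_SAVE_IT.*?\[(?P<uri>[^\]]+)\]"
    r".*?Limit: (?P<limit>\d+), current value: (?P<current>\d+)"
)


class TooLargeError(NamedTuple):
    object_uri: str  # e.g. "entities/1DVUxMcb"
    limit: int       # attribute values limit reported in the log
    current: int     # attribute values the object actually had


def parse_too_large(line: str) -> Optional[TooLargeError]:
    """Return details of an OBJECT_TOO_LARGE_TO_SAVE_IT line, or None for any other line."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return TooLargeError(match.group("uri"), int(match.group("limit")), int(match.group("current")))


def scan(lines: Iterable[str]) -> List[TooLargeError]:
    """Collect every too-large error from an exported log, one log line per item."""
    return [err for err in map(parse_too_large, lines) if err is not None]
```

For the example above, the parser reports entities/1DVUxMcb with a limit of 400 and a current value of 407, that is, 7 attribute values over the limit.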
  • We recommend enabling largeObjectsSupport in the streamingConfig section of your tenant's physical configuration:
    "streamingConfig": {
        "streamingEnabled": true,
        "streamingAPIEnabled": true,
        "analyzeOvChanges": false,
        "emptyStartEndRelationCrosswalks": false,
        "largeObjectsSupport": true
    }
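If the physical configuration is edited programmatically, the change amounts to setting one key while leaving the rest of streamingConfig untouched. This is a minimal sketch assuming the configuration has already been loaded into a Python dict (for example with json.load); the helper name enable_large_objects_support is hypothetical, and how the configuration is fetched from and written back to the tenant is deliberately left out.

```python
import copy
from typing import Any, Dict


def enable_large_objects_support(physical_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the configuration with streamingConfig.largeObjectsSupport set to true.

    The input dict is not modified; all other streaming properties are preserved.
    """
    updated = copy.deepcopy(physical_config)
    streaming = updated.setdefault("streamingConfig", {})
    streaming["largeObjectsSupport"] = True
    return updated
```

Working on a deep copy keeps the original configuration available for a diff or a rollback before anything is applied to the tenant.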

With largeObjectsSupport enabled, an event whose compressed size exceeds the limit is still sent, but it carries the URI of the object instead of the full object.

Example of the event JSON:

    {
        "type": <string>,
        "object": <serialized ObjectTO (e.g. EntityTO/RelationTO)>,
        "ovChanged": <true|false>, // only for ENTITY_CHANGED and RELATIONSHIP_CHANGED; optional, may be absent when the "analyzeOvChanges" streaming property is set to false
        "exceededQueueSizeLimit": <true|false> // present only when true
    }
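A consumer can treat every field except "type" as optional when reading this envelope. The sketch below assumes the message body is the JSON shown above; the StreamingEvent class and parse_event function are hypothetical names, and the field names are taken from the example.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StreamingEvent:
    event_type: str                    # "type", e.g. ENTITY_CHANGED
    obj: Optional[Dict[str, Any]]      # "object"; may be missing for oversized events
    ov_changed: Optional[bool]         # "ovChanged"; absent when analyzeOvChanges is false
    exceeded_queue_size_limit: bool    # "exceededQueueSizeLimit"; only populated when true


def parse_event(body: str) -> StreamingEvent:
    """Parse a streaming event body, defaulting the optional fields."""
    data: Dict[str, Any] = json.loads(body)
    return StreamingEvent(
        event_type=data["type"],
        obj=data.get("object"),
        ov_changed=data.get("ovChanged"),
        exceeded_queue_size_limit=bool(data.get("exceededQueueSizeLimit", False)),
    )
```

Because exceededQueueSizeLimit is only populated when true, a missing field is read as false rather than as an error.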

If the event size with the compressed payload exceeds the specified limit, only the sourceObjectUri is included in the payload. The client application must then make an additional request to Reltio to get the object content, using the GET /api//entities/object_id request.
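The consumer-side logic described here is a simple fallback: use the inline object when it is present, otherwise fetch it by URI. This is a minimal sketch assuming sourceObjectUri sits at the top level of the payload; fetch is a hypothetical callable standing in for the GET request above, and authentication, the base URL and retries are intentionally not shown.

```python
import json
from typing import Any, Callable, Dict


def resolve_object(body: str, fetch: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Return the full object for a streaming event.

    If the event carries "object" inline, it is returned directly. Otherwise the payload
    is expected to hold only "sourceObjectUri" (assumed top-level), and `fetch` is asked
    to retrieve the object content for that URI.
    """
    event = json.loads(body)
    if "object" in event:
        return event["object"]
    uri = event.get("sourceObjectUri")
    if uri is None:
        raise ValueError("event has neither 'object' nor 'sourceObjectUri'")
    return fetch(uri)
```

Injecting fetch keeps the sketch independent of any HTTP client and makes it easy to test with a stub.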

When the "largeObjectsSupport" property is turned on, read the entity ID from the message headers rather than from the message body.

On an Amazon SQS queue, these headers are delivered as message attributes, which can be read with, for example:

    aws sqs receive-message --queue-url <value> --message-attribute-names All
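A consumer can apply the same idea in code by reading the entity ID from the message attributes instead of the body. The sketch below assumes messages in the shape returned by the SQS ReceiveMessage API (for example boto3's receive_message called with MessageAttributeNames=["All"]), and it assumes the header carrying the entity ID is named entityId; verify the actual header names in your stream before relying on this.

```python
from typing import Any, Dict, Optional


def header_value(message: Dict[str, Any], name: str) -> Optional[str]:
    """Return a string message attribute from an SQS message dict, or None if absent.

    SQS only includes MessageAttributes when they were requested on receive.
    """
    attribute = message.get("MessageAttributes", {}).get(name)
    if attribute is None:
        return None
    return attribute.get("StringValue")


def entity_id(message: Dict[str, Any]) -> Optional[str]:
    # ASSUMPTION: the header carrying the entity ID is named "entityId".
    return header_value(message, "entityId")
```

Returning None rather than raising makes it easy to spot messages that were received without requesting message attributes.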


Note the following paragraph from the Reltio documentation (https://docs.reltio.com/events/messagestreamingcrud.html):

Note: Make sure to use the objectVersion header to order the events of the same object instead of using fields that have timestamps, such as createdTime, updatedTime or any fields that originate from the crosswalk. The messages contain the header only if it is included in the JMSEventsFilteringFields property based on the message streaming configuration. The header value is an incremental number, which is updated after every change in the business object. So, the message with the highest objectVersion header value will contain the latest object state.
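Ordering by objectVersion can be reduced to keeping, per object, the message with the highest version. This minimal sketch assumes each message has already been reduced to a tuple of (object ID, objectVersion header value, body); latest_states is a hypothetical helper name. The header value is converted to an integer because header values may arrive as strings.

```python
from typing import Dict, Iterable, Tuple, Union


def latest_states(
    messages: Iterable[Tuple[str, Union[int, str], str]],
) -> Dict[str, Tuple[int, str]]:
    """Keep, for each object, the body of the message with the highest objectVersion.

    Messages may arrive in any order; on equal versions the first one seen is kept.
    """
    latest: Dict[str, Tuple[int, str]] = {}
    for object_id, raw_version, body in messages:
        version = int(raw_version)  # compare numerically, not as strings
        current = latest.get(object_id)
        if current is None or version > current[0]:
            latest[object_id] = (version, body)
    return latest
```

Comparing numerically matters: as strings, "2" would sort above "10" and an older state would win.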
