How to manage Reindex and Streaming of events due to Reindexing?


After triggering a re-indexing job for a tenant we receive a huge stream of data from SQS (Event Streaming API). Is there any way to prevent the re-indexing events from being processed in the streaming?


There are four ways to limit or suppress the streaming events that a reindex generates:
1) Run reindex without match and merge. This will reduce (or eliminate) the number of events you get in streaming.
2) Disable streaming on the tenant for the duration of the reindex. Keep in mind that this will disable all events, not just those related to changes from the match/merge job.
3) Run reindex with updateEntities=false. In this case, you will not see the events in streaming.
4) For a periodic reindex task, pass the optional parameter forceIgnoreInStreaming when creating the data task:
POST /reltio/{tenantId}/reindex?forceIgnoreInStreaming=<true|false>

  • If set to true, messages are not sent to AMQ/SQS queues.
  • If set to false (default), all messages are processed by AMQ/SQS queues.
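
As an illustration of option 4, the request can be prepared as in the minimal Python sketch below. The path and the forceIgnoreInStreaming parameter are taken from the call shown above. The host name, the tenant ID, and the helper name build_reindex_request are hypothetical, and authentication headers are omitted, so adapt them to your environment before sending anything.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_reindex_request(base_url, tenant_id, force_ignore_in_streaming=True):
    """Prepare (but do not send) the POST request that starts a reindex task.

    base_url and tenant_id are placeholders; authentication is not handled here.
    """
    # The API expects the literal strings "true" or "false".
    query = urlencode(
        {"forceIgnoreInStreaming": str(force_ignore_in_streaming).lower()}
    )
    url = f"{base_url.rstrip('/')}/reltio/{tenant_id}/reindex?{query}"
    return Request(url, method="POST")


# Hypothetical host and tenant, for illustration only.
req = build_reindex_request("https://example.invalid", "myTenant")
print(req.get_method(), req.full_url)
# prints: POST https://example.invalid/reltio/myTenant/reindex?forceIgnoreInStreaming=true
```

The sketch only builds the request object, which makes it easy to inspect the final URL before passing it to urllib.request.urlopen or an HTTP client of your choice.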

Q: If we disable streaming in the tenant config during reindexing (option 2), will those events be delivered after we re-enable streaming?
A: No. Events processed by the API while streaming is disabled will never appear in streaming, even after streaming is enabled again.

Q: Could you explain updateEntities=false (option 3) in more detail?
A: The ReindexDataTask generates an ENTITY_CHANGED event for each entity. These events are then handled by internal event processors inside the API, one of which re-indexes the entities. If updateEntities=true (default), the events are processed by all event processors, including the event streaming processor. If updateEntities=false, only the first event processor processes the events.

Note: The following event processors are available, and normally every processor processes every event:

  • Indexing data into Elasticsearch so that the current entity data can be searched
  • Putting data changes into history
  • Updating match tables for changed/created/deleted objects so that matching can work with those objects
  • Streaming the same events to the external streaming API (customer queues)
  • Populating the Analytics (RI) layer so that objects are accessible from RI

Note: If updateEntities=false, matching and merging will not occur during re-indexing. In this case, you will need to run matching and merging as a separate task.
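
The effect of updateEntities can be pictured with the toy Python model below. It is not Reltio's implementation: the processor names are shorthand for the list above, the function names are hypothetical, and it assumes the indexing processor is the "first" processor that still runs when updateEntities=false.

```python
# Shorthand names for the event processors listed above (order is an assumption).
PROCESSORS = ["indexing", "history", "match_tables", "streaming", "analytics"]


def processors_for_reindex(update_entities=True):
    """Return the processors that would handle reindex events."""
    # With updateEntities=false only the first processor (indexing) runs.
    return PROCESSORS if update_entities else PROCESSORS[:1]


def simulate_reindex(entity_ids, update_entities=True):
    """Generate one ENTITY_CHANGED event per entity and route it to processors."""
    handled = {name: [] for name in PROCESSORS}
    for entity_id in entity_ids:
        event = {"type": "ENTITY_CHANGED", "entity": entity_id}
        for name in processors_for_reindex(update_entities):
            handled[name].append(event)
    return handled


result = simulate_reindex(["e1", "e2"], update_entities=False)
print(len(result["indexing"]), len(result["streaming"]))
# prints: 2 0
```

In this model, the streaming and match-table processors receive nothing when update_entities is False, which mirrors why no events reach the customer queues and why matching must be run separately afterwards.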


