What race conditions can lead to crosswalk duplication?

Question:

We loaded data and the upload body consisted of a mix of different entity and relation types. Only a single event from the request body didn't get fully applied; the rest of the events got applied successfully.

As for the single event that didn't get fully applied, it contained multiple attributes but only some were applied to the entity as expected. 

  • FirstName - successful
  • LastName - successful
  • Brand - failed
  • CustomerSince - failed
  • LanguagePreference - successful

The response from the Reltio API indicated that all of the uploaded events were successful, and when I attempted to replay this data back to our tenant, the missing fields suddenly appeared. At times we make concurrent updates to the entity, but we never change the same data provider crosswalk in both calls at once.

For example:

  • Update/call #1 contains only the top-level crosswalk, and this crosswalk is listed as the data provider.
  • Update/call #2 contains both a top-level crosswalk and a sub-crosswalk.  The top-level crosswalk is listed only as a contributor provider (not a data provider), and the sub-crosswalk is listed as the data provider.

Answer: 

Simultaneous (concurrent) requests can contribute to this problem. Reltio operates with eventual consistency, and entity updates are not always instantaneously reflected across all nodes. When multiple requests hit the same entity at the same time (or within a very short window), the following can happen:
 

  1. Two or more requests try to update the same entity simultaneously.
  2. Each request includes crosswalk data (often from the same or different systems).
  3. Because the entity isn't updated fast enough for the second request to "see" what the first one did:
    • Reltio may not detect the duplicate crosswalk, and
    • it adds another crosswalk instead.
  4. Over time, this produces many redundant crosswalks, especially if such requests arrive frequently, and can push an entity toward an excessive number of crosswalks.
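
The four steps above amount to a check-then-act race. The following is a minimal sketch of that pattern in Python, not a model of Reltio's internals: the Entity class, the helper functions, and the source name "configuration/sources/CRM" are all hypothetical and exist only for illustration. Each request checks for a duplicate against the snapshot it read when it started, so two overlapping requests both miss the crosswalk the other is about to add.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical stand-in for an entity's crosswalk list.
    crosswalks: list = field(default_factory=list)


def read_snapshot(entity):
    # Each request works from a copy of the state it saw when it started.
    return list(entity.crosswalks)


def apply_request(entity, snapshot, crosswalk):
    # The duplicate check uses the (possibly stale) snapshot, not the live state.
    if crosswalk not in snapshot:
        entity.crosswalks.append(crosswalk)


def run_sequential(crosswalk):
    # The second request reads only after the first has been applied.
    entity = Entity()
    for _ in range(2):
        apply_request(entity, read_snapshot(entity), crosswalk)
    return entity.crosswalks


def run_overlapping(crosswalk):
    # Both requests read before either one has been applied.
    entity = Entity()
    snapshot_a = read_snapshot(entity)
    snapshot_b = read_snapshot(entity)
    apply_request(entity, snapshot_a, crosswalk)
    apply_request(entity, snapshot_b, crosswalk)
    return entity.crosswalks


xw = ("configuration/sources/CRM", "12345")  # hypothetical source and value
print(len(run_sequential(xw)))   # 1
print(len(run_overlapping(xw)))  # 2
```

In this toy model the only difference between the two runs is the order of reads and writes, which is why the same payload can behave differently when replayed later on its own.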

The pattern described above is shown below:

 

Call | Top-level Crosswalk | Sub-Crosswalk | Data Provider | Contributor
1    | Present             | None          | Top-level     | -
2    | Present             | Present       | Sub-crosswalk | Top-level

 

Why this still risks race conditions and bloat:

1. Reltio's conflict resolution is not crosswalk-aware at sub-object granularity.

Even if the updates "logically" affect different parts of the data (like nested vs. top-level), Reltio:

  • Evaluates the full entity.
  • Applies survivorship rules after ingest.
  • Commits new crosswalks for any data change from a "new perspective" (i.e., any crosswalk not already registered or merged).

 

2. If requests are simultaneous, the second one may not see the state of the first.

As a result:

  • Reltio may treat both top-level crosswalks as independent contributors (even if they are identical).
  • A new crosswalk may be added for the same logical source because versioning or metadata differs slightly.

     

3. Crosswalks are based on unique identifiers (URIs, sources, timestamps, etc.).

Even a slight difference in structure, such as:

  • sub-crosswalk vs. top-level crosswalk, or
  • differences in sourceTable, sourceType, uri, etc.

can cause Reltio to treat them as distinct and result in additional crosswalks, even if the data is conceptually the same.
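
To make this concrete, here is a small Python sketch that groups an entity's crosswalks by a "logical source" key and reports groups with more than one member. It is an illustration under stated assumptions, not Reltio's actual matching logic: the dictionary layout, the "value" field, the sample data, and the choice of (sourceType, value) as the logical key are all hypothetical. Only the sourceTable and sourceType names come from the text above.

```python
from collections import defaultdict


def logical_key(crosswalk):
    # Assumption: source type plus source value identifies "the same logical source".
    return (crosswalk.get("sourceType"), crosswalk.get("value"))


def find_redundant(crosswalks):
    # Group crosswalks by logical key and keep only groups with duplicates.
    groups = defaultdict(list)
    for cw in crosswalks:
        groups[logical_key(cw)].append(cw)
    return {key: items for key, items in groups.items() if len(items) > 1}


# Hypothetical sample data: the first two records differ only in sourceTable.
crosswalks = [
    {"sourceType": "CRM", "value": "12345", "sourceTable": "customers"},
    {"sourceType": "CRM", "value": "12345", "sourceTable": "customer"},
    {"sourceType": "ERP", "value": "A-9"},
]

# Compared field by field, the first two records are distinct.
print(crosswalks[0] == crosswalks[1])  # False

redundant = find_redundant(crosswalks)
for key, items in redundant.items():
    print(key, len(items))  # ('CRM', '12345') 2
```

A check along these lines, run against an exported entity, could help spot crosswalks that are conceptually the same but were stored separately because one field differed.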

 

4. Crosswalk Contributor vs Data Provider logic doesn't isolate updates.

The assumption that making one the data provider and another a contributor avoids race issues is unfortunately not guaranteed. The separation is respected during survivorship, not during ingestion or merge processing.

 


References:

We regularly update our reference documents, which may cause old links to break. Please search using the article titles referenced below.

Crosswalks

Entity Crosswalk Consistency Task 

Bulk Update of Attributes

Cumulative Entity Update
