Question:
We loaded data and the upload body consisted of a mix of different entity and relation types. Only a single event from the request body didn't get fully applied; the rest of the events got applied successfully.
As for the single event that didn't get fully applied, it contained multiple attributes but only some were applied to the entity as expected.
- FirstName - successful
- LastName - successful
- Brand - failed
- CustomerSince - failed
- LanguagePreference - successful
The response from the Reltio API indicated that all of the uploaded events were successful, and when I attempted to replay this data back to our tenant, the missing fields suddenly appeared. At times we make concurrent updates to the entity, but we never change the same data provider crosswalk in both calls at once.
For example:
- Update/call #1 contains only the top-level crosswalk, and this crosswalk is listed as the data provider.
- Update/call #2 contains both a top-level crosswalk and a sub-crosswalk. The top-level crosswalk is listed only as a contributor provider (not a data provider), and the sub-crosswalk is listed as the data provider.
Answer:
Simultaneous (concurrent) requests can contribute to this problem. Reltio operates with eventual consistency, and entity updates are not always instantaneously reflected across all nodes. When multiple requests hit the same entity at the same time (or within a very short window), the following can happen:
- Two or more requests try to update the same entity simultaneously.
- Each request includes crosswalk data (often from the same or different systems).
- Because the entity isn't updated fast enough for the second request to "see" what the first one did:
  - Reltio may not detect a duplicate crosswalk.
  - It adds another crosswalk instead.
- Over time, this causes many redundant or unnecessary crosswalks, especially if these requests come frequently. Those extra crosswalks can contribute to an entity having too many.
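The sequence in the bullets above can be sketched with a small toy model. This is only an illustration of the stale-read pattern, not Reltio's implementation: `ToyEntity`, `read_snapshot`, and `apply_update` are hypothetical names, and the `("SourceA", "12345")` crosswalk is made-up data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEntity:
    """Hypothetical stand-in for a stored entity; not a Reltio type."""
    crosswalks: list = field(default_factory=list)


def read_snapshot(entity):
    # Each request works from the copy it read; writes made afterwards
    # by other requests are invisible to it.
    return list(entity.crosswalks)


def apply_update(entity, snapshot, crosswalk):
    # Duplicate detection runs against the snapshot, not the live state.
    if crosswalk not in snapshot:
        entity.crosswalks.append(crosswalk)


crosswalk = ("SourceA", "12345")  # hypothetical (source, value) pair

# Sequential: the second request reads after the first has written.
sequential = ToyEntity()
apply_update(sequential, read_snapshot(sequential), crosswalk)
apply_update(sequential, read_snapshot(sequential), crosswalk)

# Concurrent: both requests read before either one writes.
concurrent = ToyEntity()
snap_1 = read_snapshot(concurrent)
snap_2 = read_snapshot(concurrent)
apply_update(concurrent, snap_1, crosswalk)
apply_update(concurrent, snap_2, crosswalk)

print(len(sequential.crosswalks))
print(len(concurrent.crosswalks))
```

In the sequential run the second request sees the existing crosswalk and skips it. In the concurrent run both requests check an empty snapshot, so the same crosswalk is added twice.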
The call pattern described in the question is summarized below:
Call | Top-level Crosswalk | Sub-Crosswalk | Data Provider | Contributor |
---|---|---|---|---|
1 | Present | None | Top-level | — |
2 | Present | Present | Sub-crosswalk | Top-level |
Why this still risks race conditions and bloat:
1. Reltio's conflict resolution is not crosswalk-aware at sub-object granularity.
Even if the updates "logically" affect different parts of the data (like nested vs. top-level), Reltio:
- Evaluates the full entity.
- Applies survivorship rules after ingest.
- Commits new crosswalks for any data change from a "new perspective" (i.e., any crosswalk not already registered or merged).
2. If requests are simultaneous, the second one may not see the state of the first.
This can cause:
- Reltio to treat both top-level crosswalks as independent contributors (even if identical).
- A new crosswalk to be added for the same logical source, because versioning or metadata differs slightly.
3. Crosswalks are based on unique identifiers (URIs, sources, timestamps, etc.).
Even a slight difference in structure, such as:
- sub-crosswalk vs. top-level crosswalk, or
- differences in sourceTable, sourceType, uri, etc.
...can cause Reltio to treat them as distinct and result in additional crosswalks, even if the data is conceptually the same.
4. Crosswalk Contributor vs Data Provider logic doesn't isolate updates.
Making one crosswalk the data provider and the other a contributor is unfortunately not guaranteed to avoid race issues. The separation is respected during survivorship, not during ingestion or merge processing.
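Point 3 above can be illustrated with a toy identity check. This is a sketch under stated assumptions: `ToyCrosswalk` and `register` are hypothetical, and the set of fields that make up a crosswalk's identity here (source type, value, source table) is chosen for illustration only and is not taken from Reltio's documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyCrosswalk:
    """Hypothetical identity key; the field set is an assumption."""
    source_type: str
    value: str
    source_table: Optional[str] = None


def register(registry: set, crosswalk: ToyCrosswalk) -> bool:
    """Add the crosswalk; return True only if no equal one was present."""
    if crosswalk in registry:
        return False
    registry.add(crosswalk)
    return True


registry = set()
first = ToyCrosswalk("configuration/sources/CRM", "12345")
same = ToyCrosswalk("configuration/sources/CRM", "12345")
# Same logical source and value, but one extra field is populated.
variant = ToyCrosswalk("configuration/sources/CRM", "12345",
                       source_table="CUSTOMERS")

added_first = register(registry, first)
added_same = register(registry, same)
added_variant = register(registry, variant)

print(added_first, added_same, added_variant, len(registry))
```

The exact duplicate is rejected, but the variant that differs in a single field is stored as a separate crosswalk, even though it describes the same record conceptually.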