Overview
Some customers observe very large numbers of ENTITY_CHANGED events flowing into Data Pipeline Hub (DPH) and downstream platforms (for example, Databricks Delta Lake). This article explains why that can happen—even when the entity “looks the same”—and what configuration options can help reduce event scope or identify meaningful changes.
Symptoms
You may notice one or more of the following:
Extremely high counts of ENTITY_CHANGED events (potentially millions+).
Downstream processing treats many events as “changes,” even when business attributes appear unchanged.
Events contain the full entity payload (post-state snapshot), but you can’t easily tell what changed.
Bursty ingestion or backlogs in downstream processing during peak event spikes.
What’s Happening
ENTITY_CHANGED Events are snapshot-style, not diff-style
Reltio streaming ENTITY_CHANGED Events typically include:
The full post-state of the entity (body.object)
Event metadata such as operationId, commitTime, and objectVersion
What they do not include:
A “diff” (before vs after) showing exactly which attributes changed
Implication:
If a process touches an entity, re-saves it, bumps the version, or triggers cascading updates, downstream systems will receive an ENTITY_CHANGED event and treat it as a change—even if the business payload is effectively the same.
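One way to spot re-saves that carry no business change is to fingerprint the snapshot itself and ignore event-level metadata. The sketch below is illustrative only. It assumes the snapshot sits under body.object and that objectVersion is a top-level field; the sample events, the flat attribute shape, and the function name are hypothetical.

```python
import hashlib
import json


def snapshot_fingerprint(event: dict) -> str:
    """Hash only the entity snapshot, ignoring event-level metadata."""
    # Assumption: the post-state snapshot sits under event["body"]["object"].
    snapshot = event["body"]["object"]
    # sort_keys gives a stable serialization, so key order does not alter the hash.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical events: same business content, different version metadata.
first = {
    "objectVersion": 7,
    "body": {"object": {"uri": "entities/abc", "attributes": {"FirstName": "Ana"}}},
}
resave = {
    "objectVersion": 8,
    "body": {"object": {"attributes": {"FirstName": "Ana"}, "uri": "entities/abc"}},
}
edited = {
    "objectVersion": 9,
    "body": {"object": {"uri": "entities/abc", "attributes": {"FirstName": "Anna"}}},
}

print(snapshot_fingerprint(first) == snapshot_fingerprint(resave))  # True
print(snapshot_fingerprint(first) == snapshot_fingerprint(edited))  # False
```

If the real snapshot contains volatile fields inside body.object (for example, update timestamps), those would need to be stripped before hashing, or every event will produce a new fingerprint.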
Why This Can Lead to Massive Event Counts
High event volume is often caused by platform-expected behaviors such as:
Workflow actions or integrations that re-save entities
Bulk processes that update metadata or trigger version increments
Reference attribute changes that cascade updates to related entities
Reprocessing/replay scenarios that re-emit events
This is not necessarily a DPH defect. In many cases, DPH is simply delivering the events it receives.
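To find out which behavior is driving the volume, it can help to group a sample of landed events by operationId and see which operations account for most of them. This is a minimal sketch, assuming each event is available downstream as a dictionary with a top-level operationId; the sample values and the function name are hypothetical.

```python
from collections import Counter


def top_operations(events, limit=3):
    """Count events per operationId and return the most frequent ones."""
    # Assumption: operationId is a top-level field on each event.
    counts = Counter(e.get("operationId", "<missing>") for e in events)
    return counts.most_common(limit)


# Hypothetical sample: one bulk operation produced most of the events.
sample = [
    {"type": "ENTITY_CHANGED", "operationId": "op-bulk-1"},
    {"type": "ENTITY_CHANGED", "operationId": "op-bulk-1"},
    {"type": "ENTITY_CHANGED", "operationId": "op-ui-2"},
    {"type": "ENTITY_CHANGED", "operationId": "op-bulk-1"},
]

print(top_operations(sample))  # [('op-bulk-1', 3), ('op-ui-2', 1)]
```

A small number of operation IDs accounting for most events usually points to a bulk process or a cascade rather than many independent edits.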
What to Check in Your DPH Configuration
In many configurations, customers have settings like:
DPH enabled and running (not paused)
dataFilteringEnabled: false
ovOnly: false
Implication:
With these settings, DPH does not filter types or attributes on its own, and payloads are not restricted to operational values (OV). This generally means DPH will forward a broad set of events and payload content.
Options to Reduce Noise (and Their Limitations)
Option A: Enable OV-change analysis (adds ovChanged indicator)
If your goal is “only meaningful changes,” one pragmatic approach is to enable OV-change analysis so events can include an ovChanged boolean.
Benefit
Helps downstream logic quickly identify when operational values (OV) changed.
Important limitation
For scenarios involving reference attributes, ovChanged may be true for related entities due to optimization behavior. That means ovChanged=true is not a perfect proxy for “material business change,” especially with reference cascades.
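Downstream, the flag can be used as a coarse first-pass filter rather than a final verdict. The following sketch assumes ovChanged arrives as a top-level boolean on each event; its actual location in the payload should be confirmed against real events. The function name and sample data are hypothetical.

```python
def split_by_ov_changed(events):
    """Partition events into (candidates, skipped) using the ovChanged flag."""
    candidates, skipped = [], []
    for event in events:
        # Assumption: the flag is a top-level boolean named "ovChanged".
        # A missing flag is treated as "changed" so nothing is silently dropped.
        if event.get("ovChanged", True):
            candidates.append(event)
        else:
            skipped.append(event)
    return candidates, skipped


# Hypothetical sample events.
events = [
    {"uri": "entities/a", "ovChanged": True},
    {"uri": "entities/b", "ovChanged": False},
    {"uri": "entities/c"},  # flag absent: kept as a candidate
]

candidates, skipped = split_by_ov_changed(events)
print([e["uri"] for e in candidates])  # ['entities/a', 'entities/c']
print([e["uri"] for e in skipped])     # ['entities/b']
```

Because of the reference-cascade limitation, events in the candidates list may still need a further check before being treated as material changes.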
Option B: Tighten streaming destination filters (typeFilter, objectFilter)
Streaming destinations can often be scoped with:
typeFilter: restrict which event types are emitted
objectFilter: restrict events based on filter expressions (search-filter syntax)
Benefit
Reduces overall event volume by excluding entire classes of events or non-target entities.
Key limitations
Filtering only works on fields that are present in the event payload.
objectFilter reduces scope, but does not inherently detect “no material change” unless your filter can explicitly represent the desired semantics.
Option C: Enable DPH data filtering (dataFilteringEnabled) and configure L3 dataPipelineConfig
If you enable dataFilteringEnabled, DPH can restrict which types/attributes are sent downstream based on the model (L3) configuration.
Benefit
Strong scope control: exclude entire types and/or allowlist attributes per type.
Reduces payload size and downstream processing complexity.
How it works (high level)
When enabled, only types with a dataPipelineConfig section are included.
Attributes can be allowlisted.
Example L3 fragment (allowlist attributes)
{
  "name": "Individual",
  "type": "ENTITY",
  "dataPipelineConfig": {
    "enabled": true,
    "attributes": ["FirstName", "LastName", "Email"]
  }
}
Key limitation
This primarily controls what content is sent, not whether an event is produced. If the platform emits ENTITY_CHANGED, the event is still delivered, but with a reduced payload scope, so downstream work can be minimized.
Option D: Set ovOnly = true (reduce payload content)
Setting ovOnly may restrict payload content to operational values.
Benefit
Smaller payloads and potentially simpler downstream processing.
Limitation
This setting generally affects payload content; it is not guaranteed to reduce event count.
Practical Guidance: Choosing the Right Approach
You want fewer events overall:
Start with destination scoping (typeFilter, objectFilter) and ensure you are not emitting event types you don’t consume.
You want to keep events but reduce downstream cost:
Enable DPH data filtering to reduce type/attribute scope, and consider ovOnly.
You want to programmatically distinguish meaningful updates:
Enable OV-change analysis to use ovChanged as a signal (with the reference-cascade limitation noted above).
You need true “diffs”:
Reltio streaming events are not diff events by default. If you need strict before/after comparisons, implement downstream diff logic using previous-state storage (e.g., compare against the last processed version).
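A downstream diff against the last processed state could look roughly like this. It is a sketch under stated assumptions, not a reference implementation: the snapshot is assumed to sit under body.object with a uri and an attributes map, attributes are simplified to flat name/value pairs (real payloads are typically nested), and LastStateStore is a hypothetical in-memory stand-in for a durable store.

```python
def diff_attributes(previous: dict, current: dict) -> dict:
    """Return {name: (old, new)} for attributes added, removed, or changed."""
    changes = {}
    for name in previous.keys() | current.keys():
        old, new = previous.get(name), current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


class LastStateStore:
    """Hypothetical in-memory stand-in for a durable store keyed by entity URI."""

    def __init__(self):
        self._attributes = {}

    def process(self, event: dict) -> dict:
        # Assumption: the snapshot sits under body.object with "uri" and "attributes".
        entity = event["body"]["object"]
        uri = entity["uri"]
        current = entity.get("attributes", {})
        changes = diff_attributes(self._attributes.get(uri, {}), current)
        self._attributes[uri] = current  # remember the state for the next event
        return changes


def make_event(email: str) -> dict:
    """Build a hypothetical ENTITY_CHANGED event with flat attributes."""
    attributes = {"FirstName": "Ana", "Email": email}
    return {"body": {"object": {"uri": "entities/abc", "attributes": attributes}}}


store = LastStateStore()
store.process(make_event("ana@old.test"))         # first sighting: all attributes are new
print(store.process(make_event("ana@old.test")))  # {}
print(store.process(make_event("ana@new.test")))  # {'Email': ('ana@old.test', 'ana@new.test')}
```

An empty result marks an event that can be skipped. A production version would also need to handle out-of-order delivery, for example by comparing objectVersion before overwriting the stored state.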
FAQ
Q: Why do I get ENTITY_CHANGED when the entity content appears unchanged?
A: Because the event is a post-state snapshot. If an entity is re-saved, version bumped, or updated indirectly (e.g., cascade), an ENTITY_CHANGED event may be emitted even if the business payload looks similar.
Q: Does DPH filter “only real changes” automatically?
A: Not by default. If dataFilteringEnabled is off and destination filters are broad, DPH forwards what it receives.
Q: Will ovOnly reduce the number of events?
A: It is primarily a payload-content control. Event volume is driven by what the streaming source emits.
Q: Is ovChanged a perfect indicator of meaningful change?
A: No. It’s helpful, but reference attribute scenarios can cause ovChanged=true for related entities.
Summary
High ENTITY_CHANGED volume is most commonly explained by snapshot-style change events and platform behaviors that legitimately bump versions or trigger cascades. To reduce noise or cost:
Scope events at the streaming destination (typeFilter, objectFilter)
Enable OV-change analysis (ovChanged) to classify events downstream
Enable DPH filtering (dataFilteringEnabled) + L3 dataPipelineConfig to reduce types/attributes
Consider ovOnly to reduce payload size