Why You May See High Volumes of ENTITY_CHANGED Events in DPH (and How to Reduce Noise)

Overview

Some customers observe very large numbers of ENTITY_CHANGED events flowing into Data Pipeline Hub (DPH) and downstream platforms (for example, Databricks Delta Lake). This article explains why that can happen—even when the entity “looks the same”—and what configuration options can help reduce event scope or identify meaningful changes.

 

Symptoms

You may notice one or more of the following:

  • Extremely high counts of ENTITY_CHANGED events (potentially millions or more).

  • Downstream processing treats many events as “changes,” even when business attributes appear unchanged.

  • Events contain the full entity payload (post-state snapshot), but you can’t easily tell what changed.

  • Bursty ingestion or backlogs in downstream processing during peak event spikes.

 

What’s Happening

ENTITY_CHANGED events are snapshot-style, not diff-style

Reltio streaming ENTITY_CHANGED events typically include:

  • The full post-state of the entity (body.object)

  • Event metadata such as operationId, commitTime, and objectVersion

What they do not include:

  • A “diff” (before vs after) showing exactly which attributes changed

Implication:
If a process touches an entity, re-saves it, bumps its version, or triggers cascading updates, downstream systems receive an ENTITY_CHANGED event and, without their own comparison logic, treat it as a change, even if the business payload is effectively the same.
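To make the snapshot-versus-diff distinction concrete, the Python sketch below compares two post-state snapshots by hashing only their attribute values. The simplified event and entity shape (an objectVersion field next to body.object, and an attributes map of value lists) is an assumption for illustration, not the exact Reltio payload format, and business_fingerprint is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict


def business_fingerprint(entity: Dict[str, Any]) -> str:
    """Hash only the attribute values of a post-state entity snapshot.

    Assumes a simplified shape: {"attributes": {name: [{"value": ...}, ...]}}.
    Event metadata such as objectVersion is deliberately left out.
    """
    values = {
        name: sorted(json.dumps(item.get("value"), sort_keys=True) for item in items)
        for name, items in entity.get("attributes", {}).items()
    }
    canonical = json.dumps(values, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical ENTITY_CHANGED events for the same entity after a re-save.
before = {"objectVersion": 7, "body": {"object": {"attributes": {"FirstName": [{"value": "Ann"}]}}}}
after = {"objectVersion": 8, "body": {"object": {"attributes": {"FirstName": [{"value": "Ann"}]}}}}

print(before["objectVersion"] != after["objectVersion"])  # True: the version moved
print(business_fingerprint(before["body"]["object"])
      == business_fingerprint(after["body"]["object"]))  # True: attributes are identical
```

Because each event carries only the post-state, a consumer needs the previous snapshot (or its fingerprint) to make this comparison at all.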

 

Why This Can Lead to Massive Event Counts

High event volume is often caused by platform-expected behaviors such as:

  • Workflow actions or integrations that re-save entities

  • Bulk processes that update metadata or trigger version increments

  • Reference attribute changes that cascade updates to related entities

  • Reprocessing/replay scenarios that re-emit events

This is not necessarily a DPH defect. In many cases, DPH is simply delivering the events it receives.
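For the reprocessing/replay case specifically, a consumer can drop events it has already handled. The sketch below assumes that a replayed event repeats the entity URI and objectVersion of the original delivery and that both sit at the illustrative locations shown; drop_replayed is a hypothetical helper, not part of DPH.

```python
from typing import Any, Dict, Iterable, Iterator, Set, Tuple


def drop_replayed(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield each (entity URI, objectVersion) pair at most once.

    Assumes a replayed event repeats the URI and objectVersion of the original
    delivery. The field locations used here are illustrative.
    """
    seen: Set[Tuple[str, int]] = set()
    for event in events:
        key = (event["body"]["object"]["uri"], event["objectVersion"])
        if key in seen:
            continue  # this exact entity version was already processed
        seen.add(key)
        yield event


stream = [
    {"objectVersion": 3, "body": {"object": {"uri": "entities/abc"}}},
    {"objectVersion": 3, "body": {"object": {"uri": "entities/abc"}}},  # replay
    {"objectVersion": 4, "body": {"object": {"uri": "entities/abc"}}},
]
print(len(list(drop_replayed(stream))))  # 2
```

The in-memory set is only for illustration; a real pipeline would persist the keys. This handles replays only: a re-save that bumps the version produces a new key and still passes through.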

 

What to Check in Your DPH Configuration

In many configurations, customers have settings like:

  • DPH enabled and running (not paused)

  • dataFilteringEnabled: false

  • ovOnly: false

Implication:
With these settings, DPH does not filter types or attributes, and payloads are not restricted to operational values (OV). As a result, DPH forwards a broad set of events with full payload content.
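If you review many tenants or environments, a small check over exported settings can flag the combination described above. In this sketch, settings is a plain dict that mirrors the setting names used in this article; it is an assumption, not the actual DPH configuration API format, and treating a missing key as false is a choice made for the sketch.

```python
from typing import Any, Dict, List


def audit_dph_settings(settings: Dict[str, Any]) -> List[str]:
    """Return notes on settings that widen what DPH forwards.

    `settings` is a hypothetical plain dict mirroring this article's setting
    names. A missing key is treated like false (a sketch assumption).
    """
    notes = []
    if not settings.get("dataFilteringEnabled", False):
        notes.append("dataFilteringEnabled is false: types and attributes are not filtered")
    if not settings.get("ovOnly", False):
        notes.append("ovOnly is false: payloads are not limited to operational values")
    return notes


for note in audit_dph_settings({"dataFilteringEnabled": False, "ovOnly": False}):
    print(note)
```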

 

Options to Reduce Noise (and Their Limitations)

 

Option A: Enable OV-change analysis (adds ovChanged indicator)

If your goal is “only meaningful changes,” one pragmatic approach is to enable OV-change analysis so events can include an ovChanged boolean.

Benefit

  • Helps downstream logic quickly identify when operational values (OV) changed.

Important limitation

  • For scenarios involving reference attributes, ovChanged may be true for related entities due to optimization behavior. That means ovChanged=true is not a perfect proxy for “material business change,” especially with reference cascades.
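Once ovChanged is present, downstream logic can route events by it. The sketch below assumes the flag sits at the top level of each event (adjust the lookup to your actual payload). It keeps a separate "unknown" bucket for events without the flag rather than treating them as unchanged, since a missing flag says nothing about whether OV changed.

```python
from typing import Any, Dict, Iterable, List


def classify_by_ov_change(events: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket events by the ovChanged flag.

    The flag location is an assumption. Events without the flag go to
    "unknown" instead of being treated as unchanged.
    """
    buckets: Dict[str, List[Dict[str, Any]]] = {"changed": [], "unchanged": [], "unknown": []}
    for event in events:
        flag = event.get("ovChanged")
        if flag is True:
            buckets["changed"].append(event)
        elif flag is False:
            buckets["unchanged"].append(event)
        else:
            buckets["unknown"].append(event)
    return buckets


result = classify_by_ov_change([{"ovChanged": True}, {"ovChanged": False}, {}])
print({k: len(v) for k, v in result.items()})  # {'changed': 1, 'unchanged': 1, 'unknown': 1}
```

Given the reference-cascade limitation, treat the "changed" bucket as a candidate set for further checks, not as a confirmed list of business changes.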

 

Option B: Tighten streaming destination filters (typeFilter, objectFilter)

Streaming destinations can often be scoped with:

  • typeFilter: restrict which event types are emitted

  • objectFilter: restrict events based on filter expressions (search-filter syntax)

Benefit

  • Reduces overall event volume by excluding entire classes of events or non-target entities.

Key limitations

  • Filtering only works on fields that are present in the event payload.

  • objectFilter reduces scope, but does not inherently detect “no material change” unless your filter can explicitly represent the desired semantics.
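Before changing typeFilter or objectFilter, it can help to estimate on a captured sample how much volume a narrower scope would remove. The sketch below does not use typeFilter or objectFilter syntax; it only mimics a similar scope in consumer code. The event field locations and the event and entity type names in the sets are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, Set

# Example scope only; these sets are assumptions, not a recommended filter.
WANTED_EVENT_TYPES: Set[str] = {"ENTITY_CREATED", "ENTITY_CHANGED"}
WANTED_ENTITY_TYPES: Set[str] = {"configuration/entityTypes/Individual"}


def in_scope(event: Dict[str, Any]) -> bool:
    """Mimic an event-type plus entity-type scope on the consumer side."""
    if event.get("type") not in WANTED_EVENT_TYPES:
        return False
    entity_type = event.get("body", {}).get("object", {}).get("type")
    return entity_type in WANTED_ENTITY_TYPES


def share_filtered_out(events: Iterable[Dict[str, Any]]) -> float:
    """Fraction of a sample that the narrower scope would drop."""
    sample = list(events)
    if not sample:
        return 0.0
    dropped = sum(1 for e in sample if not in_scope(e))
    return dropped / len(sample)


sample = [
    {"type": "ENTITY_CHANGED", "body": {"object": {"type": "configuration/entityTypes/Individual"}}},
    {"type": "ENTITY_CHANGED", "body": {"object": {"type": "configuration/entityTypes/Organization"}}},
    {"type": "RELATIONSHIP_CHANGED", "body": {"object": {"type": "configuration/relationTypes/HasAddress"}}},
    {"type": "ENTITY_CREATED", "body": {"object": {"type": "configuration/entityTypes/Individual"}}},
]
print(share_filtered_out(sample))  # 0.5
```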

 

Option C: Enable DPH data filtering (dataFilteringEnabled) and configure L3 dataPipelineConfig

If you enable dataFilteringEnabled, DPH can restrict which types/attributes are sent downstream based on the model (L3) configuration.

Benefit

  • Strong scope control: exclude entire types and/or allowlist attributes per type.

  • Reduces payload size and downstream processing complexity.

How it works (high level)

  • When enabled, only types with a dataPipelineConfig section are included.

  • Attributes can be allowlisted.

Example L3 fragment (allowlist attributes)

{
  "name": "Individual",
  "type": "ENTITY",
  "dataPipelineConfig": {
    "enabled": true,
    "attributes": ["FirstName", "LastName", "Email"]
  }
}
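To preview what a payload would look like under a configuration like the example L3 fragment above, the behavior described in "How it works" can be approximated in consumer code. This is a sketch, not how DPH implements filtering: l3_types is a hypothetical mapping from a short type name to its dataPipelineConfig section, and passing all attributes through when no allowlist is given is an assumption.

```python
from typing import Any, Dict, Optional


def apply_pipeline_config(
    entity: Dict[str, Any], l3_types: Dict[str, Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Approximate type exclusion and attribute allowlisting.

    `l3_types` maps a short type name (e.g. "Individual") to its
    dataPipelineConfig section. Returning None means the type is excluded.
    """
    short_name = entity.get("type", "").rsplit("/", 1)[-1]
    config = l3_types.get(short_name)
    if config is None or not config.get("enabled", False):
        return None  # types without an enabled dataPipelineConfig are left out
    if "attributes" not in config:
        return entity  # assumption: no allowlist means all attributes pass
    allowed = set(config["attributes"])
    kept = {name: values for name, values in entity.get("attributes", {}).items() if name in allowed}
    return {**entity, "attributes": kept}


l3 = {"Individual": {"enabled": True, "attributes": ["FirstName", "LastName", "Email"]}}
entity = {
    "type": "configuration/entityTypes/Individual",
    "attributes": {"FirstName": [{"value": "Ann"}], "Phone": [{"value": "555-0100"}]},
}
print(sorted(apply_pipeline_config(entity, l3)["attributes"]))  # ['FirstName']
```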

Key limitation

  • This primarily controls what content is sent, not necessarily whether an event is produced. If the platform emits ENTITY_CHANGED for an included type, the event is still delivered, but with a reduced payload scope, so downstream work is smaller.

 

Option D: Set ovOnly = true (reduce payload content)

Setting ovOnly to true restricts payload content to operational values (OV) instead of sending all attribute values.

Benefit

  • Smaller payloads and potentially simpler downstream processing.

Limitation

  • This setting affects payload content; it does not guarantee fewer events.
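To gauge how much smaller payloads might get with ovOnly, you can apply a similar reduction to a sample of captured entities. The sketch assumes each attribute value carries an "ov" boolean in the simplified shape shown; it is an approximation for sizing, not how DPH implements ovOnly, and keep_ov_values is a hypothetical helper.

```python
from typing import Any, Dict


def keep_ov_values(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only attribute values flagged as operational values.

    Assumes each attribute value has an "ov" boolean (simplified shape).
    Attributes left with no OV value are dropped entirely.
    """
    reduced = {}
    for name, values in entity.get("attributes", {}).items():
        ov_values = [v for v in values if v.get("ov") is True]
        if ov_values:
            reduced[name] = ov_values
    return {**entity, "attributes": reduced}


entity = {"attributes": {"FirstName": [
    {"value": "Ann", "ov": True},
    {"value": "Anna", "ov": False},  # non-OV value, e.g. from another source
]}}
print(keep_ov_values(entity)["attributes"]["FirstName"])  # [{'value': 'Ann', 'ov': True}]
```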

 

Practical Guidance: Choosing the Right Approach

  • You want fewer events overall:
    Start with destination scoping (typeFilter, objectFilter) and ensure you are not emitting event types you don’t consume.

  • You want to keep events but reduce downstream cost:
    Enable DPH data filtering to reduce type/attribute scope, and consider ovOnly.

  • You want to programmatically distinguish meaningful updates:
    Enable OV-change analysis to use ovChanged as a signal (with the reference-cascade limitation noted above).

  • You need true “diffs”:
    Reltio streaming events are not diff events by default. If you need strict before/after comparisons, implement downstream diff logic using previous-state storage (e.g., compare each event against the last processed version of the same entity).
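The previous-state approach can be sketched as a small class that remembers the last processed version and attributes per entity URI. This is a minimal sketch under stated assumptions: state lives in memory (a real pipeline would persist it, for example in a table keyed by URI), the caller passes attributes already reduced to business values so that value-level metadata does not show up as a change, and LastStateDiffer is a hypothetical name.

```python
from typing import Any, Dict, Optional, Tuple


class LastStateDiffer:
    """Compute before/after attribute diffs from snapshot-style events.

    Keeps the last processed (objectVersion, attributes) per entity URI in
    memory. All names here are hypothetical.
    """

    def __init__(self) -> None:
        self._last: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def diff(self, uri: str, version: int, attributes: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Return changed attributes, {} for no change, or None for a stale version."""
        previous = self._last.get(uri)
        if previous is not None and version <= previous[0]:
            return None  # older or replayed snapshot: ignore it
        old = previous[1] if previous is not None else {}
        changed = {
            name: {"before": old.get(name), "after": attributes.get(name)}
            for name in sorted(set(old) | set(attributes))
            if old.get(name) != attributes.get(name)
        }
        self._last[uri] = (version, attributes)
        return changed


differ = LastStateDiffer()
differ.diff("entities/abc", 1, {"FirstName": ["Ann"]})         # first sighting: all attributes are new
print(differ.diff("entities/abc", 2, {"FirstName": ["Ann"]}))   # {}
print(differ.diff("entities/abc", 3, {"FirstName": ["Anne"]}))  # {'FirstName': {'before': ['Ann'], 'after': ['Anne']}}
```

An empty result marks a version bump with no business change, which is exactly the kind of event this article describes; downstream jobs can skip those instead of reprocessing them.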

 

FAQ

Q: Why do I get ENTITY_CHANGED when the entity content appears unchanged?
A: Because the event is a post-state snapshot. If an entity is re-saved, has its version bumped, or is updated indirectly (e.g., by a reference cascade), an ENTITY_CHANGED event may be emitted even if the business payload looks the same.

Q: Does DPH filter “only real changes” automatically?
A: Not by default. If dataFilteringEnabled is off and destination filters are broad, DPH forwards what it receives.

Q: Will ovOnly reduce the number of events?
A: It is primarily a payload-content control. Event volume is driven by what the streaming source emits.

Q: Is ovChanged a perfect indicator of meaningful change?
A: No. It’s helpful, but reference attribute scenarios can cause ovChanged=true for related entities.

 

Summary

High ENTITY_CHANGED volume is most commonly explained by snapshot-style change events and platform behaviors that legitimately bump versions or trigger cascades. To reduce noise or cost:

  • Scope events at the streaming destination (typeFilter, objectFilter)

  • Enable OV-change analysis (ovChanged) to classify events downstream

  • Enable DPH filtering (dataFilteringEnabled) and configure L3 dataPipelineConfig to reduce types/attributes

  • Consider ovOnly to reduce payload size
