Question
When you add the parameter reindexDeleted=true to the syncToDataPipeline endpoint, it changes the behavior of the sync operation as follows:
Purpose: It ensures that events for removed (deleted or loser) entities and relations are also sent to the Data Pipeline Hub (and thus to downstream adapters like BigQuery, Snowflake, etc.).
Default Behavior: By default, deleted entities and relations are NOT included in the sync.
With reindexDeleted=true: The sync will include both active and deleted (or loser) records, helping to maintain consistency in downstream systems, especially after merges or deletions.
Example Usage
POST https://{env}.reltio.com/reltio/api/{tenantID}/syncToDataPipeline?reindexDeleted=true
This will trigger the sync of all data, including deleted/loser records.
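The call above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function name `build_sync_request`, the sample values `dev`, `myTenant`, and `ACCESS_TOKEN`, and the use of a bearer token in an `Authorization` header are assumptions made for the example. The request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sync_request(env, tenant_id, token, reindex_deleted=False):
    """Build (but do not send) a POST request for syncToDataPipeline."""
    url = f"https://{env}.reltio.com/reltio/api/{tenant_id}/syncToDataPipeline"
    if reindex_deleted:
        # Parameter name and value are taken from the example usage above.
        url += "?" + urlencode({"reindexDeleted": "true"})
    # Assumption: the API accepts an OAuth bearer token in this header.
    headers = {"Authorization": f"Bearer {token}"}
    return Request(url, method="POST", headers=headers)


# Hypothetical environment, tenant, and token values.
req = build_sync_request("dev", "myTenant", "ACCESS_TOKEN", reindex_deleted=True)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually submit the sync, so that step is left out here on purpose.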
Important Notes
If you specify an exact dataType (e.g., dataTypes=entities), the reindexDeleted parameter will not have any effect unless you also include the deleted type (e.g., dataTypes=entities,deleted_entities).
This is particularly useful for keeping downstream data stores in sync with the true state of your Reltio tenant, including records that have been removed due to merges or deletions.
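The note about exact data types can be illustrated with a small helper that adds the deleted counterpart before the query string is built. This is a sketch under stated assumptions: `with_deleted_types` and `DELETED_COUNTERPART` are hypothetical names, and only the entities/deleted_entities pairing comes from this article, so no other deleted type names are guessed.

```python
# Only the entities -> deleted_entities pairing is named in the article;
# other data types are deliberately passed through unchanged.
DELETED_COUNTERPART = {"entities": "deleted_entities"}


def with_deleted_types(data_types):
    """Return data_types plus the deleted counterpart of each known type."""
    result = list(data_types)
    for data_type in data_types:
        deleted = DELETED_COUNTERPART.get(data_type)
        if deleted and deleted not in result:
            result.append(deleted)
    return result


query = "dataTypes=" + ",".join(with_deleted_types(["entities"])) + "&reindexDeleted=true"
print(query)  # dataTypes=entities,deleted_entities&reindexDeleted=true
```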
Here are the primary use cases for reindexDeleted=true with the syncToDataPipeline API in Reltio:
Summary Table:
| Use Case | Why Use |
|---|---|
| Downstream sync of deletions | Data consistency |
| Post-merge clean-up | Remove “loser” records |
| Bulk deletions | Compliance, accuracy |
| Full sync/reconciliation | Complete data reflection |
| Audit/data lineage | Traceability |
| Reporting consistency | Prevent stale data |
| Incremental CDC | Capture all changes |
| Disaster recovery | Restore correct state |
Synchronizing Deletions to Downstream Systems
Scenario: You have downstream systems (like BigQuery, Snowflake, or other analytics/data lakes) that must reflect not only active data but also deletions (including merged-away “loser” records and deleted relations/entities).
Why: By default, deletions are not sent to the pipeline. Using reindexDeleted=true ensures that downstream systems can accurately mirror the current state of your Reltio tenant, including removals.
Post-Merge Clean-Up
Scenario: After a large merge operation, you want to ensure that all “loser” entities (the records that were merged into a survivor and are now logically deleted) are also removed or flagged in downstream data stores.
Why: Without this, downstream systems may retain obsolete/merged-away records, leading to data quality issues.
Bulk Deletion Events
Scenario: You have performed bulk deletions (e.g., data clean-up, GDPR/CCPA compliance, or periodic purges) and need to propagate these deletions to all connected data pipelines.
Why: Ensures regulatory compliance and data consistency across all systems.
Initial Full Sync or Data Reconciliation
Scenario: You are performing a full re-sync or reconciliation between Reltio and a downstream system, and you want to guarantee that all deletions (historical and recent) are included in the sync.
Why: This is critical for initial loads or after a major data correction, so the downstream system is a true reflection of the Reltio tenant.
Audit and Data Lineage Requirements
Scenario: Your organization requires a complete audit trail or lineage, including when and which records were deleted or merged away.
Why: Including deleted records in the sync supports auditability and traceability in downstream analytics or compliance systems.
Downstream Data Warehouse/Reporting Consistency
Scenario: Your reporting or analytics relies on the downstream data warehouse being an exact replica of Reltio, including all deletes and merges.
Why: Prevents reporting on stale or incorrect data due to missing deletions.
Incremental Loads with Change Data Capture
Scenario: You are running incremental syncs and want to ensure that deletions are captured as part of the change data capture (CDC) process.
Why: Ensures that all changes, including removals, are reflected in the target system.
Data Recovery or Disaster Recovery Scenarios
Scenario: After a recovery event, you need to re-sync all data, including deletions, to restore downstream systems to the correct state.
Why: Guarantees that no deleted or merged-away records are inadvertently restored or left behind.
Here are scenarios where it is not a good idea to use reindexDeleted=true with the syncToDataPipeline endpoint in Reltio:
Performance-Sensitive or Large-Scale Syncs
Scenario: You are syncing a huge tenant or dataset and want to minimize the volume of data processed and transferred.
Why: Including deleted/loser records can significantly increase the amount of data to be processed, potentially impacting performance, sync duration, and resource usage.
Downstream System Does Not Track Deletions
Scenario: Your downstream system (e.g., reporting database, analytics platform) is designed to only store active records and does not require or support tracking deletions or merged-away records.
Why: Sending deleted records may unnecessarily complicate downstream data models or processes.
One-Time Sync for Active Data Only
Scenario: You are performing a one-time sync to populate a downstream system with only the current, active state of your data (e.g., for a new analytics dashboard that only needs active entities/relations).
Why: Including deleted records is unnecessary and may clutter the target system.
Testing or Development Environments
Scenario: You are running syncs in a test or development environment where you only care about current data for validation or prototyping.
Why: Including deleted records may slow down testing and is usually not needed for development purposes.
Cost-Sensitive Data Pipelines
Scenario: Your data pipeline or downstream storage incurs costs based on data volume (e.g., cloud storage, data transfer).
Why: Including deleted records increases data volume and may lead to unnecessary costs if deletions are not required downstream.
Downstream Consumers Not Ready for Deletion Events
Scenario: Your downstream consumers (ETL jobs, dashboards, applications) are not yet designed to handle deletion or merge events.
Why: Sending deleted records could cause errors, confusion, or data integrity issues in those systems.
Incremental Syncs Where Deletions Are Already Handled
Scenario: You have a separate process or mechanism for handling deletions in downstream systems.
Why: Using reindexDeleted=true would be redundant and could lead to duplicate or conflicting deletion events.
Regulatory or Privacy Restrictions
Scenario: There are regulatory or privacy requirements that restrict the propagation of deleted data (even as a deletion event) to certain downstream systems.
Why: Syncing deleted records could violate compliance policies.
Let’s break down whether using ?dataTypes=merges,relations&reindexDeleted=true with the syncToDataPipeline endpoint is a good idea, based on Reltio documentation and best practices.
What Does This Do?
dataTypes=merges,relations: Syncs only merge events and relation data.
reindexDeleted=true: Instructs the sync to also include deleted/loser records for the specified data types.
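To make the two parameters concrete, the query string can be split into its parts with Python's standard library. This is only an illustrative sketch: `describe_sync_query` is a hypothetical helper, and treating a missing reindexDeleted as false mirrors the default behavior described earlier in this article.

```python
from urllib.parse import parse_qs


def describe_sync_query(query):
    """Split a syncToDataPipeline query string into (data_types, reindex_deleted)."""
    params = parse_qs(query.lstrip("?"))
    # dataTypes is a single comma-separated value, e.g. "merges,relations".
    raw_types = params.get("dataTypes", [""])[0]
    data_types = [t for t in raw_types.split(",") if t]
    # A missing reindexDeleted is treated as false (the default behavior).
    reindex_deleted = params.get("reindexDeleted", ["false"])[0] == "true"
    return data_types, reindex_deleted


print(describe_sync_query("?dataTypes=merges,relations&reindexDeleted=true"))
# (['merges', 'relations'], True)
```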
It’s a Good Idea If:
You Need Downstream Consistency for Merges and Relations
If your downstream systems (like BigQuery, Snowflake, etc.) must reflect not only active merges and relations but also those that have been deleted or merged away, this combination ensures full fidelity.
Example: You want to track the full lifecycle of relationships and merges, including those that have been removed.
You’re Auditing or Reconciling Data
If you need a complete audit trail or are reconciling data between Reltio and downstream systems, including deletions is essential.
You’re Cleaning Up After Bulk Merges or Deletions
After major merge operations, this ensures that “loser” records and deleted relations are also removed or flagged downstream.
It’s NOT a Good Idea If:
Downstream Systems Only Need Active Data
If your downstream consumers only care about current, active merges and relations, including deleted ones adds unnecessary complexity and data volume.
Performance or Cost is a Concern
Including deleted records increases the data processed and transferred, which can impact performance and cost, especially for large tenants.
Downstream Consumers Can’t Handle Deletion Events
If your downstream ETL, analytics, or reporting tools aren’t set up to process deletions or “loser” records, this could cause confusion or errors.
Special Note from Reltio Documentation
If you specify an exact dataType (for example, dataTypes=entities), the reindexDeleted parameter will not have any effect unless you also include the deleted type (for example, dataTypes=entities,deleted_entities).
However, for merges and relations, reindexDeleted=true will ensure that deleted/loser records for those types are included in the sync.