Why is there old environment data remaining in landing/object-type tables after re-sync using reindexDeleted=true property

Question

Customers may notice old records from a prior (lower) Reltio environment still present in Snowflake landing tables and object-type tables even after running a re-sync. This is expected behavior: each synchronization run appends new rows/entries, and the connector relies on Snowflake processing tasks to provide a “latest state” view downstream. The landing/staging design does not automatically purge historical environment data.

The safest cleanup is to TRUNCATE the relevant Snowflake tables, then run a full syncToDataPipeline with reindexDeleted=true to ensure that deletions are correctly re-materialized.

 

Answer

Perform the following when the customer says:

  • “We initially configured Snowflake with a lower environment and now see old data.”

  • “We ran re-sync, but the old records are still in landing/object-type tables.”

  • “How do we remove older environment data from Snowflake connector tables?”

 

Why this happens

  • Each syncToDataPipeline run adds new rows/entries (it does not overwrite/purge the landing table by default).

  • The “latest state” guarantee is provided by Snowflake processing tasks that move/transform data from landing → staging, ensuring consumers see the latest Reltio state.

  • Landing/staging tables are not designed to purge older environments automatically.

Recommended resolution (safest cleanup)

  1. (Recommended) Pause/disable downstream jobs (dashboards, ETL/ELT, reports) to avoid confusion while tables are empty.

  2. TRUNCATE the affected Snowflake tables (landing/object-type, and any related tables the customer wants cleared).

  3. Validate tables are empty.

  4. Run a full sync using:

    • syncToDataPipeline

    • reindexDeleted=true

    • Do not specify dataTypes (removing dataTypes ensures a full reload of all supported datasets)

  5. Monitor the sync and validate row counts/data quality.

  6. Re-enable downstream processes.

Step-by-step procedure

Step 1 — Identify which tables to clear

In the target database/schema, list tables (optional but helpful):

SHOW TABLES IN SCHEMA <database>.<schema>;

Typical candidates include:

  • Landing tables (raw ingestion)

  • Object type tables/datasets related to the connector pipeline

  • Any staging tables the customer explicitly wants cleaned (only if they understand the impact)

If you're unsure, you can start with the landing table(s) where the old environment data is visible.

Step 2 — Truncate the landing/object-type tables

Use TRUNCATE TABLE for each table you want to clear:

TRUNCATE TABLE <database>.<schema>.<landing_table_name>;

Example:

TRUNCATE TABLE PUBLIC.CUSTOMER_DATA;

Notes

  • TRUNCATE removes all rows but preserves the table structure.

  • You must run it one table at a time.

Step 3 — Validate the tables are empty (recommended)

SELECT COUNT(*) FROM <database>.<schema>.<table_name>;

Step 4 — Run a full Reltio syncToDataPipeline (with deletions re-materialized)

Run a full re-sync without dataTypes and with reindexDeleted=true:

POST https://{env}.reltio.com/reltio/api/{tenantID}/syncToDataPipeline?distributed=true&taskPartsCount=4&reindexDeleted=true

Important

  • Remove the dataTypes parameter if it was previously used. Keeping dataTypes may cause a partial reload and can leave gaps after truncation.

  • reindexDeleted=true ensures deletions are also correctly reflected downstream.

 

Step 5 — Monitor and validate

  • Monitor the sync job completion and check for errors.

  • Re-check table row counts and/or run data quality checks.

 

Permissions required in Snowflake

To run TRUNCATE TABLE, the executing role typically needs the DELETE privilege on the table.

Check privileges on a table

SHOW GRANTS ON TABLE <schema_name>.<table_name>;

Example:

SHOW GRANTS ON TABLE PUBLIC.CUSTOMER_DATA;

Confirm current role

SELECT CURRENT_ROLE();

Review role grants

SHOW GRANTS TO ROLE <your_role_name>;

If the customer is an admin, grant DELETE (example)

GRANT DELETE ON TABLE <schema_name>.<table_name> TO ROLE <your_role_name>;

 

Dependency and downtime considerations

Truncating tables means there will be a window where:

  • Tables are empty

  • Views/dashboards may show no data

  • ETL/ELT processes may fail or load incomplete results until sync completes

Recommended actions

  • Schedule during a maintenance window, considering any downstream processes.

  • Notify impacted users/teams

  • Pause downstream pipelines temporarily if needed

In Snowflake, foreign key constraints are informational only (not enforced), but downstream dependencies (views, reports, pipelines) still matter operationally.

FAQ

“Why didn’t re-sync remove old data?”

Because each sync run can append new entries, and landing/staging are not designed to automatically purge historical environment data.

“Can we delete rows in Snowflake instead of truncate?”

You can, but TRUNCATE is usually safest and fastest for a clean reset, followed by a full sync.

“Do we always need a full sync after truncation?”

Yes—if the goal is to repopulate tables completely with fresh, correct data. Truncation wipes the data, so a full sync is required to restore it.

Was this article helpful?
0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.