Question
How can I re-synchronize GBQ and Reltio? This should be performed when a change is made to the data model or if the GBQ counts appear to be out of sync.
Answer
- Make sure that the tenant has GBQ streaming access enabled. Please refer to this link to find the proper RI environment.
GET https://<RI_env>.reltio.com/api/v1.0/configuration/<tenantId>
- In the response, check that GBQ analytics is enabled:
{ "status": "success",
"configuration": {
"authData": {
...
},
"updatesConfig": {
...
"consumeGbqEvents": true,
"hoursToGbqCompaction": 168,
...
},
"analyticsEnabled": true,
...
}
}
If the above is not set, refer to https://reltio.jira.com/wiki/spaces/IRD/pages/381353985/RIQ+Configuration+API to set the values as expected.
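The check on the configuration response can be sketched as a small helper. This is only an illustration: it assumes that the two flags visible in the sample response (consumeGbqEvents and analyticsEnabled) are the ones that matter, and the function name gbq_streaming_enabled is hypothetical.

```python
def gbq_streaming_enabled(config_response: dict) -> bool:
    """Return True when both flags shown in the sample response are true.

    Assumption: consumeGbqEvents and analyticsEnabled are the relevant flags.
    """
    configuration = config_response.get("configuration", {})
    updates = configuration.get("updatesConfig", {})
    return (
        updates.get("consumeGbqEvents") is True
        and configuration.get("analyticsEnabled") is True
    )


# Trimmed copy of the sample response from this article.
sample = {
    "status": "success",
    "configuration": {
        "updatesConfig": {"consumeGbqEvents": True, "hoursToGbqCompaction": 168},
        "analyticsEnabled": True,
    },
}
print(gbq_streaming_enabled(sample))
```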
- Clear out the GBQ dataset
POST https://<RI_env>.reltio.com/api/v1.0/gbq/cleanup
Body:
{
"tenantId": "<tenantID>"
}
Expect a response with 200 HTTP code:
{ "status": "success" }
- If the dataset holds a lot of data, the request might take some time. To avoid a timeout, use async mode.
Invoke:
POST https://<RI_env>.reltio.com/api/v1.0/gbq/cleanup?async=true
Body:
{
"tenantId": "<tenantID>"
}
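The cleanup call, in either mode, could be prepared from a script roughly as follows. This is a minimal sketch using only the Python standard library. The Bearer token header is an assumption (this article does not state how the endpoint is authenticated), and "example-env" and "exampleTenant" are placeholder values.

```python
import json
import urllib.request


def build_cleanup_request(ri_env, tenant_id, token, run_async=False):
    """Build (but do not send) the GBQ cleanup request described above."""
    url = f"https://{ri_env}.reltio.com/api/v1.0/gbq/cleanup"
    if run_async:
        # Async mode avoids a timeout on large datasets.
        url += "?async=true"
    body = json.dumps({"tenantId": tenant_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API accepts an OAuth Bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


req = build_cleanup_request("example-env", "exampleTenant", "<access token>", run_async=True)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```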
- Before running the re-synchronization of RI and GBQ, check the RIQ queue status.
- The ROLE_ADMIN_TENANT role is required to execute the API below.
- Execute the API below to find the RIQ queue status.
- The status must be green before the re-synchronization process is started.
GET https://<RI_env>.reltio.com/api/v1.0/tenants/{tenantId}/status/queues
Response example:
{
"status": "yellow",
"payload": {
"total": {
"size": 5418,
"dlqSize": 0
}
},
"description": "Queues are not empty. Please wait till they will be processed.",
"message": "Queues are not empty."
}
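Waiting for the queues to turn green can be automated with a simple polling loop. The sketch below is illustrative only: fetch_status stands for any caller-supplied function that performs the GET above and returns the parsed JSON, and the polling interval and attempt limit are arbitrary assumptions.

```python
import time


def wait_for_green(fetch_status, poll_seconds=30, max_attempts=20, sleep=time.sleep):
    """Poll the queue status until it is "green" or attempts run out.

    fetch_status: callable returning the parsed status response (a dict).
    Returns True if the status became green, False otherwise.
    """
    for attempt in range(max_attempts):
        response = fetch_status()
        if response.get("status") == "green":
            return True
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    return False


# Demonstration with canned responses instead of real API calls.
responses = iter([{"status": "yellow"}, {"status": "green"}])
print(wait_for_green(lambda: next(responses), sleep=lambda seconds: None))
```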
- Execute S3 Synchronization job
POST https://<RI_env>.reltio.com/api/v1.0/jobs
Body:
{
"name": "synchronize",
"tenant": "<TenantID>",
"tasks": [
{
"application": "EntitiesExport",
"payload": {}
},
{
"application": "InteractionsExport",
"payload": {}
},
{
"application": "RelationsExport",
"payload": {}
},
{
"application": "MatchesExport",
"payload": {}
},
{
"application": "MergesExport",
"payload": {}
}
]
}
Response example:
{
"id": "4FvvVFXh",
"uri": "/api/v1.0/jobs/4FvvVFXh",
"status": "JOB_PENDING"
}
You can monitor the job status by using the following call:
GET https://<RI_env>.reltio.com/api/v1.0/jobs/<job_id>
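The synchronization job body repeats the same task shape five times, so it can be generated rather than typed by hand. A minimal sketch follows; the helper names are hypothetical, and the status URL is built from the "id" field of the response example above.

```python
# Applications listed in the synchronize job body of this article.
SYNC_APPLICATIONS = [
    "EntitiesExport",
    "InteractionsExport",
    "RelationsExport",
    "MatchesExport",
    "MergesExport",
]


def build_synchronize_job(tenant_id):
    """Return the JSON-serializable body for the synchronize job."""
    return {
        "name": "synchronize",
        "tenant": tenant_id,
        "tasks": [{"application": app, "payload": {}} for app in SYNC_APPLICATIONS],
    }


def job_status_url(ri_env, job_response):
    """Return the monitoring URL for a job, given the job creation response."""
    return f"https://{ri_env}.reltio.com/api/v1.0/jobs/{job_response['id']}"


job = build_synchronize_job("exampleTenant")  # placeholder tenant ID
print(len(job["tasks"]))
print(job_status_url("example-env", {"id": "4FvvVFXh", "status": "JOB_PENDING"}))
```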
- Execute GcsExport Job
POST https://<RI_env>.reltio.com/api/v1.0/jobs
{
"name": "GcsExport",
"tenant": "<tenant_id>",
"tasks": [
{
"application": "GcsExport",
"payload": {
"objectClass": "ENTITIES",
"force": "true"
}
},
{
"application": "GcsExport",
"payload": {
"objectClass": "RELATIONS",
"force": "true"
}
},
{
"application": "GcsExport",
"payload": {
"objectClass": "INTERACTIONS",
"force": "true"
}
},
{
"application": "GcsExport",
"payload": {
"objectClass": "MATCHES",
"force": "true"
}
},
{
"application": "GcsExport",
"payload": {
"objectClass": "MERGES",
"force": "true"
}
}
]
}
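The GcsExport body follows one pattern per object class, which a short helper can reproduce. This is a sketch with a hypothetical function name; it keeps "force" as the string "true", exactly as in the body above.

```python
import json

# Object classes listed in the GcsExport job body of this article.
OBJECT_CLASSES = ["ENTITIES", "RELATIONS", "INTERACTIONS", "MATCHES", "MERGES"]


def build_gcs_export_job(tenant_id, object_classes=OBJECT_CLASSES):
    """Return the JSON-serializable body for the GcsExport job."""
    return {
        "name": "GcsExport",
        "tenant": tenant_id,
        "tasks": [
            {
                "application": "GcsExport",
                # "force" is a string in the documented body, not a boolean.
                "payload": {"objectClass": object_class, "force": "true"},
            }
            for object_class in object_classes
        ],
    }


# "exampleTenant" is a placeholder tenant ID.
print(json.dumps(build_gcs_export_job("exampleTenant"), indent=2))
```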
- Verify that re-sync was successful by one of the following methods:
1. Run an API call to check the total number of entities:
GET https://<env>.reltio.com/reltio/api/<tenant_id>/entities/_total
Response example:
{
"total": 1443268
}
- Run the following query in GBQ:
SELECT
COUNT(*)
FROM
`customer-facing.views_riq_dw_<env>_<tenant_id>.entities_merged`
WHERE
deleted = FALSE AND softDeleted = FALSE
GROUP BY Type
- Compare the number of entities returned by the query to the total in the response from the API call. Because the query groups by Type, add up the per-type counts first.
If the number of entities in GBQ is equal to the number in the API call, the sync was completed successfully.
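The comparison itself is simple arithmetic. The sketch below assumes the GBQ query returns one count per entity type and that the API total covers all entity types. The function name and the per-type numbers are made up for illustration, chosen to add up to the total in the response example.

```python
from typing import Iterable


def counts_match(api_total: int, gbq_counts_by_type: Iterable[int]) -> bool:
    """Return True when the per-type GBQ counts add up to the API total."""
    return sum(gbq_counts_by_type) == api_total


# Illustrative values only: three per-type counts that sum to 1443268.
print(counts_match(1443268, [1000000, 400000, 43268]))
```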
2. Another method is to run the following Qubole script:
val url = "https://<environment>-af.reltio.com"
val tenant = "<tenantId>"
val token = "<access token>"
import com.reltio.analytics.framework._
import com.reltio.analytics.data.application._
import com.reltio.analytics.data.persist._
import com.reltio.analytics.data.persist.attributes._
import com.reltio.analytics.objects.transformation._
import org.apache.spark.sql._
import com.reltio.analytics.data.delete._
val aframe = AnalyticsFramework.login(sqlContext, url, tenant, token)
val df = aframe.entities(s"configuration/entityType/<specify type here>", deltaWindow = null, activeOnly = false)
df.count