How can I address a problematic suspect match rule (identified in the match analysis tool) that is causing performance problems?

Question:

Using the match rule analysis tool, we found a problematic match rule: configuration/entityTypes/Individual/matchGroups/Suspect10

Answer:

The match rule has the following tokenization schemes:

  1. [single[Addresses.City], single[Addresses.StateProvince], single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]
  2. [single[Addresses.Zip5], single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]


Both schemes share the common part single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]. The remaining attributes come from the or condition.

Because the two schemes differ only in single[Addresses.City], single[Addresses.StateProvince] (scheme 1) versus single[Addresses.Zip5] (scheme 2), the number of match tokens effectively doubles.
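How the or condition turns into two schemes can be sketched in Python. This is a simplified, hypothetical model, not Reltio's implementation: it assumes each branch of the or condition is combined with the common part to form one scheme, and the names COMMON, OR_BRANCHES, and expand_schemes are illustrative only.

```python
# Hypothetical, simplified model of how the Suspect10 rule expands into
# tokenization schemes; this is NOT Reltio's implementation.
COMMON = [
    "single[Addresses.AddressLine1]",
    "single[BC_SSN]",
    "fuzzy[FirstName]",
    "fuzzy[LastName]",
]

# Each branch of the `or` condition contributes its own attributes.
OR_BRANCHES = [
    ["single[Addresses.City]", "single[Addresses.StateProvince]"],  # the `and` branch
    ["single[Addresses.Zip5]"],                                      # the `exact` branch
]


def expand_schemes(common, or_branches):
    """Return one tokenization scheme per `or` branch: branch attributes + common part."""
    return [branch + common for branch in or_branches]


schemes = expand_schemes(COMMON, OR_BRANCHES)
for number, scheme in enumerate(schemes, start=1):
    print(number, scheme)

# Every scheme repeats the full common part, so tokens built from
# AddressLine1, BC_SSN, FirstName and LastName are generated once per scheme.
print(len(schemes))
```

Running it prints the two schemes listed above, followed by 2; the four common attributes appear in both.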

Alternative Recommendation #1. Add Addresses.City, Addresses.StateProvince, and Addresses.Zip5 to the ignoreInToken section. Both schemes then reduce to the common part, which halves the number of match tokens. See below:

```json
{
  "exact": [
    "configuration/entityTypes/Individual/attributes/BC_SSN",
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/AddressLine1"
  ],
  "fuzzy": [
    "configuration/entityTypes/Individual/attributes/FirstName",
    "configuration/entityTypes/Individual/attributes/LastName"
  ],
  "cleanse": [
    {
      "cleanseAdapter": "com.reltio.cleanse.impl.NameDictionaryCleanser",
      "mappings": [
        {
          "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
          "mandatory": false,
          "allValues": false,
          "cleanseAttribute": "configuration/entityTypes/Individual/attributes/FirstName"
        },
        {
          "attribute": "configuration/entityTypes/Individual/attributes/LastName",
          "mandatory": false,
          "allValues": false,
          "cleanseAttribute": "configuration/entityTypes/Individual/attributes/LastName"
        }
      ]
    }
  ],
  "matchTokenClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Individual/attributes/LastName",
        "class": "com.reltio.match.token.FuzzyTextMatchToken"
      },
      {
        "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
        "class": "com.reltio.match.token.FuzzyTextMatchToken"
      }
    ]
  },
  "comparatorClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
        "class": "com.reltio.match.comparator.StringCharactersComparator"
      },
      {
        "attribute": "configuration/entityTypes/Individual/attributes/LastName",
        "class": "com.reltio.match.comparator.StringCharactersComparator"
      }
    ]
  },
  "or": {
    "exact": [
      "configuration/entityTypes/Individual/attributes/Addresses/attributes/Zip5"
    ],
    "and": {
      "exact": [
        "configuration/entityTypes/Individual/attributes/Addresses/attributes/City",
        "configuration/entityTypes/Individual/attributes/Addresses/attributes/StateProvince"
      ]
    }
  },
  "ignoreInToken": [
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/Zip5",
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/City",
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/StateProvince"
  ]
}
```
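The effect of ignoreInToken on the two schemes can be sketched as follows. This is a hypothetical model that assumes ignored attributes are simply dropped from each scheme and that schemes which become identical produce identical, and therefore deduplicated, tokens; the function name effective_schemes is illustrative only.

```python
# Hypothetical, simplified model (not Reltio's code): attributes listed in
# ignoreInToken are dropped before match tokens are built.
COMMON = ["Addresses.AddressLine1", "BC_SSN", "FirstName", "LastName"]
SCHEMES = [
    ["Addresses.City", "Addresses.StateProvince"] + COMMON,
    ["Addresses.Zip5"] + COMMON,
]
IGNORE_IN_TOKEN = {"Addresses.Zip5", "Addresses.City", "Addresses.StateProvince"}


def effective_schemes(schemes, ignored):
    """Drop ignored attributes, then de-duplicate schemes that became identical."""
    return {tuple(attr for attr in scheme if attr not in ignored) for scheme in schemes}


before = effective_schemes(SCHEMES, set())
after = effective_schemes(SCHEMES, IGNORE_IN_TOKEN)
print(len(before), len(after))  # 2 1
print(sorted(after))
```

In this model the ignored address attributes stay in the rule's or condition and are still compared; only token generation changes.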


The match rule lists both FirstName and LastName under fuzzy, and both attributes are assigned FuzzyTextMatchToken. FuzzyTextMatchToken produces 7 match tokens for each value. Because the tokens of the two fuzzy attributes are combined with each other, every FirstName + LastName pair generates 7 × 7 = 49 match tokens.
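The arithmetic above can be checked with a few lines of Python. The count of 7 tokens per value for FuzzyTextMatchToken comes from this article; the rest is an assumption made for illustration: each exact attribute holds a single value and contributes exactly one token, and the dictionary TOKENS_PER_VALUE is a hypothetical name.

```python
from math import prod

# Assumptions: each exact attribute holds one value and yields one token;
# each fuzzy attribute yields 7 tokens per value (FuzzyTextMatchToken).
TOKENS_PER_VALUE = {
    "Addresses.AddressLine1": 1,
    "BC_SSN": 1,
    "FirstName": 7,
    "LastName": 7,
}

# Tokens are combined across attributes, so the per-attribute counts multiply.
per_scheme = prod(TOKENS_PER_VALUE.values())

schemes_before = 2  # City+StateProvince scheme and Zip5 scheme
schemes_after = 1   # both collapse into the common part with ignoreInToken

print(per_scheme)                   # 49
print(per_scheme * schemes_before)  # 98
print(per_scheme * schemes_after)   # 49
```

If the cleanser produces additional FirstName values, raise the FirstName factor accordingly; for example, two values at 7 tokens each give a factor of 14 and double the total again.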

 

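The NameDictionaryCleanser applied to FirstName can also significantly increase the number of FirstName values, and therefore the number of tokens. At the same time, the rule uses StringCharactersComparator to compare FirstName and LastName. This comparator removes all non-alphabetic characters from the compared values and then compares them exactly. FuzzyTextMatchToken and StringCharactersComparator are therefore mismatched: the token class produces many loose candidate matches, while the comparator accepts only values whose letters are identical, so many of those candidates are rejected after the cost of generating and comparing them has already been paid. FuzzyTextMatchToken should be replaced by a match token class that is consistent with the comparator.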
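The comparator behavior described above can be sketched in Python. string_characters_equal is a hypothetical stand-in, not Reltio's StringCharactersComparator; it assumes that "alphabetic" means the ASCII letters A-Z and a-z and that the exact comparison is case-sensitive.

```python
import re

# Assumption: only ASCII letters count as alphabetic characters.
NON_ALPHA = re.compile(r"[^A-Za-z]")


def string_characters_equal(a: str, b: str) -> bool:
    """Hypothetical model: drop every non-alphabetic character, then compare exactly."""
    return NON_ALPHA.sub("", a) == NON_ALPHA.sub("", b)


print(string_characters_equal("Mary-Ann", "Mary Ann"))  # True
print(string_characters_equal("O'Brien", "OBrien"))      # True
print(string_characters_equal("Jon", "John"))            # False
```

A pair such as Jon and John may share loose fuzzy tokens and be retrieved as a candidate, yet this exact comparison rejects it.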
 
 
Alternative Recommendation #2. In the match rule, apply the following match token class to FirstName and LastName:
```json
{
  "matchTokenClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
        "parameters": [
          {
            "parameter": "groups",
            "values": [
              {
                "pattern": "[a-zA-Z]*",
                "wordDelimiter": "",
                "sortWords": false,
                "className": "com.reltio.match.token.ExactMatchToken"
              }
            ]
          }
        ],
        "class": "com.reltio.match.token.CustomMatchToken"
      },
      {
        "attribute": "configuration/entityTypes/Individual/attributes/LastName",
        "parameters": [
          {
            "parameter": "groups",
            "values": [
              {
                "pattern": "[a-zA-Z]*",
                "wordDelimiter": "",
                "sortWords": false,
                "className": "com.reltio.match.token.ExactMatchToken"
              }
            ]
          }
        ],
        "class": "com.reltio.match.token.CustomMatchToken"
      }
    ]
  }
}
```

This configuration removes all non-alphabetic characters and concatenates the remaining words into a single string (wordDelimiter is empty and sortWords is false), which is exactly how StringCharactersComparator treats the values. It therefore resolves both the excessive number of match tokens and the overcollisioned match tokens, because a single exact token per value is far more selective than the very loose FuzzyTextMatchToken.
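The token produced by this configuration can be modeled in Python to check that it agrees with the comparator. custom_exact_token and comparator_equal are hypothetical helpers, not Reltio code; the sketch assumes CustomMatchToken collects every match of pattern, optionally sorts the words, and joins them with wordDelimiter.

```python
import re

# Hypothetical model of the CustomMatchToken mapping above, not Reltio code.
PATTERN = re.compile(r"[a-zA-Z]*")  # "pattern": runs of ASCII letters
WORD_DELIMITER = ""                 # "wordDelimiter": join words with nothing
SORT_WORDS = False                  # "sortWords": keep the original word order
NON_ALPHA = re.compile(r"[^A-Za-z]")


def custom_exact_token(value: str) -> str:
    """Build the single exact token for a value."""
    # `*` also matches the empty string between words, so drop empty matches.
    words = [w for w in PATTERN.findall(value) if w]
    if SORT_WORDS:
        words.sort()
    return WORD_DELIMITER.join(words)


def comparator_equal(a: str, b: str) -> bool:
    """Comparator model: remove non-alphabetic characters, then compare exactly."""
    return NON_ALPHA.sub("", a) == NON_ALPHA.sub("", b)


PAIRS = [("Mary-Ann", "Mary Ann"), ("O'Brien", "OBrien"), ("Jon", "John")]
for a, b in PAIRS:
    same_token = custom_exact_token(a) == custom_exact_token(b)
    # Two values share the token exactly when the comparator would accept them.
    assert same_token == comparator_equal(a, b)
    print(a, b, same_token)
```

Because the token and the comparator apply the same normalization, every candidate found through the token is one the comparator can accept, which is why this configuration avoids the wasted candidates produced by FuzzyTextMatchToken.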
Both recommendations can be applied at the same time. However, Recommendation #2 is the more holistic solution and should be tried first.
