Question:
We found a problematic match rule using the match rule analysis tool - configuration/entityTypes/Individual/matchGroups/Suspect10
Answer:
The match rule has the following tokenization schemes:
- [single[Addresses.City], single[Addresses.StateProvince], single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]
- [single[Addresses.Zip5], single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]
Both schemes share the common part [single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]. The remaining attributes come from the "or" condition.
Because the two schemes differ only in [single[Addresses.City], single[Addresses.StateProvince]] versus [single[Addresses.Zip5]], the number of match tokens simply doubles.
Alternative Recommendation #1. Add Addresses.City, Addresses.StateProvince, and Addresses.Zip5 to the ignoreInToken section. This change halves the number of match tokens. See below:
{
"exact": [
"configuration/entityTypes/Individual/attributes/BC_SSN",
"configuration/entityTypes/Individual/attributes/Addresses/attributes/AddressLine1"
],
"fuzzy": [
"configuration/entityTypes/Individual/attributes/FirstName",
"configuration/entityTypes/Individual/attributes/LastName"
],
"cleanse": [
{
"cleanseAdapter": "com.reltio.cleanse.impl.NameDictionaryCleanser",
"mappings": [
{
"attribute": "configuration/entityTypes/Individual/attributes/FirstName",
"mandatory": false,
"allValues": false,
"cleanseAttribute": "configuration/entityTypes/Individual/attributes/FirstName"
},
{
"attribute": "configuration/entityTypes/Individual/attributes/LastName",
"mandatory": false,
"allValues": false,
"cleanseAttribute": "configuration/entityTypes/Individual/attributes/LastName"
}
]
}
],
"matchTokenClasses": {
"mapping": [
{
"attribute": "configuration/entityTypes/Individual/attributes/LastName",
"class": "com.reltio.match.token.FuzzyTextMatchToken"
},
{
"attribute": "configuration/entityTypes/Individual/attributes/FirstName",
"class": "com.reltio.match.token.FuzzyTextMatchToken"
}
]
},
"comparatorClasses": {
"mapping": [
{
"attribute": "configuration/entityTypes/Individual/attributes/FirstName",
"class": "com.reltio.match.comparator.StringCharactersComparator"
},
{
"attribute": "configuration/entityTypes/Individual/attributes/LastName",
"class": "com.reltio.match.comparator.StringCharactersComparator"
}
]
},
"or": {
"exact": [
"configuration/entityTypes/Individual/attributes/Addresses/attributes/Zip5"
],
"and": {
"exact": [
"configuration/entityTypes/Individual/attributes/Addresses/attributes/City",
"configuration/entityTypes/Individual/attributes/Addresses/attributes/StateProvince"
]
}
},
"ignoreInToken": [
"configuration/entityTypes/Individual/attributes/Addresses/attributes/Zip5",
"configuration/entityTypes/Individual/attributes/Addresses/attributes/City",
"configuration/entityTypes/Individual/attributes/Addresses/attributes/StateProvince"
]
}
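As a rough illustration of why ignoreInToken halves the token count, the sketch below models each tokenization scheme as a plain list of attribute entries and drops the ignored attributes. The function names and the list-based model are assumptions made for illustration only; this is not how Reltio generates tokens internally.

```python
# Illustrative model only: schemes are plain lists, not Reltio internals.
SCHEMES = [
    ["single[Addresses.City]", "single[Addresses.StateProvince]",
     "single[Addresses.AddressLine1]", "single[BC_SSN]",
     "fuzzy[FirstName]", "fuzzy[LastName]"],
    ["single[Addresses.Zip5]", "single[Addresses.AddressLine1]",
     "single[BC_SSN]", "fuzzy[FirstName]", "fuzzy[LastName]"],
]

# Attributes listed in the ignoreInToken section of the rule.
IGNORE_IN_TOKEN = {"Addresses.Zip5", "Addresses.City", "Addresses.StateProvince"}


def attribute_name(entry):
    """Extract 'Addresses.City' from 'single[Addresses.City]'."""
    return entry[entry.index("[") + 1 : -1]


def apply_ignore_in_token(schemes, ignored):
    """Drop ignored attributes, then collapse schemes that became identical."""
    reduced = []
    for scheme in schemes:
        kept = tuple(e for e in scheme if attribute_name(e) not in ignored)
        if kept not in reduced:
            reduced.append(kept)
    return reduced


remaining = apply_ignore_in_token(SCHEMES, IGNORE_IN_TOKEN)
print(len(SCHEMES), "schemes before,", len(remaining), "after")
```

Under this model, both schemes reduce to the same common part, so only one scheme remains. The ignored attributes are still compared by the "or" condition; they are only excluded from token generation.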
The match rule uses fuzzy matching for both FirstName and LastName, and both attributes have FuzzyTextMatchToken assigned. FuzzyTextMatchToken produces 7 match tokens for each value. With two fuzzy attributes, every FirstName + LastName pair generates 7 × 7 = 49 match tokens.
The NameDictionaryCleanser for FirstName can also significantly increase the number of values and tokens for FirstName. At the same time, the rule uses StringCharactersComparator to compare FirstName and LastName. The comparator filters out all non-alphabetic characters from the compared values and then uses an exact comparison. This is a mismatch: FuzzyTextMatchToken is loose, while StringCharactersComparator is strict, so the loose tokens produce many candidate pairs that the exact comparison then rejects. FuzzyTextMatchToken should be replaced by a more suitable match token class.
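The token arithmetic can be sketched as follows. This is a simplified multiplicative model, assuming the figure of 7 tokens per value quoted for FuzzyTextMatchToken; the function name, its parameters, and the count of 3 name variants are hypothetical and chosen only to show how the numbers grow.

```python
TOKENS_PER_FUZZY_VALUE = 7  # figure quoted in the text for FuzzyTextMatchToken


def estimate_tokens(first_name_values, last_name_values, schemes):
    """Estimate match tokens for one record under a simple multiplicative model.

    first_name_values / last_name_values: number of distinct values per
    attribute (a name-dictionary cleanser may add variants).
    schemes: number of tokenization schemes produced by the rule.
    """
    per_scheme = (first_name_values * TOKENS_PER_FUZZY_VALUE) * (
        last_name_values * TOKENS_PER_FUZZY_VALUE
    )
    return per_scheme * schemes


print(estimate_tokens(1, 1, schemes=1))  # 49
print(estimate_tokens(1, 1, schemes=2))  # 98
print(estimate_tokens(3, 1, schemes=2))  # 294, with 3 hypothetical FirstName variants
```

The real counts depend on the data and on token deduplication, so treat these numbers as an upper-bound estimate rather than a measurement.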
Alternative Recommendation #2. Apply the following match token class for FirstName and LastName in the match rule:
{
"matchTokenClasses": {
"mapping": [
{
"attribute": "configuration/entityTypes/Individual/attributes/FirstName",
"parameters": [
{
"parameter": "groups",
"values": [
{
"pattern": "[a-zA-Z]*",
"wordDelimiter": "",
"sortWords": false,
"className": "com.reltio.match.token.ExactMatchToken"
}
]
}
],
"class": "com.reltio.match.token.CustomMatchToken"
},
{
"attribute": "configuration/entityTypes/Individual/attributes/LastName",
"parameters": [
{
"parameter": "groups",
"values": [
{
"pattern": "[a-zA-Z]*",
"wordDelimiter": "",
"sortWords": false,
"className": "com.reltio.match.token.ExactMatchToken"
}
]
}
],
"class": "com.reltio.match.token.CustomMatchToken"
}
]
}
}
This configuration filters out all non-alphabetic characters and concatenates the remaining words into a single string, which exactly matches the behavior of StringCharactersComparator. It resolves both the excessive number of match tokens and the over-collided match tokens, because the configuration is much stricter than the very loose FuzzyTextMatchToken.
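The normalization described here can be approximated with a short sketch. It reuses the pattern and the empty word delimiter from the configuration, but the normalize function itself is a hypothetical stand-in, not Reltio's CustomMatchToken implementation. Case handling is left untouched because the text does not state whether case is folded.

```python
import re

PATTERN = re.compile(r"[a-zA-Z]*")  # same pattern as in the configuration
WORD_DELIMITER = ""                 # same as "wordDelimiter": ""


def normalize(value):
    """Keep only alphabetic runs and join them without a delimiter."""
    return WORD_DELIMITER.join(PATTERN.findall(value))


print(normalize("O'Brien-Smith"))  # OBrienSmith
print(normalize("Mary Ann"))       # MaryAnn
```

Each value yields a single normalized string, so under this model one FirstName + LastName pair produces one token combination instead of 49.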
Both recommendations are applicable at the same time. However, recommendation #2 is a holistic solution and should be tried first.
Notes:
- After the match rules are changed, a rebuild of the match tables is required.
- Use the following tools to check match rule behavior:
  - Match Rule Analyzer (see Reltio Console and https://docs.reltio.com/matchesapi/matchruleanalyzerdynamic.html)
  - Verify Matches (https://docs.reltio.com/matchesapi/verifymatches.html)
  - Explain Tokens (https://docs.reltio.com/matchesapi/getmatchtokens.html)