Question:
We found a problematic match rule using the match rule analysis tool - configuration/entityTypes/Individual/matchGroups/Suspect10
Answer
The match rule has the following tokenization schemes:
- [single[Addresses.City], single[Addresses.StateProvince], single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]
- [single[Addresses.Zip5], single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]
Both schemes share a common part: [single[Addresses.AddressLine1], single[BC_SSN], fuzzy[FirstName], fuzzy[LastName]]. The remaining attributes come from the or conditions.
Because the two schemes share such a large common part, the number of match tokens simply doubles: once for [single[Addresses.City], single[Addresses.StateProvince]] and once for [single[Addresses.Zip5]].
Alternative Recommendation #1. Add single[Addresses.City], single[Addresses.StateProvince], and single[Addresses.Zip5] to the ignoreInToken section. This change halves the number of match tokens. See below:
{
  "exact": [
    "configuration/entityTypes/Individual/attributes/BC_SSN",
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/AddressLine1"
  ],
  "fuzzy": [
    "configuration/entityTypes/Individual/attributes/FirstName",
    "configuration/entityTypes/Individual/attributes/LastName"
  ],
  "cleanse": [
    {
      "cleanseAdapter": "com.reltio.cleanse.impl.NameDictionaryCleanser",
      "mappings": [
        {
          "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
          "mandatory": false,
          "allValues": false,
          "cleanseAttribute": "configuration/entityTypes/Individual/attributes/FirstName"
        },
        {
          "attribute": "configuration/entityTypes/Individual/attributes/LastName",
          "mandatory": false,
          "allValues": false,
          "cleanseAttribute": "configuration/entityTypes/Individual/attributes/LastName"
        }
      ]
    }
  ],
  "matchTokenClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Individual/attributes/LastName",
        "class": "com.reltio.match.token.FuzzyTextMatchToken"
      },
      {
        "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
        "class": "com.reltio.match.token.FuzzyTextMatchToken"
      }
    ]
  },
  "comparatorClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
        "class": "com.reltio.match.comparator.StringCharactersComparator"
      },
      {
        "attribute": "configuration/entityTypes/Individual/attributes/LastName",
        "class": "com.reltio.match.comparator.StringCharactersComparator"
      }
    ]
  },
  "or": {
    "exact": [
      "configuration/entityTypes/Individual/attributes/Addresses/attributes/Zip5"
    ],
    "and": {
      "exact": [
        "configuration/entityTypes/Individual/attributes/Addresses/attributes/City",
        "configuration/entityTypes/Individual/attributes/Addresses/attributes/StateProvince"
      ]
    }
  },
  "ignoreInToken": [
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/Zip5",
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/City",
    "configuration/entityTypes/Individual/attributes/Addresses/attributes/StateProvince"
  ]
}
The match rule lists both FirstName and LastName under fuzzy, and both attributes have FuzzyTextMatchToken assigned. FuzzyTextMatchToken produces 7 match tokens for each value. With two fuzzy attributes, every FirstName + LastName pair therefore generates 7 × 7 = 49 match tokens.
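The arithmetic above can be sketched as a rough back-of-the-envelope model. This is only an illustration, not Reltio code: the function names are hypothetical, and it assumes one value per exact attribute and that each tokenization scheme multiplies the per-value token counts of its fuzzy attributes.

```python
# Simplified, hypothetical model of match-token counts; not Reltio code.
TOKENS_PER_FUZZY_VALUE = 7  # per-value figure quoted above for FuzzyTextMatchToken


def tokens_per_scheme(first_names: int = 1, last_names: int = 1,
                      tokens_per_value: int = TOKENS_PER_FUZZY_VALUE) -> int:
    """Tokens for one tokenization scheme, assuming one value per exact attribute."""
    return (first_names * tokens_per_value) * (last_names * tokens_per_value)


def tokens_per_entity(schemes: int, **kwargs) -> int:
    """Total tokens when `schemes` schemes share the same fuzzy part."""
    return schemes * tokens_per_scheme(**kwargs)


if __name__ == "__main__":
    print(tokens_per_entity(schemes=2))                  # two schemes: 98
    print(tokens_per_entity(schemes=1))                  # one scheme: 49
    print(tokens_per_entity(schemes=2, first_names=3))   # three FirstName values: 294
```

Under these assumptions, each extra FirstName value (for example, a variant added by a cleanser) adds another 49 tokens per scheme, which is why the fuzzy part dominates the total.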
The NameDictionaryCleanser for FirstName can also significantly increase the number of values and tokens for FirstName. At the same time, the rule uses StringCharactersComparator to compare FirstName and LastName. The comparator filters out all non-alphabetic characters from the compared values and then uses an exact comparison. FuzzyTextMatchToken and StringCharactersComparator are therefore mismatched: the tokens are fuzzy, but the comparison is exact. FuzzyTextMatchToken should be replaced by a more suitable match token class.
Alternative Recommendation #2. Apply the following match token class to FirstName and LastName in the match rule:
{
  "matchTokenClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
        "parameters": [
          {
            "parameter": "groups",
            "values": [
              {
                "pattern": "[a-zA-Z]*",
                "wordDelimiter": "",
                "sortWords": false,
                "className": "com.reltio.match.token.ExactMatchToken"
              }
            ]
          }
        ],
        "class": "com.reltio.match.token.CustomMatchToken"
      },
      {
        "attribute": "configuration/entityTypes/Individual/attributes/LastName",
        "parameters": [
          {
            "parameter": "groups",
            "values": [
              {
                "pattern": "[a-zA-Z]*",
                "wordDelimiter": "",
                "sortWords": false,
                "className": "com.reltio.match.token.ExactMatchToken"
              }
            ]
          }
        ],
        "class": "com.reltio.match.token.CustomMatchToken"
      }
    ]
  }
}
This configuration filters out all non-alphabetic characters and joins the remaining words into a single string, which exactly matches the behavior of StringCharactersComparator. The solution resolves both the "many match tokens" issue and the "overcollisioned match tokens" issue (because this configuration behaves better than the very loose FuzzyTextMatchToken).
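To illustrate why this token configuration lines up with the comparator, here is a minimal Python sketch of the described behavior. It is not Reltio's implementation: custom_token and comparator_equal are hypothetical names, "alphabetic" is assumed to mean ASCII letters as in the pattern [a-zA-Z]*, and case handling is left untouched because the source does not specify it.

```python
import re

# Same pattern and delimiter as in the configuration above.
PATTERN = re.compile(r"[a-zA-Z]*")
WORD_DELIMITER = ""


def custom_token(value: str) -> str:
    """Hypothetical model of the token: keep the pieces that match the
    pattern, in their original order (sortWords is false), and join them."""
    words = [w for w in PATTERN.findall(value) if w]  # drop empty matches
    return WORD_DELIMITER.join(words)


def comparator_equal(a: str, b: str) -> bool:
    """Model of the comparator as described: strip non-alphabetic
    characters, then compare exactly."""
    def strip(s: str) -> str:
        return re.sub(r"[^a-zA-Z]", "", s)
    return strip(a) == strip(b)


if __name__ == "__main__":
    for a, b in [("Mary-Ann", "Mary Ann"), ("O'Neil", "ONeil"), ("Jon", "John")]:
        same_token = custom_token(a) == custom_token(b)
        print(a, b, same_token, comparator_equal(a, b))
```

In this model, two values receive the same token exactly when the comparator considers them equal, and each value yields a single token, so no candidate pairs are generated that the comparator would then reject.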
Both recommendations are applicable at the same time. However, recommendation #2 is a holistic solution and should be tried first.
Notes:
- After the match rules are changed, a rebuild of the match tables is required.
- Use the following tools to check match rule behavior:
  - Match Rule Analyzer (see Reltio Console and https://docs.reltio.com/matchesapi/matchruleanalyzerdynamic.html)
  - Verify Matches (https://docs.reltio.com/matchesapi/verifymatches.html)
  - Explain Tokens (https://docs.reltio.com/matchesapi/getmatchtokens.html)