Question
For Individual entities, the fuzzy rule on the first name is not working as expected: match analysis shows the fuzzy match as false.
Example:
Source First Name - SHARO
Target First Name - SHARON
The match comparator configured for this attribute is:
{
  "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
  "parameters": [
    {
      "parameter": "pattern",
      "value": "[a-zA-Z]+"
    },
    {
      "parameter": "useStemmer",
      "value": "true"
    },
    {
      "parameter": "useSoundex",
      "value": "true"
    },
    {
      "parameter": "useNoiseIfEmpty",
      "value": "true"
    }
  ],
  "class": "com.reltio.match.comparator.DistinctWordsComparator"
}
Generated document for the 1st entity:
"FirstName": [
  "sharo"
],
"FirstName~.@ps": [
  "10"
],
Generated document for the 2nd entity:
"FirstName": [
  "sharon"
],
"FirstName~.@ps": [
  "10"
],
Answer
DistinctWordsComparator will not work in this case. Possible workarounds are:
- DistinctWordsComparator with "thresholdChars": "x". It splits the value by symbols, and "thresholdChars" sets the allowed number or percentage of differences. One possible issue: this comparator also matches symbol transpositions (e.g. the words 'sharo' and 'rosha' will be matched).
- DamerauLevenshteinDistance or DynamicDamerauLevenshteinDistance.
- SoundexComparator with maxCodeLen=2.
- A pattern in DistinctWordsComparator. The pattern can be configured to split the value, for example, to compare only the first 5 symbols.
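To illustrate why an edit-distance comparator suits the SHARO / SHARON case, here is a minimal Python sketch of the restricted Damerau-Levenshtein (optimal string alignment) distance. It is a generic textbook version, not Reltio's implementation; the function name osa_distance is hypothetical, and how Reltio turns a distance into a match decision or relevance score is not covered here.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters. Generic sketch, not Reltio's code.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


if __name__ == "__main__":
    print(osa_distance("sharo", "sharon"))   # one insertion -> 1
    print(osa_distance("sharon", "shraon"))  # one adjacent transposition -> 1
    print(osa_distance("sharo", "rosha"))    # whole-chunk swap, larger than 1
```

Note that 'sharo' and 'sharon' differ by a single insertion, while 'sharo' and 'rosha' are several edits apart, so a small distance threshold separates the two cases.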
For example, the following request tests DynamicDamerauLevenshteinDistance against the same two values:
POST https://prod-h360.reltio.com/reltio/tools/matching/compare
Body:
{
  "first": "SHARO",
  "second": "SHARON",
  "comparatorClass": {
    "parameters": [
      {
        "parameter": "pattern",
        "value": "[a-zA-Z]+"
      }
    ],
    "class": "com.reltio.match.comparator.DynamicDamerauLevenshteinDistance"
  },
  "fuzzy": true
}
The response from this test shows the expected match:
{
  "fuzzy": true,
  "first": "SHARO",
  "second": "SHARON",
  "equals": true,
  "relevance": 0.9722222222222222
}
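The same test can be scripted so that several comparators can be tried against the same pair of values. The following is a minimal Python sketch using only the standard library. The URL and body shape are taken from the request above; the helper names build_compare_body and compare are hypothetical, and the Bearer token Authorization header is an assumption about how your environment authenticates, so adjust it to match your tenant.

```python
import json
import urllib.request

# Endpoint as used in the example above
COMPARE_URL = "https://prod-h360.reltio.com/reltio/tools/matching/compare"


def build_compare_body(first, second, comparator_class, parameters=None, fuzzy=True):
    """Build the JSON body for the compare request (hypothetical helper)."""
    return {
        "first": first,
        "second": second,
        "comparatorClass": {
            "parameters": [
                {"parameter": name, "value": value}
                for name, value in (parameters or {}).items()
            ],
            "class": comparator_class,
        },
        "fuzzy": fuzzy,
    }


def compare(body, access_token):
    """POST the body and return the parsed JSON response.

    Assumption: the API accepts an OAuth Bearer token in the
    Authorization header.
    """
    request = urllib.request.Request(
        COMPARE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_compare_body(
    "SHARO",
    "SHARON",
    "com.reltio.match.comparator.DynamicDamerauLevenshteinDistance",
    {"pattern": "[a-zA-Z]+"},
)
print(json.dumps(body, indent=2))
# result = compare(body, access_token="<your token>")  # requires valid credentials
```

Swapping the class name and parameters in build_compare_body lets you check each workaround before changing the match rule configuration.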