Start with data profiling.
Profile completeness, formats, cardinality, and noise words for each attribute you plan to match. This is the first step Reltio recommends before you design any comparison formula or tokens.
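A profiling pass like this can be sketched in a few lines of Python. This is a minimal illustration only, not a Reltio tool: the `profile_attribute` helper and the sample values are hypothetical, and it covers only completeness, cardinality, and frequent words (candidate noise words).

```python
from collections import Counter

def profile_attribute(values, top_words=3):
    """Summarize one attribute: completeness, cardinality, most frequent words."""
    total = len(values)
    # Treat None and blank strings as missing.
    present = [v.strip() for v in values if v and v.strip()]
    distinct = {v.lower() for v in present}
    # Frequent words are candidates for a noise-word dictionary.
    words = Counter(
        w
        for v in present
        for w in v.lower().replace(",", " ").replace(".", " ").split()
    )
    return {
        "completeness": len(present) / total if total else 0.0,
        "cardinality": len(distinct),
        "top_words": [w for w, _ in words.most_common(top_words)],
    }

# Hypothetical sample of organization names.
org_names = ["Acme Inc", "ACME Inc.", "Globex LLC", None, "Initech Inc", ""]
report = profile_attribute(org_names)
print(report)
```

In this sample, "inc" surfaces as the most frequent word, which suggests it may belong in a noise-word dictionary rather than in a token.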
Pair your comparator class with a matching token class (“keep them symmetric”)
Reltio’s guidance and examples stress aligning the comparator and token generator (e.g., DistinctWords comparator with a DistinctWords token). If you wrap a base class, keep the pair symmetric and add extras like noise word removal, stemming, or Soundex only if necessary.
Use Reltio’s catalog of comparator classes and their relevance behaviors to pick the proper comparator per attribute. (Reltio documents how each comparator behaves and how its relevance can be computed.)
Identifying the Correct Comparator for a Match Rule
Define the comparator for each attribute rather than relying on system defaults—especially for fuzzy logic—to ensure predictable, optimal behavior.
Use strict comparators (e.g., ExactMatchComparator) for IDs, codes, and other fields requiring exact equality. Use fuzzy comparators (e.g., DamerauLevenshteinDistance, SoundexComparator) for attributes with natural variation (names, addresses).
Match rules can combine comparators with logical operators (AND/OR). Ensure the logic aligns with acceptable levels of false positives and negatives for your use case.
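To make the strict-versus-fuzzy distinction concrete, here is a small Python sketch. It is not Reltio's implementation: `osa_distance` is the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, and the `max_edits` threshold is an assumption chosen for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions,
    and transpositions of adjacent characters (restricted variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "jonh" -> "john".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def exact_match(a: str, b: str) -> bool:
    return a == b

def fuzzy_match(a: str, b: str, max_edits: int = 1) -> bool:
    # max_edits is an illustrative threshold, not a Reltio default.
    return osa_distance(a.lower(), b.lower()) <= max_edits

print(fuzzy_match("Jonh", "John"))    # names tolerate a transposition
print(exact_match("A123", "A124"))    # IDs should not
print(fuzzy_match("A123", "A124"))    # fuzzy logic would wrongly accept these IDs
```

The last call shows why identifiers and codes call for a strict comparator: a one-character difference that is harmless noise in a name is a different record in an ID.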
Identifying the Correct Tokenization for a Match Rule
Tokens exist to narrow the candidate set that your rule compares. Design them so that rules retrieve the correct records without overwhelming the match engine.
A single token phrase may link to a maximum of 300 profiles. Broad/low-cardinality tokens (e.g., country alone) or overly fuzzy tokenization can significantly increase the number of candidates and degrade performance—avoid or combine them with other attributes to maintain selectivity.
Prefer high-cardinality, stable attributes for tokens (e.g., exact email or external ID) and composite tokens (e.g., LastName + PostalCode) when single attributes are too broad.
Do not rely on defaults. Map a specific token class for every attribute you tokenize. If an attribute should not produce tokens, set ignoreInToken to suppress token generation.
Use phonetic tokens for names, address‑specific tokens for addresses, etc. Avoid using fuzzy match token classes unless necessary—they can generate numerous tokens and slow down candidate retrieval.
Be cautious with high‑fan‑out token classes. They can inflate token counts and reduce performance. When you use one, pair it with another component where appropriate, and analyze collision rates to verify the effect.
Configure rules to use only operational values (OV). This usually maintains match quality while improving runtime performance compared with also considering non-OV values.
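The selectivity guidance above (the 300-profile cap and composite tokens) can be checked offline before you configure anything. The sketch below is a hypothetical simulation in plain Python, not the Reltio tokenizer: the record fields, the `|` separator, and the helper names are assumptions made for illustration.

```python
from collections import defaultdict

MAX_PROFILES_PER_TOKEN = 300  # cap cited in the guidance above

def token_fanout(records, token_fn):
    """Count how many profiles each token phrase would link to."""
    buckets = defaultdict(set)
    for rec in records:
        token = token_fn(rec)
        if token:
            buckets[token].add(rec["id"])
    return {token: len(ids) for token, ids in buckets.items()}

def oversize(fanout, cap=MAX_PROFILES_PER_TOKEN):
    """Return the token phrases that exceed the cap."""
    return sorted(t for t, n in fanout.items() if n > cap)

# Synthetic data: 1000 profiles, one country, 50 last names, 40 postal codes.
records = [
    {"id": i, "country": "US", "last_name": f"Name{i % 50}", "postal": f"{10000 + i % 40}"}
    for i in range(1000)
]

by_country = token_fanout(records, lambda r: r["country"])
by_composite = token_fanout(records, lambda r: f'{r["last_name"]}|{r["postal"]}')

print(oversize(by_country))         # the low-cardinality token exceeds the cap
print(oversize(by_composite))       # the composite token stays under it
print(max(by_composite.values()))   # largest bucket for the composite token
```

Running the same kind of count on a sample of real data gives an early signal of which attributes need to be combined before they are used as tokens.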
Attribute-level
Example:
Exact first name (and/or last name):
Comparator: BasicStringComparator
Token class: ExactMatchToken
Operator: Exact
Use this when you want exact equality.
Organization name (multi-word, noisy):
Comparator: DistinctWords (often via a custom wrapper)
Token class: DistinctWords (matching the comparator)
Enhancements: noise-word dictionary, Stemmer, Soundex, if justified by your data.
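As a rough sketch, the exact first-name pairing above might be expressed as a rule fragment like the one built below. Treat every detail as an assumption to verify against your own tenant configuration: the attribute URI is hypothetical, and the key names (`matchTokenClasses`, `comparatorClasses`, `mapping`) and the package prefixes on the class names are not confirmed by this article. The `unpaired_attributes` helper is also hypothetical; it only checks that each attribute with a token class has a comparator class and vice versa.

```python
import json

# Hypothetical attribute URI; substitute the one from your own data model.
FIRST_NAME = "configuration/entityTypes/Individual/attributes/FirstName"

# Assumed layout of a rule fragment; confirm key names in your tenant config.
rule = {
    "matchTokenClasses": {
        "mapping": [
            {"attribute": FIRST_NAME, "class": "com.reltio.match.token.ExactMatchToken"}
        ]
    },
    "comparatorClasses": {
        "mapping": [
            {"attribute": FIRST_NAME, "class": "com.reltio.match.comparator.BasicStringComparator"}
        ]
    },
    "and": {"exact": [FIRST_NAME]},
}

def unpaired_attributes(rule):
    """List attributes that have a token class or a comparator class, but not both."""
    tokens = {m["attribute"] for m in rule.get("matchTokenClasses", {}).get("mapping", [])}
    comparators = {m["attribute"] for m in rule.get("comparatorClasses", {}).get("mapping", [])}
    return sorted(tokens ^ comparators)

print(json.dumps(rule, indent=2))
print(unpaired_attributes(rule))  # an empty list means the mappings are symmetric
```

A check like this is cheap to run in review and catches the common mistake of changing a comparator without updating its token class.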
Selection Example
| Match rule tactic | Example attribute(s) | Typical ComparatorClass to use | Why |
|---|---|---|---|
| Automatic: Exact email | Emails/Address | BasicStringComparator | Pair BasicStringComparator with ExactMatchToken for exact-match use cases; the same guidance applies to exact email. |
| Automatic: Exact SSN | Identifiers | BasicStringComparator | For strict equality on identifiers, use the Exact operator with BasicStringComparator and ExactMatchToken; Reltio emphasizes aligning exact operators with BasicStringComparator. |
| Relevance: Fuzzy first & last name (edit-distance) | FirstName, LastName | DamerauLevenshteinDistance | Map FuzzyTextMatchToken on first/last name to DamerauLevenshteinDistance for misspellings. FuzzyTextMatchToken can generate many tokens, so monitor performance. |
| Relevance: Phonetic first & last name | FirstName, LastName | DoubleMetaphoneComparator | Pair DoubleMetaphoneComparator with DoubleMetaphoneMatchToken as a phonetic tactic for names (nicknames/sound-alikes). |
| Relevance: Tolerant street address | | AddressLineComparator | Use AddressLineMatchToken with AddressLineComparator for address normalization (drops numbers/garbage words, sorts remaining terms). Purpose-built for address tolerance. |
| Relevance: Exact phone + fuzzy name | | PhoneNumberComparator (phone) + DamerauLevenshteinDistance or DoubleMetaphoneComparator (name) | Combine exact phone with fuzzy name. Use PhoneNumberComparator with PhoneNumberMatchToken; both support a noiseDictionary directly. |
| Proximity fuzzy: Geo-nearby | Location/Geo (lat/long) | (Geo proximity comparator, per rule) | Use ProximateGeoToken for radius-based geo tokenization in proximity matching. Use for “nearby” logic. |

| Comparator class (exact name) | Best for this data | Why/when to use it (from Reltio docs) |
|---|---|---|
| BasicStringComparator | Exact/standardized strings (IDs, codes, normalized email, gender, suffix) | Performs literal equality; recommended for “Exact” tactics on many attributes, and returns 1 for identical values, 0 otherwise in relevance-based rules. |
| DamerauLevenshteinDistance | Strings with typos/misspellings (first/last names, org names) | Edit-distance comparator for transpositions, insertions, deletions, substitutions; used in Reltio’s fuzzy name examples with FuzzyTextMatchToken. |
| DynamicDamerauLevenshteinDistance | Like above, with dynamic tolerance on longer strings | Same edit-distance family, with dynamic behavior. |
| DoubleMetaphoneComparator | Phonetic person names (sound-alikes) | Double Metaphone + Name Dictionary to handle spelling/nickname variants in first names. |
| SoundexComparator | Phonetic similarity (names) | Use Soundex as a phonetic alternative; used in examples (e.g., fuzzy first/last in suspect rules). |
| DistinctWordsComparator | Multi-word (bag-of-words) text, especially organization names | Basis for org-name custom strategies (with stemming/Soundex/noise dictionary). Where the attribute should not generate tokens, set ignoreInToken to avoid token explosion. |
| BasicTokenizedOrganizationNameComparator | Organization names | Has built-in noise-word removal (e.g., “Inc”, “LLC”) to reduce clutter before comparison. |
| AddressLineComparator | Street AddressLine1 | Purpose-built; includes an in-built noise-word list (e.g., “St”, “Ave”) to normalize addresses for tolerant matching. |
| PhoneNumberComparator | Phone numbers across formats | Normalizes digits; comparator & token both support a noiseDictionary so you don’t need a separate cleanser; matching logic requires sufficient digits. |
| RangeNumericComparator | Numeric ranges (e.g., ZIP5 within ±N) | Lets you set a threshold range (example: Zip5 within ±2) for tolerant numeric comparison; a base class with settable parameters when wrapped. |
| CustomComparator (wrapper) | When you need to extend a base class (add stemming, Soundex, noise dictionaries, parameters) | Official guidance: wrap an out-of-the-box base comparator to add capabilities; keep comparator/token symmetric with the custom token class. |
Tip: For each attribute, choose the comparator that matches its semantics (e.g., exact string vs. bag-of-words for names) and pick a token class that mirrors it. Use cleaners/normalizers first so the comparator gets consistent inputs.
Build the comparison formula last, after tokens & comparators
Reltio rules use Boolean or arithmetic formulas, depending on rule type (automatic/suspect/custom or relevance-based). Keep rules minimal, non-overlapping, and focused—excess rules or redundant tactics slow the engine.
Test & iterate safely
Use the Verify Matches API to simulate how two entities would match (with your default rules, a specific rule, or even an ad-hoc rule body). This is ideal for tuning before you retokenize.
Use Match Rule Analyzer and the Tokenization Schemes API to inspect the actual tokens. Identify collisions or excessive token counts and adjust. Consolidate similar schemes to reduce the total number without hurting match quality.
When you must retokenize, use RebuildMatchTableTask with query filters to limit scope; full retokenization is resource-intensive on large datasets.
Practical do’s & don’ts
Start with precise, high-confidence rules (e.g., exact IDs or emails) using exact tokens; then layer fuzzier rules where necessary.
Keep token phrases selective; combine attributes when individual ones are too common; watch the 300-profile cap.
Align token+comparator (symmetry), and introduce phonetic/stemming/noise dictionaries only when supported by your data quality analysis.
Do not design rules before profiling, add overlapping rules, or rely on low-cardinality tokens alone.
Reference
https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/match-group-elements---description-and-configuration/rule-element/comparator-classes
https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/match-strategies-for-the-most-common-attributes
https://docs.reltio.com/en/objectives/resolve-potential-matches/potential-matching-at-a-glance/potential-matching-navigation/configure-match-rules-overview/create-initial-match-rules/illustrative-examples-of-various-match-rules/example-1---simple-rule-with-reltio-name-dictionary