Practical Guide for Choosing and Testing Token Strategies and Comparator Classes

Gloria

Updated October 07, 2025 21:09

Start with data profiling.

Profile completeness, formats, cardinality, and noise words for each attribute you plan to match. This is the first step Reltio recommends before you design any comparison formula or tokens.
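The profiling step can be sketched in a few lines of Python. This is a minimal illustration, not a Reltio tool: the function names, the sample values, and the idea of treating the most frequent words as noise-word candidates are all assumptions made for the example.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a value to its format shape: letters become A, digits become 9."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile_attribute(values, top=3):
    """Summarize one attribute: completeness, cardinality, formats, frequent words."""
    total = len(values)
    present = [v.strip() for v in values if v is not None and v.strip()]
    words = Counter(w.lower() for v in present for w in v.split())
    return {
        "completeness": len(present) / total if total else 0.0,
        "cardinality": len({v.lower() for v in present}),
        "formats": Counter(shape(v) for v in present).most_common(top),
        # Very frequent words are candidates for a noise-word dictionary.
        "frequent_words": words.most_common(top),
    }

# Hypothetical sample values for an organization-name attribute.
org_names = ["Acme Inc", "ACME Inc", "Globex LLC", None, "", "Initech Inc"]
report = profile_attribute(org_names)
print(report)
```

In this sample, "inc" appears in three of the four populated values, which is the kind of signal that suggests a noise word worth reviewing before you choose a token class.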

Pair your comparator class with a matching token class (“keep them symmetric”)

Reltio’s guidance and examples stress aligning the comparator and token generator (e.g., DistinctWords comparator with a DistinctWords token). If you wrap a base class, keep the pair symmetric and add extras like noise word removal, stemming, or Soundex only if necessary.
Use Reltio’s catalog of comparator classes and their relevance behaviors to pick the proper comparator per attribute. (Reltio documents how each comparator behaves and how its relevance can be computed.)
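One way to keep pairs symmetric is to check a draft configuration before deploying it. The sketch below uses only the comparator and token pairs named in this guide; the function, the short class names, and the plain-dictionary input format are assumptions for illustration and do not reflect a Reltio API.

```python
# Comparator -> token class pairs named in this guide. Short names are used
# for readability; real configurations use fully qualified class names.
SYMMETRIC_PAIRS = {
    "BasicStringComparator": "ExactMatchToken",
    "DamerauLevenshteinDistance": "FuzzyTextMatchToken",
    "DoubleMetaphoneComparator": "DoubleMetaphoneMatchToken",
    "AddressLineComparator": "AddressLineMatchToken",
    "PhoneNumberComparator": "PhoneNumberMatchToken",
}

def find_asymmetric(comparators, tokens):
    """Return (attribute, comparator, actual_token, expected_token) tuples.

    comparators / tokens: dict of attribute name -> short class name.
    Attributes with no token class are skipped, since they may be
    intentionally excluded from tokenization.
    """
    problems = []
    for attr, comparator in comparators.items():
        expected = SYMMETRIC_PAIRS.get(comparator)
        actual = tokens.get(attr)
        if expected is not None and actual is not None and actual != expected:
            problems.append((attr, comparator, actual, expected))
    return problems

# Hypothetical draft: FirstName uses a phonetic comparator but an exact token.
comparators = {"FirstName": "DoubleMetaphoneComparator", "Email": "BasicStringComparator"}
tokens = {"FirstName": "ExactMatchToken", "Email": "ExactMatchToken"}
print(find_asymmetric(comparators, tokens))
```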

Identifying the Correct Comparator for a Match Rule

Define the comparator for each attribute rather than relying on system defaults—especially for fuzzy logic—to ensure predictable, optimal behavior.

Use strict comparators (e.g., ExactMatchComparator) for IDs, codes, and other fields requiring exact equality. Use fuzzy comparators (e.g., DamerauLevenshteinDistance, SoundexComparator) for attributes with natural variation (names, addresses).
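To see why edit-distance comparators suit names but not identifiers, consider the following sketch. It implements the optimal string alignment variant of Damerau-Levenshtein distance in plain Python; it is an illustration of the general technique and is not Reltio's implementation of DamerauLevenshteinDistance.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "nh" <-> "hn".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

print(osa_distance("jonh", "john"))    # one transposition
print(osa_distance("smith", "smyth"))  # one substitution
print(osa_distance("12345", "12354"))  # also distance 1, yet a different ID
```

The last call shows the risk: two different identifiers can sit one edit apart, which is why IDs and codes call for a strict comparator.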

Match rules can combine comparators with logical operators (AND/OR). Ensure the logic aligns with acceptable levels of false positives and negatives for your use case.

Identifying the Correct Tokenization for a Match Rule

Tokens exist to narrow the candidate set that your rule compares. Design them so that rules retrieve the correct records without overwhelming the match engine.

A single token phrase may link to a maximum of 300 profiles. Broad/low-cardinality tokens (e.g., country alone) or overly fuzzy tokenization can significantly increase the number of candidates and degrade performance—avoid or combine them with other attributes to maintain selectivity.

Prefer high-cardinality, stable attributes for tokens (e.g., exact email or external ID) and composite tokens (e.g., LastName + PostalCode) when single attributes are too broad.
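The effect of token selectivity can be estimated offline before you configure anything. The sketch below groups hypothetical records by a candidate token phrase and flags phrases that exceed the 300-profile limit mentioned above; the record layout and helper names are assumptions for the example.

```python
from collections import defaultdict

MAX_PROFILES_PER_TOKEN = 300  # the per-token-phrase limit cited in this guide

def token_fanout(records, token_fn):
    """Group record ids by the token phrase that token_fn produces."""
    buckets = defaultdict(set)
    for rec in records:
        buckets[token_fn(rec)].add(rec["id"])
    return buckets

def oversized(buckets, limit=MAX_PROFILES_PER_TOKEN):
    """Return token phrases linked to more profiles than the limit."""
    return {tok: len(ids) for tok, ids in buckets.items() if len(ids) > limit}

# Hypothetical data: 1,000 people in one country, 200 last names, 50 postal codes.
records = [
    {"id": i, "country": "US", "last_name": f"Name{i % 200}", "postal": str(10000 + i % 50)}
    for i in range(1000)
]

by_country = token_fanout(records, lambda r: r["country"])
by_name_zip = token_fanout(records, lambda r: (r["last_name"].lower(), r["postal"]))

print(oversized(by_country))   # {'US': 1000}
print(oversized(by_name_zip))  # {}
```

Country alone puts every record under one token phrase, while the composite LastName + PostalCode token keeps each phrase small.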

Do not rely on defaults. Map a specific token class for every attribute you tokenize. If an attribute should not produce tokens, set ignoreInToken to suppress token generation.
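A simple lint can catch attributes that silently fall back to a default token class. The sketch below assumes a rule object shaped like a match rule JSON body, with `matchTokenClasses.mapping`, `ignoreInToken`, and operator keys such as `exact` and `fuzzy` holding attribute URIs. Treat that shape, the fully qualified class name, and the attribute URIs as assumptions to verify against your tenant configuration.

```python
CONFIG_KEYS = {"matchTokenClasses", "comparatorClasses", "ignoreInToken"}

def attributes_relying_on_defaults(rule):
    """List attributes used by the rule that have neither an explicit
    token class mapping nor an ignoreInToken entry."""
    used = set()

    def walk(node):
        # Collect attribute URIs found under any operator key.
        if isinstance(node, dict):
            for key, value in node.items():
                if key not in CONFIG_KEYS:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str) and node.startswith("configuration/"):
            used.add(node)

    walk(rule)
    mapped = {m["attribute"] for m in rule.get("matchTokenClasses", {}).get("mapping", [])}
    ignored = set(rule.get("ignoreInToken", []))
    return sorted(used - mapped - ignored)

# Hypothetical rule: LastName has no explicit token class and is not ignored.
ATTR = "configuration/entityTypes/Individual/attributes/"
rule = {
    "matchTokenClasses": {"mapping": [
        {"attribute": ATTR + "Email", "class": "com.reltio.match.token.ExactMatchToken"},
    ]},
    "ignoreInToken": [ATTR + "Country"],
    "and": {"exact": [ATTR + "Email", ATTR + "Country"], "fuzzy": [ATTR + "LastName"]},
}
print(attributes_relying_on_defaults(rule))
```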

Use phonetic tokens for names, address‑specific tokens for addresses, etc. Avoid using fuzzy match token classes unless necessary—they can generate numerous tokens and slow down candidate retrieval.

Be cautious with high-fan-out token classes. They can inflate token counts and reduce performance. When you use one, combine it with a more selective attribute or token where appropriate, and analyze collision rates to verify the result.

Configure rules to use only operational values (OV). This usually maintains match quality while improving runtime performance compared with also considering non-OV values.

Attribute-level examples

Exact first name (and/or last name):
Comparator: BasicStringComparator
Token class: ExactMatchToken
Operator: Exact
Use this when you want exact equality.

Organization name (multi-word, noisy):
Comparator: DistinctWords (often via a custom wrapper)
Token class: DistinctWords (matching the comparator)
Enhancements: noise-word dictionary, stemmer, or Soundex, if justified by your data.
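The exact first and last name example might translate into a match group along the following lines. This is a sketch under stated assumptions: the entity type, the match group URI, the `useOvOnly` flag name, and the fully qualified class names are illustrative and should be checked against the Reltio documentation listed under Reference. The organization name example is not shown because it depends on a custom wrapper.

```python
import json

# Hypothetical entity type and attribute URIs.
ATTR = "configuration/entityTypes/Individual/attributes/"

def exact_name_match_group():
    """Build an exact first/last name match group as a plain dictionary."""
    attrs = [ATTR + "FirstName", ATTR + "LastName"]
    return {
        "uri": "configuration/entityTypes/Individual/matchGroups/ExactName",  # hypothetical
        "label": "Exact first and last name",
        "type": "suspect",
        "useOvOnly": "true",  # assumed flag name for OV-only matching
        "rule": {
            # Token class and comparator are mapped explicitly and kept symmetric.
            "matchTokenClasses": {"mapping": [
                {"attribute": a, "class": "com.reltio.match.token.ExactMatchToken"}
                for a in attrs
            ]},
            "comparatorClasses": {"mapping": [
                {"attribute": a, "class": "com.reltio.match.comparator.BasicStringComparator"}
                for a in attrs
            ]},
            "and": {"exact": attrs},
        },
    }

group = exact_name_match_group()
print(json.dumps(group, indent=2))
```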

Selection Example

Match rule tactic	Example attribute(s)	Typical ComparatorClass to use	Why
Automatic: exact email	`Emails/Address`	BasicStringComparator	Pair BasicStringComparator with ExactMatchToken for exact-match use cases; the same guidance applies to exact email.
Automatic: exact SSN	`Identifiers`	BasicStringComparator	For strict equality on identifiers, use the Exact operator with BasicStringComparator and ExactMatchToken; Reltio emphasizes aligning exact operators with BasicStringComparator.
Relevance: fuzzy first and last name (edit distance)	`FirstName`, `LastName`	DamerauLevenshteinDistance	Pair `FuzzyTextMatchToken` with DamerauLevenshteinDistance on first and last name to tolerate misspellings. FuzzyTextMatchToken can generate many tokens, so monitor performance.
Relevance: phonetic first and last name	`FirstName`, `LastName`	DoubleMetaphoneComparator	Pair DoubleMetaphoneComparator with DoubleMetaphoneMatchToken as a phonetic tactic for names (nicknames, sound-alikes).
Relevance: tolerant street address	`Address/AddressLine1`	AddressLineComparator	Pair AddressLineMatchToken with AddressLineComparator for address normalization (drops numbers and garbage words, sorts the remaining terms). Purpose-built for address tolerance.
Relevance: exact phone + fuzzy name	`Phones/Number` (exact), names (fuzzy)	PhoneNumberComparator (phone) + DamerauLevenshteinDistance or DoubleMetaphoneComparator (name)	Combine exact phone with fuzzy name. Pair PhoneNumberComparator with PhoneNumberMatchToken; both support `noiseDictionary` directly.
Proximity fuzzy: geo-nearby	`Location/Geo` (lat/long)	(Geo proximity comparator, per rule)	Use ProximateGeoToken for radius-based geo tokenization in proximity matching, that is, for “nearby” logic.

Comparator class (exact name)	Best for this data	Why/when to use it (from Reltio docs)
BasicStringComparator	Exact/standardized strings (IDs, codes, normalized email, gender, suffix)	Performs literal equality; recommended for “Exact” tactics on many attributes, and returns 1 for identical values, 0 otherwise in relevance-based rules.
DamerauLevenshteinDistance	Strings with typos/misspellings (first/last names, org names)	Edit-distance comparator for transpositions, insertions, deletions, substitutions; used in Reltio’s fuzzy name examples with `FuzzyTextMatchToken`.
DynamicDamerauLevenshteinDistance	Like above, with dynamic tolerance on longer strings	Same edit-distance family, with dynamic behavior.
DoubleMetaphoneComparator	Phonetic person names (sound-alikes)	Double Metaphone + Name Dictionary to handle spelling/nickname variants in first names.
SoundexComparator	Phonetic similarity (names)	Use Soundex as a phonetic alternative; used in examples (e.g., fuzzy first/last in suspect rules).
DistinctWordsComparator	Multi-word (bag-of-words) text, especially organization names	Basis for org-name custom strategies (with stemming/Soundex/noise dictionary). When you use it, consider excluding the attribute from tokenization (`ignoreInToken`) to avoid token explosion.
BasicTokenizedOrganizationNameComparator	Organization names	Has built-in noise-word removal (e.g., “Inc”, “LLC”) to reduce clutter before comparison.
AddressLineComparator	Street AddressLine1	Purpose-built; includes an in-built noise-word list (e.g., “St”, “Ave”) to normalize addresses for tolerant matching.
PhoneNumberComparator	Phone numbers across formats	Normalizes digits; comparator & token both support a `noiseDictionary` so you don’t need a separate cleanser; matching logic requires sufficient digits.
RangeNumericComparator	Numeric ranges (e.g., ZIP5 within ±N)	Lets you set a threshold range (example: Zip5 within ±2) for tolerant numeric comparison. It is a base class whose parameters can be set when it is wrapped.
CustomComparator (wrapper)	When you need to extend a base class (add stemming, Soundex, noise dictionaries, parameters)	Official guidance: wrap an out-of-the-box base comparator to add capabilities; keep comparator/token symmetric with the custom token class.

Tip: For each attribute, choose the comparator that matches its semantics (e.g., exact string vs. bag-of-words for names) and pick a token class that mirrors it. Use cleaners/normalizers first so the comparator gets consistent inputs.
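The idea of cleaning first and then comparing as a bag of words can be illustrated with a short sketch. The noise-word list and function names are hypothetical, and the logic only demonstrates the general approach; it is not how DistinctWordsComparator is implemented.

```python
import re

NOISE_WORDS = {"inc", "llc", "ltd", "corp", "co"}  # hypothetical noise dictionary

def distinct_words(name, noise=NOISE_WORDS):
    """Lowercase, strip punctuation, drop noise words, and return the word set."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return {w for w in words if w not in noise}

def same_distinct_words(a, b):
    """True when both names reduce to the same non-empty set of words."""
    wa, wb = distinct_words(a), distinct_words(b)
    return bool(wa) and wa == wb

print(same_distinct_words("Acme Widgets, Inc.", "WIDGETS ACME LLC"))  # True
print(same_distinct_words("Acme Widgets", "Acme Tools"))              # False
```

Because word order, case, punctuation, and legal suffixes are removed before comparison, the comparator sees consistent input. A name made up only of noise words is treated as a non-match here rather than as equal to every other such name.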

Build the comparison formula last, after tokens & comparators

Reltio rules use Boolean or arithmetic formulas, depending on rule type (automatic/suspect/custom or relevance-based). Keep rules minimal, non-overlapping, and focused—excess rules or redundant tactics slow the engine.
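The difference between a Boolean formula and an arithmetic one can be shown with a generic sketch. The weighted average and the thresholds below are assumptions chosen for illustration; see the relevance-based matching documentation under Reference for how Reltio actually computes relevance.

```python
def boolean_match(r):
    """Boolean formula: (email AND last_name) OR (phone AND first_name)."""
    return (r["email"] and r["last_name"]) or (r["phone"] and r["first_name"])

def weighted_relevance(scores, weights):
    """Weighted average of per-attribute comparator scores in [0, 1]."""
    return sum(scores[a] * w for a, w in weights.items()) / sum(weights.values())

def action_for(relevance, auto_merge=0.8, potential=0.4):
    """Map a relevance score to an action using hypothetical thresholds."""
    if relevance >= auto_merge:
        return "auto_merge"
    if relevance >= potential:
        return "potential_match"
    return "no_action"

# Hypothetical comparison results for one pair of records.
flags = {"email": False, "last_name": True, "phone": True, "first_name": True}
scores = {"first_name": 1.0, "last_name": 1.0, "phone": 0.0}
weights = {"first_name": 0.25, "last_name": 0.25, "phone": 0.5}

print(boolean_match(flags))
relevance = weighted_relevance(scores, weights)
print(relevance, action_for(relevance))
```

A Boolean rule gives a yes or no answer per pair, while an arithmetic rule produces a score that can route the pair to different actions.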

Test & iterate safely

Use the Verify Matches API to simulate how two entities would match (with your default rules, a specific rule, or even an ad-hoc rule body). This is ideal for tuning before you retokenize.
Use Match Rule Analyzer and the Tokenization Schemes API to inspect the actual tokens. Identify collisions or excessive token counts and adjust. Consolidate similar schemes to reduce the total number without hurting match quality.
When you must retokenize, use RebuildMatchTableTask with query filters to limit scope; full retokenization is resource-intensive on large datasets.
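A Verify Matches call could be prepared roughly as follows. The endpoint path, the body fields (`first`, `second`, `rule`), the tenant URL, and the entity URIs are assumptions for illustration; confirm them against the Verify Matches API documentation before use. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

def build_verify_matches_request(tenant_url, token, first_uri, second_uri, rule=None):
    """Build (but do not send) a POST request comparing two entities.

    Assumed endpoint: {tenant_url}/entities/_verifyMatches
    Assumed body: {"first": {...}, "second": {...}, "rule": {...}}
    """
    body = {"first": {"uri": first_uri}, "second": {"uri": second_uri}}
    if rule is not None:
        body["rule"] = rule  # optional ad-hoc rule body to test
    return urllib.request.Request(
        url=f"{tenant_url.rstrip('/')}/entities/_verifyMatches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical tenant URL, token, and entity URIs.
req = build_verify_matches_request(
    "https://example.invalid/reltio/api/myTenant", "ACCESS_TOKEN",
    "entities/0001", "entities/0002",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), once the endpoint details are verified.
```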

Practical do’s & don’ts

Start with precise, high-confidence rules (e.g., exact IDs or emails) using exact tokens; then layer fuzzier rules where necessary.
Keep token phrases selective; combine attributes when individual ones are too common; watch the 300-profile cap.
Align token+comparator (symmetry), and introduce phonetic/stemming/noise dictionaries only when supported by your data quality analysis.
Do not design rules before profiling, add overlapping rules, or rely on low-cardinality tokens alone.

Reference

https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/match-group-elements---description-and-configuration/rule-element/match-token-class/match-token-classes
https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/match-group-elements---description-and-configuration/rule-element/comparator-classes
https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/match-strategies-for-the-most-common-attributes
https://support.reltio.com/hc/en-us/articles/4408759578125-How-can-I-use-the-Match-Analyzer-to-find-simple-match-token-issues#how-can-i-use-the-match-analyzer-to-find-simple-match-token-issues-0
https://docs.reltio.com/en/objectives/resolve-potential-matches/potential-matching-at-a-glance/potential-matching-navigation/configure-match-rules-overview/create-initial-match-rules/illustrative-examples-of-various-match-rules/example-3---four-rules-working-together/example-3---individual-entity-type-rule-4
https://docs.reltio.com/en/objectives/resolve-potential-matches/potential-matching-at-a-glance/potential-matching-navigation/configure-match-rules-overview/create-initial-match-rules/design-your-match-tokenization-scheme
https://docs.reltio.com/en/objectives/resolve-potential-matches/potential-matching-at-a-glance/potential-matching-navigation/configure-match-rules-overview/create-initial-match-rules/illustrative-examples-of-various-match-rules/example-1---simple-rule-with-reltio-name-dictionary
https://support.reltio.com/hc/en-us/articles/360049585051-Why-is-my-addressLine1-custom-match-rule-for-fuzzy-logic-is-not-working-as-expected
https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/match-group-elements---description-and-configuration/rule-element/match-token-class/match-token-generation
https://support.reltio.com/hc/en-us/articles/360028928292-What-is-proximity-fuzzy-matching-and-how-can-I-enable-it
https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/relevance-based-matching---detailed-explanation
https://docs.reltio.com/en/objectives/resolve-potential-matches/potential-matching-at-a-glance/potential-matching-navigation/configure-match-rules-overview/create-initial-match-rules/illustrative-examples-of-various-match-rules/example-2---using-a-custom-comparator-and-token-class
https://docs.reltio.com/en/reltio/what-does-reltio-do/what-reltio-does-at-a-glance/data-unification-and-mdm-at-a-glance/data-unification-and-mdm-in-detail/reltio-match-and-merge/match-group-elements---description-and-configuration/rule-element/ignoreintoken