Practical guide to choosing and testing token strategies and comparator classes

Start with data profiling.

  • Profile completeness, formats, cardinality, and noise words for each attribute you plan to match. This is the first step Reltio recommends before you design any comparison formula or tokens. 
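
As a rough illustration of the profiling step, the sketch below summarizes completeness, cardinality, and frequently repeated words (noise-word candidates) for one attribute. It assumes records have already been exported as plain Python dictionaries; the function name, the OrgName attribute, and the sample data are hypothetical and are not part of any Reltio API.

```python
from collections import Counter

def profile_attribute(records, attribute, top_words=5):
    """Summarize completeness, cardinality, and frequent words for one attribute."""
    values = [r.get(attribute) for r in records]
    # Keep only non-empty string values, trimmed of surrounding whitespace.
    present = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    # Words that repeat across many values are noise-word candidates.
    word_counts = Counter(w.lower() for v in present for w in v.split())
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "cardinality": len({v.lower() for v in present}),
        "frequent_words": word_counts.most_common(top_words),
    }

# Hypothetical sample data for illustration only.
sample = [
    {"OrgName": "Acme Inc"},
    {"OrgName": "Acme Incorporated"},
    {"OrgName": "Globex Inc"},
    {"OrgName": ""},
]
report = profile_attribute(sample, "OrgName")
print(report)
```

A word such as "inc" that appears in a large share of values is a candidate for a noise-word dictionary; low completeness or low cardinality suggests the attribute is a weak token on its own.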

 

Pair your comparator class with a matching token class (“keep them symmetric”)

  • Reltio’s guidance and examples stress aligning the comparator and token generator (e.g., DistinctWords comparator with a DistinctWords token). If you wrap a base class, keep the pair symmetric and add extras like noise word removal, stemming, or Soundex only if necessary. 

  • Use Reltio’s catalog of comparator classes and their relevance behaviors to pick the proper comparator per attribute. (Reltio documents how each comparator behaves and how its relevance can be computed.) 
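
One way to keep pairs symmetric during design review is a small lookup check like the sketch below. The pairs are the ones named in this article; the helper function and the sample design are hypothetical, and the check runs on your own design notes rather than against Reltio.

```python
# Comparator/token pairs named in this article; extend with your own pairs.
SYMMETRIC_PAIRS = {
    "BasicStringComparator": "ExactMatchToken",
    "DamerauLevenshteinDistance": "FuzzyTextMatchToken",
    "DoubleMetaphoneComparator": "DoubleMetaphoneMatchToken",
    "AddressLineComparator": "AddressLineMatchToken",
    "PhoneNumberComparator": "PhoneNumberMatchToken",
}

def find_asymmetric(mappings):
    """Return attributes whose token class is not the expected partner."""
    problems = []
    for attribute, (comparator, token) in mappings.items():
        expected = SYMMETRIC_PAIRS.get(comparator)
        # Comparators not listed above are skipped rather than flagged.
        if expected is not None and token != expected:
            problems.append((attribute, comparator, token, expected))
    return problems

# Hypothetical rule design: attribute -> (comparator class, token class).
design = {
    "FirstName": ("DoubleMetaphoneComparator", "DoubleMetaphoneMatchToken"),
    "AddressLine1": ("AddressLineComparator", "ExactMatchToken"),
}
issues = find_asymmetric(design)
for attribute, comparator, token, expected in issues:
    print(f"{attribute}: {comparator} is paired with {token}, expected {expected}")
```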

Identifying the Correct Comparator for a Match Rule

Define the comparator for each attribute rather than relying on system defaults—especially for fuzzy logic—to ensure predictable, optimal behavior.

Use strict comparators (e.g., BasicStringComparator) for IDs, codes, and other fields requiring exact equality. Use fuzzy comparators (e.g., DamerauLevenshteinDistance, SoundexComparator) for attributes with natural variation (names, addresses).
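
To make the strict-versus-fuzzy distinction concrete, the sketch below contrasts literal equality with an edit distance that counts insertions, deletions, substitutions, and adjacent transpositions. It implements the optimal string alignment variant of Damerau-Levenshtein as a teaching aid; it is not Reltio's implementation, and the function name is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "nh" vs "hn".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

# Exact comparison treats a one-character slip as a full mismatch.
print("jonh" == "john")              # False
print(osa_distance("jonh", "john"))  # 1 (one transposition)
```

For an ID or code, a distance of 1 usually means a different record, which is why exact comparison is the safer choice there; for names, a small distance is often a typo.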

Match rules can combine comparators with logical operators (AND/OR). Ensure the logic aligns with acceptable levels of false positives and negatives for your use case.

 

Identifying the Correct Tokenization for a Match Rule

Tokens exist to narrow the candidate set that your rule compares. Design them so that rules retrieve the correct records without overwhelming the match engine. 

A single token phrase may link to a maximum of 300 profiles. Broad, low-cardinality tokens (e.g., country alone) or overly fuzzy tokenization can significantly increase the number of candidates and degrade performance. Avoid them, or combine them with other attributes to maintain selectivity.

Prefer high-cardinality, stable attributes for tokens (e.g., exact email or external ID) and composite tokens (e.g., LastName + PostalCode) when single attributes are too broad. 
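
The effect of composite tokens on fan-out can be estimated offline before you change any rule. The sketch below counts how many profiles share each token phrase and flags phrases above the 300-profile cap cited above. The tokenizer lambdas are simplified stand-ins for real token classes, and the data is hypothetical.

```python
from collections import defaultdict

MAX_PROFILES_PER_TOKEN = 300  # cap cited in this article

def token_fanout(records, token_fn):
    """Count how many records share each token phrase."""
    buckets = defaultdict(int)
    for record in records:
        buckets[token_fn(record)] += 1
    return dict(buckets)

def oversized(buckets, limit=MAX_PROFILES_PER_TOKEN):
    """Return token phrases linked to more profiles than the limit."""
    return {token: n for token, n in buckets.items() if n > limit}

# Hypothetical data: 400 profiles named Smith spread over 4 postal codes.
records = [{"LastName": "Smith", "PostalCode": f"1000{i % 4}"} for i in range(400)]

by_last_name = token_fanout(records, lambda r: r["LastName"].lower())
by_composite = token_fanout(records, lambda r: (r["LastName"].lower(), r["PostalCode"]))

print(oversized(by_last_name))  # the single "smith" token exceeds the cap
print(oversized(by_composite))  # composite tokens stay under the cap
```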

Do not rely on defaults. Map a specific token class for every attribute you tokenize. If an attribute should not produce tokens, set ignoreInToken to suppress token generation.

Use phonetic tokens for names, address‑specific tokens for addresses, etc. Avoid using fuzzy match token classes unless necessary—they can generate numerous tokens and slow down candidate retrieval.

Be cautious with high-fan-out token classes. They can inflate token counts and reduce performance. When you use one, pair it with a more selective component where appropriate, and analyze collision rates to verify the result.

Configure rules to use only operational values (OV). This usually maintains match quality while improving runtime performance compared with also considering non-OV values.

Attribute-level examples

  • Exact first name (and/or last name):
    Comparator: BasicStringComparator
    Token class: ExactMatchToken
    Operator: Exact
    Use this when you want exact equality.

  • Organization name (multi-word, noisy):
    Comparator: DistinctWords (often via a custom wrapper)
    Token class: DistinctWords (matching the comparator)
    Enhancements: noise-word dictionary, Stemmer, Soundex, if justified by your data. 
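
For orientation, the exact first-name example could be written down roughly as follows. This is a sketch under stated assumptions, not a verified configuration: the key names (useOvOnly, matchTokenClasses, comparatorClasses), the com.reltio.match package prefixes, and the URIs are assumptions for illustration and should be checked against your tenant's configuration and the Reltio documentation before use.

```python
import json

# Hypothetical attribute URI; replace with the URI from your own data model.
ATTR = "configuration/entityTypes/Individual/attributes/FirstName"

match_group = {
    "uri": "configuration/entityTypes/Individual/matchGroups/ExactFirstName",  # hypothetical
    "label": "Exact first name",
    "type": "suspect",
    "useOvOnly": "true",  # assumed key name for restricting the rule to OV
    "rule": {
        # Token class mapped explicitly instead of relying on defaults.
        "matchTokenClasses": {
            "mapping": [{"attribute": ATTR, "class": "com.reltio.match.token.ExactMatchToken"}]
        },
        # Comparator class mirrors the token class (symmetry).
        "comparatorClasses": {
            "mapping": [{"attribute": ATTR, "class": "com.reltio.match.comparator.BasicStringComparator"}]
        },
        "and": {"exact": [ATTR]},
    },
}

def mapped_attributes(group, section):
    """Collect the attribute URIs that have an explicit class mapping."""
    return {m["attribute"] for m in group["rule"][section]["mapping"]}

print(json.dumps(match_group, indent=2))
```

Comparing mapped_attributes(match_group, "matchTokenClasses") with mapped_attributes(match_group, "comparatorClasses") is a quick way to spot an attribute that has a comparator mapping but no token mapping, or the reverse.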

Selection Example

Each entry lists the match rule tactic, the example attribute(s), the typical comparator class to use, and why.

  • Automatic: exact email
    Example attribute(s): Emails/Address
    Typical comparator class: BasicStringComparator
    Why: Pair BasicStringComparator with ExactMatchToken for exact-match use cases; the same guidance applies to exact email.

  • Automatic: exact SSN
    Example attribute(s): Identifiers
    Typical comparator class: BasicStringComparator
    Why: For strict equality on identifiers, use the Exact operator with BasicStringComparator and ExactMatchToken; Reltio emphasizes aligning exact operators with BasicStringComparator.

  • Relevance: fuzzy first and last name (edit distance)
    Example attribute(s): FirstName, LastName
    Typical comparator class: DamerauLevenshteinDistance
    Why: Map FuzzyTextMatchToken on FirstName and LastName and compare with DamerauLevenshteinDistance to catch misspellings. FuzzyTextMatchToken can generate many tokens, so monitor performance.

  • Relevance: phonetic first and last name
    Example attribute(s): FirstName, LastName
    Typical comparator class: DoubleMetaphoneComparator
    Why: Pair DoubleMetaphoneComparator with DoubleMetaphoneMatchToken as a phonetic tactic for names (nicknames, sound-alikes).

  • Relevance: tolerant street address
    Example attribute(s): Address, AddressLine1
    Typical comparator class: AddressLineComparator
    Why: Use AddressLineMatchToken with AddressLineComparator for address normalization (drops numbers and garbage words, sorts the remaining terms). Purpose-built for address tolerance.

  • Relevance: exact phone + fuzzy name
    Example attribute(s): Phones, Number (exact); names (fuzzy)
    Typical comparator class: PhoneNumberComparator (phone) + DamerauLevenshteinDistance or DoubleMetaphoneComparator (name)
    Why: Combine exact phone with fuzzy name. Use PhoneNumberComparator with PhoneNumberMatchToken; both support a noiseDictionary directly.

  • Proximity fuzzy: geo-nearby
    Example attribute(s): Location/Geo (lat/long)
    Typical comparator class: (Geo proximity comparator, per rule)
    Why: Use ProximateGeoToken for radius-based geo tokenization in proximity matching. Use it for "nearby" logic.
Comparator class reference

Each entry lists the comparator class (exact name), the data it suits best, and why or when to use it (from Reltio docs).

  • BasicStringComparator
    Best for: Exact or standardized strings (IDs, codes, normalized email, gender, suffix)
    Why/when: Performs literal equality; recommended for "Exact" tactics on many attributes. In relevance-based rules it returns 1 for identical values and 0 otherwise.

  • DamerauLevenshteinDistance
    Best for: Strings with typos or misspellings (first/last names, org names)
    Why/when: Edit-distance comparator for transpositions, insertions, deletions, and substitutions; used in Reltio's fuzzy name examples with FuzzyTextMatchToken.

  • DynamicDamerauLevenshteinDistance
    Best for: Like the above, with dynamic tolerance on longer strings
    Why/when: Same edit-distance family, with dynamic behavior.

  • DoubleMetaphoneComparator
    Best for: Phonetic person names (sound-alikes)
    Why/when: Double Metaphone plus a name dictionary handles spelling and nickname variants in first names.

  • SoundexComparator
    Best for: Phonetic similarity (names)
    Why/when: Use Soundex as a phonetic alternative; used in examples (e.g., fuzzy first/last name in suspect rules).

  • DistinctWordsComparator
    Best for: Multi-word (bag-of-words) text, especially organization names
    Why/when: Basis for org-name custom strategies (with stemming, Soundex, or a noise dictionary). When you use it, consider excluding the attribute from tokenization (ignoreInToken) to avoid token explosion.

  • BasicTokenizedOrganizationNameComparator
    Best for: Organization names
    Why/when: Has built-in noise-word removal (e.g., "Inc", "LLC") to reduce clutter before comparison.

  • AddressLineComparator
    Best for: Street AddressLine1
    Why/when: Purpose-built; includes a built-in noise-word list (e.g., "St", "Ave") to normalize addresses for tolerant matching.

  • PhoneNumberComparator
    Best for: Phone numbers across formats
    Why/when: Normalizes digits. The comparator and token both support a noiseDictionary, so you do not need a separate cleanser. The matching logic requires sufficient digits.

  • RangeNumericComparator
    Best for: Numeric ranges (e.g., ZIP5 within ±N)
    Why/when: Lets you set a threshold range (example: Zip5 within ±2) for tolerant numeric comparison. It is a base class whose parameters can be set when wrapped.

  • CustomComparator (wrapper)
    Best for: Cases where you need to extend a base class (add stemming, Soundex, noise dictionaries, parameters)
    Why/when: Official guidance is to wrap an out-of-the-box base comparator to add capabilities, and to keep the comparator symmetric with the custom token class.

 

Tip: For each attribute, choose the comparator that matches its semantics (e.g., exact string vs. bag-of-words for names) and pick a token class that mirrors it. Use cleaners/normalizers first so the comparator gets consistent inputs. 

 

Build the comparison formula last, after tokens & comparators

  • Reltio rules use Boolean or arithmetic formulas, depending on rule type (automatic/suspect/custom or relevance-based). Keep rules minimal, non-overlapping, and focused—excess rules or redundant tactics slow the engine. 
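
The difference between a Boolean formula and an arithmetic one can be illustrated with per-attribute comparator results. In the sketch below, the 0.8 cut-off, the weights, and the weighted-average scoring are hypothetical stand-ins chosen for illustration; they are not Reltio's documented scoring.

```python
def boolean_rule(results):
    """Boolean formula: exact email OR (exact phone AND fuzzy last name)."""
    # 0.8 is a hypothetical cut-off for treating a fuzzy score as a match.
    return results["Email"] == 1.0 or (
        results["Phone"] == 1.0 and results["LastName"] >= 0.8
    )

def weighted_relevance(results, weights):
    """Arithmetic formula: weighted average of per-attribute scores."""
    total = sum(weights.values())
    return sum(results[attr] * w for attr, w in weights.items()) / total

# Hypothetical comparator outputs for one pair of profiles (0.0 to 1.0).
results = {"Email": 0.0, "Phone": 1.0, "LastName": 0.9}
weights = {"Email": 0.5, "Phone": 0.3, "LastName": 0.2}

print(boolean_rule(results))                 # matches via phone + last name
print(weighted_relevance(results, weights))  # a score to compare with a threshold
```

The same pair matches outright under the Boolean formula but earns only a middling score under the arithmetic one, which is why the formula should be designed only after the comparators and tokens are settled.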

Test & iterate safely

  • Use the Verify Matches API to simulate how two entities would match (with your default rules, a specific rule, or even an ad-hoc rule body). This is ideal for tuning before you retokenize.

  • Use Match Rule Analyzer and the Tokenization Schemes API to inspect the actual tokens. Identify collisions or excessive token counts and adjust. Consolidate similar schemes to reduce the total number without hurting match quality.

  • When you must retokenize, use RebuildMatchTableTask with query filters to limit scope; full retokenization is resource-intensive on large datasets. 

Practical do’s & don’ts

  • Start with precise, high-confidence rules (e.g., exact IDs or emails) using exact tokens; then layer fuzzier rules where necessary. 

  • Keep token phrases selective; combine attributes when individual ones are too common; watch the 300-profile cap. 

  • Align token+comparator (symmetry), and introduce phonetic/stemming/noise dictionaries only when supported by your data quality analysis.

  • Do not design rules before profiling, add overlapping rules, or rely on low-cardinality tokens alone.

 
