Word Id Map
The word id map is applied at the end of the core query processing (which produces raw posting lists), just before the aggregation for the output happens. Consider the following fuzzy search example. Without the word id map, a raw result posting list for a completion query for prob* might contain the following completions:
    probability ... 25 occurrences
    probabilistic ... 23 occurrences
    probably ... 12 occurrences
    probalistic:probabilistic ... 4 occurrences
Without the word id map, the final result would contain just that information. If only the top-3 completions are returned, it would contain only the counts for probability (25), probabilistic (23), and probably (12).
With a word id map that maps the word id of probalistic:probabilistic to the word id of probabilistic, the top-3 completions returned would be probabilistic (27), probability (25), probably (12).
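The effect of the map on the example above can be illustrated with a small sketch. This is not the code used in the index. The type Completion and the function mapAndAggregate are hypothetical names introduced only for this illustration, and the word ids 0 to 3 are assumed to be the positions of the four words in a sorted vocabulary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using WordId = std::size_t;

// Hypothetical type: one completion of a raw result with its occurrence count.
struct Completion {
  WordId wordId;
  unsigned count;
};

// Hypothetical helper: apply the word id map to every entry, add up the counts
// of entries that end up with the same word id, and return the merged
// completions sorted by descending count.
// wordIdMap[i] is the id that word i is mapped to (i itself if unmapped).
std::vector<Completion> mapAndAggregate(const std::vector<Completion>& raw,
                                        const std::vector<WordId>& wordIdMap) {
  std::vector<unsigned> counts(wordIdMap.size(), 0);
  for (const Completion& c : raw) {
    assert(c.wordId < wordIdMap.size());
    const WordId mapped = wordIdMap[c.wordId];
    assert(mapped < counts.size());
    counts[mapped] += c.count;
  }
  std::vector<Completion> merged;
  for (WordId id = 0; id < counts.size(); ++id) {
    if (counts[id] > 0) merged.push_back({id, counts[id]});
  }
  // Stable sort keeps ties in word id order.
  std::stable_sort(merged.begin(), merged.end(),
                   [](const Completion& a, const Completion& b) {
                     return a.count > b.count;
                   });
  return merged;
}
```

With the assumed ids probabilistic = 0, probability = 1, probably = 2, probalistic:probabilistic = 3, the map {0, 1, 2, 0} and the raw counts from the example, the merged result starts with probabilistic (23 + 4 = 27), followed by probability (25) and probably (12).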
There are no restrictions on the word id map; it can map any word id to any other word id. TODO: explain the use cases that are currently implemented and how to extend that implementation. Does this require changing the code, or is there a command-line argument for passing an arbitrary word id map?
The code for the precomputation of the word id map is in Vocabulary::precomputeWordIdMap(). The call is triggered in file HYBIndex.cpp as follows:
    if (fuzzySearchEnabled || synonymSearchEnabled || normalizeWords)
      _vocabulary.precomputeWordIdMap();
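To give a rough idea of what such a precomputation could look like for the fuzzy search example, here is a minimal sketch. It is not the actual body of Vocabulary::precomputeWordIdMap(). The function name buildWordIdMapSketch is hypothetical, and the sketch assumes that the vocabulary is sorted, that word ids are positions in it, and that a word of the form variant:target should be mapped to the id of target whenever target is itself in the vocabulary.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, NOT the real Vocabulary::precomputeWordIdMap().
// Assumption: sortedVocabulary is sorted, and word ids are indices into it.
// Assumption: a word "variant:target" is mapped to the id of "target".
std::vector<std::size_t> buildWordIdMapSketch(
    const std::vector<std::string>& sortedVocabulary) {
  std::vector<std::size_t> wordIdMap(sortedVocabulary.size());
  for (std::size_t id = 0; id < sortedVocabulary.size(); ++id) {
    wordIdMap[id] = id;  // By default, every word id maps to itself.
    const std::string& word = sortedVocabulary[id];
    const std::size_t colon = word.find(':');
    if (colon == std::string::npos) continue;
    const std::string target = word.substr(colon + 1);
    // Binary search for the target word in the sorted vocabulary.
    auto it = std::lower_bound(sortedVocabulary.begin(),
                               sortedVocabulary.end(), target);
    if (it != sortedVocabulary.end() && *it == target) {
      wordIdMap[id] = static_cast<std::size_t>(it - sortedVocabulary.begin());
    }
  }
  return wordIdMap;
}
```

For the sorted vocabulary probabilistic, probability, probably, probalistic:probabilistic, this sketch yields the map {0, 1, 2, 0}, that is, only the misspelled entry is redirected.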