CompleteSearch Details

Word Id Map

The word id map is applied at the end of the core query processing (which produces raw posting lists), just before the results are aggregated for output. Consider the following fuzzy-search example. Without the word id map, the raw result posting list for the completion query prob* might contain the following completions and occurrence counts:

probability ... 25 occurrences
probabilistic ... 23 occurrences
probably ... 12 occurrences
probalistic:probabilistic ... 4 occurrences

Without the word id map, the final result reports exactly these counts. If only the top-3 completions are returned, they are probability (25), probabilistic (23), and probably (12); the 4 occurrences of probalistic:probabilistic are not counted towards probabilistic.

With a word id map that maps the word id of probalistic:probabilistic to the word id of probabilistic, the occurrences of both are counted together, and the top-3 completions become probabilistic (23 + 4 = 27), probability (25), and probably (12).
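A minimal C++ sketch of the step described above, under simplifying assumptions: a raw posting list is modeled as a plain sequence of word ids (one entry per occurrence), and the word id map as a std::vector indexed by word id. The names applyWordIdMap and topCompletions, and the word ids used below, are hypothetical; this is an illustration, not the CompleteSearch implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using WordId = int;

// Hypothetical helper: replaces every word id in a raw posting list by its
// mapped id. wordIdMap[w] is the id that w is counted as (w itself if unmapped).
std::vector<WordId> applyWordIdMap(const std::vector<WordId>& postings,
                                   const std::vector<WordId>& wordIdMap) {
  std::vector<WordId> mapped;
  mapped.reserve(postings.size());
  for (WordId w : postings) {
    assert(w >= 0 && static_cast<std::size_t>(w) < wordIdMap.size());
    mapped.push_back(wordIdMap[w]);
  }
  return mapped;
}

// Hypothetical aggregation: counts occurrences per word id and returns the k
// word ids with the highest counts (ties broken by the smaller word id).
std::vector<std::pair<WordId, std::size_t>> topCompletions(
    const std::vector<WordId>& postings, std::size_t k) {
  std::map<WordId, std::size_t> counts;
  for (WordId w : postings) ++counts[w];
  std::vector<std::pair<WordId, std::size_t>> ranked(counts.begin(),
                                                     counts.end());
  std::sort(ranked.begin(), ranked.end(),
            [](const std::pair<WordId, std::size_t>& a,
               const std::pair<WordId, std::size_t>& b) {
              return a.second != b.second ? a.second > b.second
                                          : a.first < b.first;
            });
  if (ranked.size() > k) ranked.resize(k);
  return ranked;
}
```

With hypothetical ids probability = 0, probabilistic = 1, probably = 2 and probalistic:probabilistic = 3, the map {0, 1, 2, 1} turns the top-3 result (0, 25), (1, 23), (2, 12) into (1, 27), (0, 25), (2, 12), matching the example. Because the map is applied to every posting before counting, an array lookup per posting keeps the extra cost small in this model.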

There are no restrictions on the word id map: it can map any word id to any other word id. TODO: explain the use cases that are currently implemented and how to extend that implementation. Does this require changing the code, or is there a command-line argument for passing an arbitrary word id map?

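The code for the precomputation of the word id map is in Vocabulary::precomputeWordIdMap(). The call is triggered in file HYBIndex.cpp (lines 681-682 at the time of writing) as follows, so the map is only computed when fuzzy search, synonym search, or word normalization is enabled:

if (fuzzySearchEnabled || synonymSearchEnabled || normalizeWords)
  _vocabulary.precomputeWordIdMap();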
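This page does not describe what Vocabulary::precomputeWordIdMap() actually computes. As an illustration only, here is a hypothetical C++ sketch covering just the fuzzy-search case from the example above. It assumes that a word's id is its position in the vocabulary and that a fuzzy variant is stored as a word of the form variant:canonical (as probalistic:probabilistic suggests); every other word maps to itself. The function name buildWordIdMap is made up, and the synonym and normalization cases are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using WordId = int;

// Hypothetical sketch, NOT Vocabulary::precomputeWordIdMap() itself.
// Assumptions: the word id of a word is its position in `vocabulary`, and a
// fuzzy variant is stored as "variant:canonical". All other words map to
// themselves.
std::vector<WordId> buildWordIdMap(const std::vector<std::string>& vocabulary) {
  // Index: word -> word id.
  std::unordered_map<std::string, WordId> idOf;
  for (std::size_t i = 0; i < vocabulary.size(); ++i) {
    bool inserted = idOf.emplace(vocabulary[i], static_cast<WordId>(i)).second;
    assert(inserted && "vocabulary words are assumed to be unique");
    (void)inserted;  // avoid an unused-variable warning when NDEBUG is set
  }

  std::vector<WordId> wordIdMap(vocabulary.size());
  for (std::size_t i = 0; i < vocabulary.size(); ++i) {
    wordIdMap[i] = static_cast<WordId>(i);  // identity by default
    const std::string& word = vocabulary[i];
    std::size_t colon = word.find(':');
    if (colon == std::string::npos) continue;
    // "variant:canonical" is counted as "canonical" if that word exists.
    auto it = idOf.find(word.substr(colon + 1));
    if (it != idOf.end()) wordIdMap[i] = it->second;
  }
  return wordIdMap;
}
```

For the vocabulary probabilistic, probability, probably, probalistic:probabilistic (ids 0 to 3), this sketch yields the map {0, 1, 2, 0}. Starting from the identity map means that words without a rewrite rule are unaffected when the map is applied.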

CompleteSearch: Details (last edited 2017-03-19 22:23:54 by Hannah Bast)