This page is a glossary of the most important search engine terms.
A
Analyzer: A component that preprocesses input text at index time, at search time, or both. The index-time and query-time analyzers should be the same, or at least process text in a compatible manner. For example, if the indexing analyzer lowercases words, the query analyzer should do the same so that queries can find the indexed words.
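The lowercasing example can be sketched in a few lines of Python. This is only an illustration of why index-time and query-time analysis must be compatible; the functions `lowercase_analyzer`, `whitespace_analyzer`, `build_index`, and `search` are hypothetical names invented for this sketch and do not correspond to any search engine's API.

```python
def lowercase_analyzer(text):
    """Hypothetical analyzer: split on whitespace and lowercase each word."""
    return [word.lower() for word in text.split()]


def whitespace_analyzer(text):
    """Hypothetical analyzer: split on whitespace only, keeping case."""
    return text.split()


def build_index(docs, analyzer):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyzer):
    """Return the ids of documents that contain every analyzed query term."""
    terms = analyzer(query)
    if not terms:
        return set()
    return set.intersection(*[index.get(term, set()) for term in terms])


docs = {1: "Quick Brown Fox", 2: "lazy dog"}
index = build_index(docs, lowercase_analyzer)

# Same analyzer on both sides: the query term becomes "brown" and matches.
matching = search(index, "BROWN", lowercase_analyzer)

# Incompatible query analyzer: "BROWN" is looked up as-is and finds nothing.
mismatched = search(index, "BROWN", whitespace_analyzer)
```

With the lowercasing analyzer on both sides, the query finds document 1. With the case-preserving query analyzer, the same query returns no documents, because the index only contains lowercased terms.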
F
Full text search:
Free text:
P
Protected word: A word that is not modified by any stemming transformation.
S
Stemming: A transformation algorithm that reduces any form of a word, such as "runs", "running", or "ran", to its elemental root ("run"), or that does the inverse: it takes a root word and expands it to all of its various forms.
Stop word:
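As a rough illustration of how stemming and protected words interact, the sketch below strips two suffixes and skips any word in a protected set. The rules and the name `toy_stem` are assumptions made up for this example; real stemmers use far more elaborate rule sets, and irregular forms such as "ran" need a dictionary rather than suffix rules.

```python
SUFFIXES = ("ing", "s")  # toy rules, far simpler than a real stemmer


def toy_stem(word, protected=frozenset()):
    """Strip a known suffix from word unless the word is protected."""
    if word in protected:
        # Protected words pass through the stemmer unchanged.
        return word
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                stem = stem[:-1]
            return stem
    return word


stemmed = [toy_stem(w) for w in ["runs", "running", "news"]]
kept = toy_stem("news", protected={"news"})
```

Without protection, "news" is reduced to "new", which changes its meaning. Listing it as a protected word keeps it intact while "runs" and "running" are still reduced to "run".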
T
Token: An analyzer splits up an input text into a series of tokens. A token is a substring of the input text that is indexed or queried for and not split any further.
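A minimal sketch of tokenization, assuming a tokenizer that treats every run of word characters as one token. The function name `tokenize` is hypothetical; actual analyzers offer many different tokenization strategies.

```python
import re


def tokenize(text):
    """Split text into tokens made of consecutive word characters."""
    return re.findall(r"\w+", text)


tokens = tokenize("Full-text search, explained.")
# tokens == ['Full', 'text', 'search', 'explained']
```

Each resulting token is indexed or queried as a unit and is not split any further. Note that this particular rule splits "Full-text" at the hyphen; a different tokenizer could keep it as a single token.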