Differences between revisions 4 and 6 (spanning 2 versions)
Revision 4 as of 2008-03-10 16:59:45
Size: 475
Editor: mpiat1403
Comment: Stemming definition
Revision 6 as of 2008-03-10 17:11:52
Size: 1042
Editor: mpiat1403
Comment: Token

This page gives a glossary of the most important terms in the search engine nomenclature.

A

  • Analyzers: Analyzers are components that preprocess input text at index time and/or at search time. It is important that the analyzers used at index time and at query time process text in a compatible manner. For example, if the indexing analyzer lowercases words, then the query analyzer should do the same; otherwise a query may fail to match the indexed words.
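
The need for compatible analyzers can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the analyze, build_index, and search functions and the tiny in-memory inverted index are hypothetical and do not reflect the CompleteSearch implementation.

```python
def analyze(text):
    """Hypothetical analyzer: lowercase the text, then split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyzer=analyze):
    """Return ids of documents containing every analyzed query term."""
    results = None
    for term in analyzer(query):
        ids = index.get(term, set())
        results = ids if results is None else results & ids
    return results or set()


docs = {1: "Full Text Search", 2: "Free text"}
index = build_index(docs)

# Same analyzer at index and query time: "TEXT" is lowercased and matches.
print(search(index, "TEXT"))

# Mismatched query analyzer (no lowercasing): nothing is found.
print(search(index, "TEXT", analyzer=str.split))
```

With the matching analyzer the query finds both documents; with the mismatched one the result is empty, because the index only contains lowercased terms.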

F

  • Full text search:

  • Free text:

P

  • Protected word:

S

  • Stemming: A transforming algorithm that reduces the various forms of a word, such as "runs", "running", and "ran", to a common root ("run"), or that does the inverse, that is, takes a root word and expands it to all of its various forms.

  • Stop word:
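
Both directions of stemming described above can be sketched as follows. This is a deliberately naive illustration, not a real stemming algorithm such as Porter's: the suffix list, the IRREGULAR table, and the stem and expand functions are all assumptions introduced for this example.

```python
# Hypothetical lookup table for irregular forms that suffix stripping cannot handle.
IRREGULAR = {"ran": "run"}


def stem(word):
    """Reduce a word to a root by naive suffix stripping (illustration only)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("ing", "s"):
        # Only strip if at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undo consonant doubling, e.g. "runn" -> "run".
    if len(word) >= 4 and word[-1] == word[-2] and word[-1] not in "aeiouls":
        word = word[:-1]
    return word


def expand(root, vocabulary):
    """Inverse direction: all vocabulary words that stem to the given root."""
    return sorted(w for w in vocabulary if stem(w) == root)


for form in ("runs", "running", "ran"):
    print(form, "->", stem(form))

print(expand("run", ["runs", "running", "ran", "runner", "walks"]))
```

The expansion here is driven by a known vocabulary; a system without one would need generation rules instead, which this sketch does not attempt.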

T

  • Token: An analyzer splits an input text into a series of tokens. A token is a substring of the input text that is indexed or queried as a unit and is not split any further.
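
As a rough illustration of tokenization, the sketch below splits a text into tokens and records where each one starts, which makes the "substring of the input text" property easy to check. The tokenize function and the choice of the \w+ pattern as the token boundary rule are assumptions for this example; real analyzers may split differently.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: each run of word characters is one token.

    Returns (token, start_offset) pairs so every token can be traced
    back to its position in the input text.
    """
    return [(m.group(), m.start()) for m in re.finditer(r"\w+", text)]


text = "Full-text search, fast."
tokens = tokenize(text)

for token, start in tokens:
    # Every token is literally a substring of the input at its offset.
    assert text[start:start + len(token)] == token
    print(start, token)
```

Note that under this particular rule the hyphen in "Full-text" acts as a separator, so "Full" and "text" become two separate tokens.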

CompleteSearch: completesearch/Glossary (last edited 2008-09-29 15:49:39 by mpiat1403)