Differences between revisions 11 and 12
Revision 11 as of 2008-03-11 10:15:51
Size: 1345
Editor: mpiat1403
Comment: Facte search is ?
Revision 12 as of 2008-09-29 15:49:39
Size: 2198
Editor: mpiat1403
Comment: Variable byte encoding

This page is a glossary of the most important search engine terms.

A

  • Analyzer: An analyzer is a component that preprocesses input text at index time, at search time, or both. The analyzers used at index time and at query time must process text in a compatible way. For example, if the indexing analyzer lowercases words, the query analyzer should do the same so that queries can match the indexed words.
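
To illustrate why index-time and query-time analysis must agree, here is a minimal sketch in Python. The function names `analyze` and `search`, the sample documents, and the in-memory dictionary index are all assumptions made for illustration; they are not part of CompleteSearch or any particular engine.

```python
import re

def analyze(text):
    """Hypothetical analyzer: lowercase the text, then split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Index time: map each token to the set of documents that contain it.
docs = {1: "Full Text Search", 2: "Faceted search engines"}
index = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        index.setdefault(token, set()).add(doc_id)

def search(query):
    """Return the ids of documents that contain every token of the query."""
    postings = [index.get(token, set()) for token in analyze(query)]
    return set.intersection(*postings) if postings else set()
```

Because both sides go through the same `analyze` function, a query such as `search("SEARCH")` finds the documents indexed under the lowercased token `search`. Looking up the raw string `"Search"` directly in `index` would find nothing.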

F

  • Facet search:

  • Full text search: In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user.

  • Free text:

P

  • Protected word: A word that is not modified by any stemming transformation.

S

  • Stemming: An algorithm that reduces the various forms of a word (such as "runs", "running", "ran") to its root ("run"), or that does the inverse: it takes a root word and expands it to all of its various forms.

  • Stop word: A word that is discarded as a token in indexing and querying.
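
The interplay of stemming, protected words, and stop words can be sketched as follows. This is a deliberately tiny toy, not a real stemming algorithm: the suffix rules, the word lists, and the names `stem` and `normalize` are assumptions chosen for illustration. It does not handle irregular forms such as "ran".

```python
STOP_WORDS = {"the", "a", "is", "of"}   # assumed, illustrative stop word list
PROTECTED_WORDS = {"news"}              # assumed: words that must never be stemmed

def stem(word):
    """Toy suffix stripper; production stemmers use far more rules."""
    if word in PROTECTED_WORDS:
        return word
    if word.endswith("ing") and len(word) >= 6:
        root = word[:-3]
        # Undo consonant doubling, e.g. "runn" -> "run".
        if root[-1] == root[-2]:
            root = root[:-1]
        return root
    if word.endswith("s") and len(word) >= 4:
        return word[:-1]
    return word

def normalize(tokens):
    """Drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

With these assumed lists, `normalize(["the", "news", "is", "running"])` drops "the" and "is", leaves the protected word "news" untouched (the plain suffix rule would otherwise cut it to "new"), and reduces "running" to "run".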

T

  • Token: An analyzer splits up an input text into a series of tokens. A token is a substring of the input text that is indexed or queried for and not split any further.

V

  • Variable-byte encoding (vbyte): Variable-byte encoding (or vbyte encoding) is an integer compression scheme in which smaller numbers occupy fewer bytes than larger numbers. Each byte in this scheme contains one stop bit and seven bits of the number itself. If the stop bit indicates that the number continues in the next byte, the next byte is also read when reconstructing the number, and so on. In this fashion, numbers smaller than 128 (2^7) can be represented in one byte, numbers smaller than 16384 (2^14) in two bytes, and so on. Vbyte encoding is not the most compact integer compression scheme. However, it has been shown to be extremely fast when decoding numbers, which increases query performance.
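
The scheme can be sketched in a few lines of Python. The bit layout here is an assumption, since the entry above does not fix one: the high bit of each byte is used as the stop bit, with 1 meaning "the number continues in the next byte", and the seven-bit groups are stored least significant first. Other implementations invert the flag or the group order. The function names are hypothetical.

```python
def vbyte_encode(n):
    """Encode a non-negative integer as a variable-length byte string."""
    if n < 0:
        raise ValueError("vbyte encoding is defined for non-negative integers")
    out = bytearray()
    while n >= 128:
        # Emit the low seven bits and set the high bit: more bytes follow.
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)  # Final byte: high bit clear, the number ends here.
    return bytes(out)

def vbyte_decode_stream(data):
    """Decode a concatenation of vbyte-encoded integers into a list."""
    numbers = []
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7            # Number continues in the next byte.
        else:
            numbers.append(value)  # Stop bit clear: number is complete.
            value, shift = 0, 0
    return numbers
```

Under these assumptions, 127 fits in one byte, 128 through 16383 take two bytes, and 16384 is the first value that needs three. Decoding needs only a mask, a shift, and one branch per byte, which is a plausible reason for the decoding speed mentioned above.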

CompleteSearch: completesearch/Glossary (last edited 2008-09-29 15:49:39 by mpiat1403)