
CompleteSearch Completion Server

startCompletionServer [options] <base-name>.hybrid

This starts the CompletionServer.

Several options are available:

  • --zero-fork Run the server in the foreground, and output everything to the console, which is convenient for testing. The default mode is to run as a background process and write all output to a log file.

  • --port <port number> Specify the port on which the server listens (default is 8888).

  • --kill <port number> Stop the server running at the specified port.

  • --log-file <logfile> Specify file name for the log messages (default is <base-name>.log).

  • --verbosity <verbosity level> Set the log verbosity, especially for debugging (1 = normal, 2 = high, 3 = highest; default is 1).

  • --document-root <path on filesystem> Allows requesting files, e.g. HTML pages, located under the given path via <host>:<port>/<someHTML>. This feature is disabled by default.

  • --query-timeout <timeout> Specify the maximum time a request may be processed, which prevents critical queries from bringing the server to a standstill (default is 5000 ms).

  • --word-part-separator-frontend <character> Specify the separator used in the API for special queries like :facet:year:1993 (default is ':').

  • --maps-directory <dir> Specify the directory containing the maps utf8.map and iso8859-1.map (default is the execution directory).

  • --use-suffix-for-exact-query Allows finding müller with the query müller if normalization is enabled. Otherwise it is necessary to query müller:* instead of müller.

  • --disable-cdata-tags Recommended if the info field of each document is valid XML and any invalid XML is already escaped using CDATA. Without this option, the whole output is escaped using CDATA.

  • --kill-running-server If a server is already running, kill it before starting the new one.
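
As a rough illustration, the options above can be assembled into a command line from a script. The following Python sketch only builds the argument list and does not start anything. The helper name build_server_command and the base name "myindex" are hypothetical; the flags are the ones listed above.

```python
def build_server_command(base_name, port=8888, log_file=None,
                         foreground=False, kill_running=False):
    """Build the argument list for startCompletionServer (hypothetical helper)."""
    cmd = ["startCompletionServer"]
    if foreground:
        # --zero-fork: run in the foreground and write output to the console.
        cmd.append("--zero-fork")
    if kill_running:
        # --kill-running-server: stop an already running server first.
        cmd.append("--kill-running-server")
    cmd += ["--port", str(port)]
    if log_file is not None:
        cmd += ["--log-file", log_file]
    # The index file <base-name>.hybrid comes last.
    cmd.append(base_name + ".hybrid")
    return cmd


command = build_server_command("myindex", port=8889, foreground=True)
print(" ".join(command))
```

On a machine where the binary is installed, the resulting list could be handed to subprocess.run(command).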

Some of the options depend on your index and do not work if the index files themselves do not support them. It is recommended to use the generic CsvParser with the same options to create proper index files.

  • Use --normalize-words to normalize all non-facet words. This allows finding Müller even if muller is requested. It is recommended to also set the option --use-suffix-for-exact-query.

  • It is possible to provide different outputs (info fields) for one document by using --info-delimiter <info-delimiter>. This can be useful if you want to return different columns (e.g. <document-as-xml> and <document-as-html>) in different situations. The different outputs are requested with the query parameter p=<pos>, where pos selects the first of the given outputs (p=0), the second (p=1), etc.

  • We used to separate the parts of special words like :facet:year:* with a colon. We noticed that the colon is positioned between digits and letters in ASCII, which might lead to problems when reading word ranges from the words file. No problems should occur, but it is still recommended to use a character that is positioned before the digits, like '!' (the default now). The separator can be specified by using word-part-separator-backend <character>.

  • To enable fuzzy search, use enable-fuzzy-search. This allows finding e.g. algorithm even when requesting the misspelled algoritm~ (the tilde is essential).
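
The remark about the separator's position in ASCII can be checked with a few lines of Python. This is only a sketch of the ordering argument and is not taken from the CompleteSearch code. The helper sorts_before_digits and the sample words are hypothetical.

```python
def sorts_before_digits(separator):
    """True if the separator precedes all digits (and hence all letters) in ASCII."""
    return ord(separator) < ord("0")


# ASCII code points of the two separator candidates, digits and letters.
for ch in ["!", "0", "9", ":", "A", "a"]:
    print(repr(ch), ord(ch))

# With ':' the special word sorts between numbers and ordinary words;
# with '!' it sorts in front of both.
with_colon = sorted(["algorithm", ":facet:year:1993", "1993"])
with_bang = sorted(["algorithm", "!facet!year!1993", "1993"])
print(with_colon)
print(with_bang)
```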

The following options exist but are not yet explained in depth; their descriptions are copied from the source code.

  • --auto-restart Automatically restart the server if it crashes (requires double fork mode).

  • --no-statistics (-V)

  • --index-type [INV|HYB] Type of index (default: guess from index file name).

  • --no-double-fork No double fork, process will run forever or until server killed.

  • --multi-threaded Run in multithreaded mode (default: process one query after the other; still recommended).

  • --how-to-rank-docs <rankingType> Specify how to rank documents (0 = by score, 1 = by doc id, 2 = by word id, followed by a = ascending or d = descending; default is 0d).

  • --how-to-rank-words <rankingType> Specify how to rank words (0 = by score, 1 = by doc count, 2 = by occ count, 3 = by word id, followed by a = ascending or d = descending, default is 0d).

  • --score-aggregations <aggregation> Specify score aggregation by a 4-letter string over the alphabet {S,M,B}, see explanations below.

    • There are currently three types of score aggregation: S = sum, M = max, B = sum with bonus for proximity and exact word match. There are two aggregations for doc scores (same completion, different completion) and two aggregations for word scores (same doc, different doc).

  • --pid-file <file name> Specify the name of the file containing the process id; a leading ~ will be replaced by the home dir, the first %s by the host name, and the second %s by the port (default is ~/.completesearch_<hostname>_<port>).

  • --exe-command <command>

  • --locale <encoding> Set LC_ALL to this string, irrespective of special "!encoding:..." word in index.

  • --enable-synonym-search (-S)

  • --fuzzy-normalize-words (-W)

  • --show-query-result (-Q)

  • --use-generalized-edit-distance-slow Use generalized edit distance to rank the word-ids (slow!).

  • --use-baseline-fuzzysearch (-B)

  • --cleanup-query-before-processing Cleanup query before processing.

  • --read-custom-scores (-0)

  • Cache/history sizes must be greater than 0 and are given in one of the following forms: n meaning n bytes, nK meaning n kilobytes, nM meaning n megabytes, nG meaning n gigabytes.
    • --cache-size-excerpts <size> Sets the cache size for the excerpts generator (default: 16 megabytes).

    • --max-size-history <size> Set the history size (default: 32 megabytes).

    • --max-queries-history <size> At most that many queries in history (default: 200; note: current impl. is quadratic).

  • -E On error, send single hit with error message (will be seen in browser then).

  • -e <docs file> Name of file containing excerpts info (default: <db>.docs.DB).

  • -T Do not turn title from the docs t: field into link, but send it verbatim.
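
The size notation used for the cache and history options (n, nK, nM, nG) could be interpreted roughly as in the following Python sketch. It is not the server's own parser. The function name parse_size is hypothetical, and the 1024-based multipliers are an assumption, since the text above does not say whether a kilobyte means 1000 or 1024 bytes.

```python
import re

# Assumption: K, M and G are 1024-based multiples; the option descriptions
# do not state which base the server actually uses.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Parse 'n', 'nK', 'nM' or 'nG' into a number of bytes (must be > 0)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    size = int(match.group(1)) * _UNITS[match.group(2)]
    if size <= 0:
        raise ValueError("size must be greater than 0")
    return size


print(parse_size("16M"))  # the default of --cache-size-excerpts, in bytes
```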

For more details, a look at the code that processes these command line options might be helpful. You can find the code in file https://ad-websvn.informatik.uni-freiburg.de/completesearch/codebase/server/StartCompletionServer.cpp.
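
As an informal illustration of the value formats described for --how-to-rank-docs, --how-to-rank-words and --score-aggregations, the following Python sketch checks whether a value has the documented shape. The function names are hypothetical. It assumes that the ranking type is written as a digit immediately followed by a or d, as in the default 0d; the linked source code remains the authority on what the server accepts.

```python
import re


def is_valid_doc_ranking(spec):
    # One digit 0-2 (score, doc id, word id) plus a (ascending) or d (descending).
    return re.fullmatch(r"[0-2][ad]", spec) is not None


def is_valid_word_ranking(spec):
    # One digit 0-3 (score, doc count, occ count, word id) plus a or d.
    return re.fullmatch(r"[0-3][ad]", spec) is not None


def is_valid_score_aggregation(spec):
    # Exactly four letters over the alphabet {S, M, B}.
    return re.fullmatch(r"[SMB]{4}", spec) is not None


print(is_valid_doc_ranking("0d"), is_valid_word_ranking("3a"),
      is_valid_score_aggregation("SSMB"))
```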

CompleteSearch: CompletionServer (last edited 2016-07-15 15:44:26 by Hannah Bast)