CompleteSearch Completion Server
startCompletionServer [options] <base-name>.hybrid
This starts the CompletionServer.
The options described below cover most of the functionality that is typically needed.
Explicit Server Options
--zero-fork Run the server in the foreground, and output everything to the console, which is convenient for testing. The default mode is to run as a background process and write all output to a log file.
--no-double-fork Fork only once; the process runs until the server is killed.
--multi-threaded Run in multithreaded mode. The default, which is still recommended, is to process one query after the other.
--auto-restart Automatically restart the server if it crashes (requires double fork mode, which is the default).
--kill <port number> Stop the server running at the specified port.
--kill-running-server If there is a running server, kill it before starting the new one.
--port <port number> Specify the port on which the server listens (default is 8888).
--pid-file <file name> Specify the name of the file containing the process id. A leading ~ is replaced by the home directory, the first %s by the host name, and the second %s by the port (default is ~/.completesearch_<hostname>_<port>).
--locale <encoding> Set LC_ALL to this string, irrespective of the special "!encoding:..." word in the index.
--maps-directory <dir> Specify the directory containing the maps utf8.map and iso8859-1.map (default is the execution directory).
--index-type [INV|HYB] Type of index (default: guess from index file name).
-e <docs file> Name of file containing excerpts info (default: <db>.docs.DB).
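The substitution rules described for --pid-file can be illustrated with a short Python sketch. This is not the server's own code; the function name expand_pid_file, the home parameter, and the spelling of the default as a template with two %s placeholders are assumptions made for illustration.

```python
import os

# Assumed template form of the documented default ~/.completesearch_<hostname>_<port>.
DEFAULT_TEMPLATE = "~/.completesearch_%s_%s"


def expand_pid_file(template, host, port, home=None):
    """Expand a pid-file template as described for --pid-file (illustrative only)."""
    if home is None:
        home = os.path.expanduser("~")
    # A leading ~ is replaced by the home directory.
    if template.startswith("~"):
        template = home + template[1:]
    # The first %s becomes the host name, the second %s the port.
    head, sep, rest = template.partition("%s")
    if sep:
        template = head + host + rest.replace("%s", str(port), 1)
    return template


print(expand_pid_file(DEFAULT_TEMPLATE, "myhost", 8888, home="/home/alice"))
```

The host name, port and home directory in the call are example values only.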
Query Processing Options
--normalize-words Normalize all non-facet words. This makes it possible to find Müller even if muller is requested. It's recommended to also set the option --use-suffix-for-exact-query. Note that the intended behaviour is only achieved if the parser is run with the same option. See also CsvParser.
--word-part-separator-backend <character> The parts of special words like :facet:year:* used to be separated by a colon. In ASCII, however, the colon lies between the digits and the letters, which might lead to problems when reading word ranges from the words file. No problems are expected, but it's still recommended to use a character that precedes the digits, like '!' (now the default). The words file must be built with the same delimiter. See also CsvParser.
--query-timeout <timeout> Specify the maximum time a request may take to be processed, which prevents critical queries from bringing the server to a standstill (default is 5000 ms).
--word-part-separator-frontend <character> Specify the separator used in the API to request special queries like :facet:year:1993 (default is ':').
--use-suffix-for-exact-query If normalization is enabled, this makes it possible to find müller by requesting müller. Without this option, it's necessary to request müller:* instead of müller.
--disable-cdata-tags Recommended if the info field of each document is valid XML and any invalid XML is already escaped using CDATA. Without this option, the whole output is escaped using CDATA.
-E On error the error message is appended to the response and sent to the client.
--document-root <path on filesystem> Allows requesting files, e.g. HTML pages, located under the given path via <host>:<port>/<someHTML>. By default this feature is disabled.
--exe-command <command> If specified, the query parameter exe=<someValue> triggers the execution of the command <command><someValue>.
Cache/history sizes must be greater than 0 and are given in one of the following forms: n meaning n bytes, nK meaning n kilobytes, nM meaning n megabytes, nG meaning n gigabytes.
--max-size-history <size> Set the history size (default: 32 megabytes).
--max-queries-history <size> At most that many queries in history (default: 200; note: current impl. is quadratic).
--cache-size-excerpts <size> Sets the cache size for the excerpts generator (default: 16 megabytes).
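The size notation used by the three options above can be sketched in Python as follows. This is only an illustration, not the server's parser: the function name parse_size is hypothetical, and the use of 1024 as the multiplier per unit step is an assumption, since the text does not say whether a kilobyte means 1000 or 1024 bytes.

```python
import re

# Assumption: each unit step is a factor of 1024; the documentation does not specify this.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '512', '64K', '32M' or '1G' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError("size must look like n, nK, nM or nG: %r" % text)
    size = int(match.group(1)) * _UNITS[match.group(2)]
    # Sizes must be greater than 0, as stated above.
    if size <= 0:
        raise ValueError("size must be greater than 0: %r" % text)
    return size


print(parse_size("32M"))  # the documented default history size, under the 1024 assumption
```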
--cleanup-query-before-processing Clean up the query before processing by correcting the order of the characters ^, * and ~ and removing repeated special characters like #, . and *.
--how-to-rank-docs <rankingType> Specify how to rank documents (0 = by score, 1 = by doc id, 2 = by word id, followed by a = ascending or d = descending; default is 0d).
--how-to-rank-words <rankingType> Specify how to rank words (0 = by score, 1 = by doc count, 2 = by occ count, 3 = by word id, followed by a = ascending or d = descending, default is 0d).
--score-aggregations <aggregation> Specify score aggregation by a 4-letter string over the alphabet {S,M,B}, see explanations below.
- There are currently three types of score aggregation, S = sum, M = max, B = sum with bonus for proximity and exact word match. There are two aggregations for doc scores (same completion, different completion) and two aggregations for word scores (same doc, different doc).
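The argument syntax of the ranking and score-aggregation options can be made concrete with a small Python sketch that validates and decodes such strings. The function and table names are hypothetical, and the assignment of the four aggregation letters to positions (doc scores for same and different completion first, then word scores for same and different doc) is an assumption based on the order in which they are listed above.

```python
DOC_RANKINGS = {"0": "score", "1": "doc id", "2": "word id"}
WORD_RANKINGS = {"0": "score", "1": "doc count", "2": "occ count", "3": "word id"}
ORDERS = {"a": "ascending", "d": "descending"}
AGGREGATIONS = {"S": "sum", "M": "max", "B": "sum with bonus"}

# Assumed order of the four positions in the --score-aggregations string.
AGGREGATION_SLOTS = (
    "doc scores, same completion",
    "doc scores, different completion",
    "word scores, same doc",
    "word scores, different doc",
)


def parse_ranking(value, rankings):
    """Decode a ranking type such as '0d' into (criterion, order)."""
    if len(value) != 2 or value[0] not in rankings or value[1] not in ORDERS:
        raise ValueError("invalid ranking type: %r" % value)
    return rankings[value[0]], ORDERS[value[1]]


def parse_score_aggregations(value):
    """Decode a 4-letter string over {S, M, B} into a slot -> aggregation mapping."""
    if len(value) != 4 or any(letter not in AGGREGATIONS for letter in value):
        raise ValueError("invalid score aggregation: %r" % value)
    return dict(zip(AGGREGATION_SLOTS, (AGGREGATIONS[letter] for letter in value)))


print(parse_ranking("0d", DOC_RANKINGS))
print(parse_score_aggregations("SSMB"))
```

The string SSMB is an arbitrary example value, not a documented default.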
Logging Options
--log-file <logfile> Specify file name for the log messages (default is <base-name>.log).
--show-query-result Log information about the query result.
--verbosity <verbosity level> Set the log verbosity, especially for debugging (1 = normal, 2 = high, 3 = highest; default is 1).
--no-statistics Don't write time statistics to the log file.
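As an illustration of how several of the options documented so far combine, the following Python sketch assembles a command line for a foreground test run. It only builds and prints the argument list and does not start anything. The helper name build_command and the base name example are hypothetical; the flags themselves are the ones described above.

```python
import shlex


def build_command(base_name, port=8888, foreground=True, log_file=None, verbosity=1):
    """Assemble an argument list for startCompletionServer (illustrative only)."""
    argv = ["startCompletionServer"]
    if foreground:
        # Run in the foreground and write all output to the console.
        argv.append("--zero-fork")
    argv += ["--port", str(port)]
    if log_file is not None:
        argv += ["--log-file", log_file]
    argv += ["--verbosity", str(verbosity)]
    # The index file is given last, as <base-name>.hybrid.
    argv.append(base_name + ".hybrid")
    return argv


print(shlex.join(build_command("example", port=8889, verbosity=2)))
```

The resulting list could be passed to subprocess.run if the binary is on the PATH.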
The following options exist but are not yet explained in depth; their descriptions are copied from the source code.
--use-generalized-edit-distance-slow Use generalized edit distance to rank the word-ids (slow!).
--read-custom-scores (-0)
To enable synonym search, use enable-synonym-search.
To enable fuzzy search, use enable-fuzzy-search. This makes it possible to find e.g. algorithm even when the misspelled algoritm~ is requested (the tilde is essential).
--fuzzy-normalize-words (-W)
--use-baseline-fuzzysearch (-B)
For more details, a look at the code that processes these command line options might be helpful. You can find the code in file https://ad-svn.informatik.uni-freiburg.de/completesearch/codebase/server/StartCompletionServer.cpp.
* It's possible to provide different outputs (info fields) for one document by using --info-delimiter <info-delimiter>. This can be reasonable if you want to return different columns (e.g. <document-as-xml> and <document-as-html>) in different situations. The different outputs can be requested with the query parameter p=<pos>, where pos selects the first of the given outputs (p=0), the second output (p=1), etc.
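The effect of --info-delimiter together with p=<pos> can be pictured with the following Python sketch. It is a conceptual illustration, not the server's implementation: the function name select_output and the delimiter | are hypothetical, and how the server reacts to an out-of-range position is not documented here, so the sketch simply raises an error.

```python
def select_output(info, delimiter, pos):
    """Return the pos-th output (0-based) of an info field split at the delimiter."""
    parts = info.split(delimiter)
    if not 0 <= pos < len(parts):
        # Assumption: an out-of-range position is treated as an error.
        raise IndexError("no output at position %d" % pos)
    return parts[pos]


info_field = "<doc>as xml</doc>|<p>as html</p>"
print(select_output(info_field, "|", 0))  # corresponds to p=0
print(select_output(info_field, "|", 1))  # corresponds to p=1
```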