Aqqu
Description
Question answering from Freebase as described in the CIKM 2015 publication (http://ad-publications.informatik.uni-freiburg.de/CIKM_freebase_qa_BH_2015.pdf). The code below also contains some improvements (neural network, performance) that came after the publication. The public code also contains a README that describes how to download, install, train and run the system. The sections below describe how to set up the demo (for which the code is not public).
Code
Public GitHub repository: https://github.com/elmar-haussmann/aqqu .
Internal git repository (contains work after publication, mainly neural net and performance improvements): https://bitbucket.org/elmar-haussmann/aqqu .
Internal git repository for the web UI (not made public): https://bitbucket.org/elmar-haussmann/aqqu-webserver .
Demo
Aqqu instance
2016-06-30: runs under http://metropolis.informatik.uni-freiburg.de:5455
Start as follows on metropolis:
    ssh metropolis
    sudo su haussmae
    cd /home/haussmae/demos/aqqu-demo
    source venv/bin/activate # was: activate aqqu
    PYTHONPATH=$(pwd):$PYTHONPATH python webserver/translation_webserver.py
Virtuoso instance (via Elmar's home)
2016-06-30: Virtuoso instance for Aqqu runs under http://metropolis.informatik.uni-freiburg.de:9000/sparql .
Start Virtuoso as follows. The server listens on port 8999. (Alternatively, follow the from-scratch instructions below.)
    ssh metropolis
    sudo su haussmae
    cd /home/haussmae/keyword-translation
    make start-virtuoso-mini-cai
Start the HTTP proxy as follows (kill old process manually, listens on port 9000):
    ssh metropolis
    sudo su haussmae
    cd /home/haussmae/keyword-translation
    make start-varnish
Virtuoso instance (from scratch)
Start virtuoso as follows. Server listens on port 8999.
    mkdir virtuoso-freebase
    cd virtuoso-freebase
    # get dependencies
    sudo apt install bison flex gperf libssl-dev
    # get the source
    git clone https://github.com/openlink/virtuoso-opensource.git
    cd virtuoso-opensource
    git checkout stable/7
    ./autogen.sh
    ./configure --prefix=$(pwd)/virtuoso/install
Check that everything ran without errors. Also make sure that no Virtuoso instance is already running on this machine: the following steps temporarily start an instance and fail otherwise.
    make -j && make install
    cd ..
    # get the data
    wget "http://elba.informatik.uni-freiburg.de/freebase-qa/data/virtuoso.tar.gz" && tar xvfz virtuoso.tar.gz
    # run in tmux
    tmux
    ./virtuoso-opensource/virtuoso/install/bin/virtuoso-t -f +configfile virtuoso-db/virtuoso.ini
    # ctrl+b+d
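The requirement that no other Virtuoso instance is running can be checked before the build steps above. The following is a minimal sketch, not part of the Aqqu tooling: it only tests whether anything accepts TCP connections on a port. The helper name `port_in_use` is made up for illustration, and which ports matter depends on the `virtuoso.ini` in use (this page mentions 8999 for HTTP and 1112 in the ISQL example; that both apply to the same instance is an assumption).

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Example use (ports taken from this page, see the assumption above):
#   for port in (8999, 1112):
#       print(port, "in use" if port_in_use("localhost", port) else "free")
```

A free port does not prove that no Virtuoso process exists (it could be configured for other ports), so a look at the process list is still a sensible second check.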
To get better performance through more aggressive caching we add varnish as a caching reverse proxy in front of virtuoso.
Install it from the distribution's package repository to get security updates without hassle:
    sudo apt install varnish
Ubuntu automatically starts the newly installed service with a default config (it really shouldn't), so disable and stop it again:
    sudo systemctl disable varnish.service # so it doesn't start again
    sudo systemctl stop varnish.service # so the current instance is stopped
Now we need a config
    mkdir varnish
    cat << 'EOF' > varnish/varnish.vcl
    #
    # This puts varnish in front of virtuoso.
    #
    vcl 4.0;

    # Default backend definition. Set this to point to your content server.
    backend default {
        .host = "localhost";
        .port = "8999";
        .connect_timeout = 90s;
        .between_bytes_timeout = 30s;
    }

    sub vcl_recv {
        # Happens before we check if we have this in cache already.
        #
        # Typically you clean up the request here, removing cookies you don't need,
        # rewriting the request, etc.

        # Just put everything in cache.
        return(hash);
    }

    sub vcl_backend_response {
        # Happens after we have read the response headers from the backend.
        #
        # Here you clean the response headers, removing silly Set-Cookie headers
        # and other mistakes your backend does.

        # We set TTL to a looong time.
        set beresp.ttl = 99999h;
        if(beresp.status == 404) {
            # Cache 404 responses for 15 seconds
            set beresp.ttl = 15s;
            set beresp.grace = 15s;
        }
    }

    sub vcl_deliver {
        # Happens when we have all the pieces we need, and are about to send the
        # response to the client.
        #
        # You can do accounting or modifying the final object here.
        return(deliver);
    }
    EOF
Finally, start it (in tmux). Don't forget to replace vulcano with your hostname:
    tmux a
    # ctrl+b+c
    varnishd -a vulcano:9000 -f varnish/varnish.vcl -F -n /tmp -s malloc,10G -p http_resp_size=10000000 -p http_req_size=1000000 -p http_resp_hdr_len=1000000 -p http_req_hdr_len=1000000
    # ctrl+b+d
Now Varnish will proxy virtuoso on port 9000.
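To check that the proxy answers, a query can be sent to port 9000 as a plain GET request. The sketch below is an illustration under assumptions, not part of Aqqu: the host name `vulcano` is the placeholder used above, the helper names are made up, and the `format` request parameter is the one Virtuoso's SPARQL endpoint accepts for choosing the result format. Because the VCL above caches every request, repeating the exact same URL should be answered from the cache.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Varnish was started on this host and port as shown above.
ENDPOINT = "http://vulcano:9000/sparql"


def build_sparql_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL (identical URLs can be cached)."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return f"{endpoint}?{params}"


def run_query(endpoint: str, query: str, timeout: float = 90.0) -> dict:
    """Send the query and decode the JSON result document."""
    url = build_sparql_url(endpoint, query)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Example use (needs a running instance):
#   result = run_query(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
#   print(result["results"]["bindings"])
```

Passing the result format in the URL rather than in an `Accept` header keeps everything that distinguishes two requests inside the URL itself.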
Parser
Aqqu uses a parser to get part-of-speech tags of query words. The parser is accessed via HTTP API calls. To start the parser server:
    ssh metropolis
    sudo su haussmae
    cd /home/haussmae/keyword-translation
    make start-parser
The port is configured in corenlp-frontent/build.xml. The API can be accessed like this: http://metropolis.informatik.uni-freiburg.de:4000/parse/?text=This%20is%20a%20test%20sentence.
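The request URL for an arbitrary sentence can be built as in the following minimal sketch. Host, port and the `text` parameter are taken from the example URL above; the helper name `parser_url` is made up, and the format of the server's response is not documented on this page, so the sketch stops at building the URL.

```python
import urllib.parse

# Host and port as in the example URL above; adjust if build.xml differs.
PARSER_BASE = "http://metropolis.informatik.uni-freiburg.de:4000/parse/"


def parser_url(text: str, base: str = PARSER_BASE) -> str:
    """Build the GET URL for one sentence, encoding spaces as %20."""
    return f"{base}?text={urllib.parse.quote(text, safe='')}"


print(parser_url("This is a test sentence."))
# prints http://metropolis.informatik.uni-freiburg.de:4000/parse/?text=This%20is%20a%20test%20sentence.
```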
Run the new Aqqu version (with NN)
Start on titan (requires GPU):
    ssh titan
    sudo su haussmae
    cd /home/haussmae/aqqu-bitbucket
    source activate aqqu
    PYTHONPATH=$(pwd):$PYTHONPATH python webserver/translation_webserver.py
Runs on port 5454 on titan now. However, titan is not available from outside the uni network.
To start port forwarding from metropolis:
    ssh metropolis
    sudo su haussmae
    cd /home/haussmae/temp/nc/python-port-forwardt
    python2 port-forward.py
The service is now available on metropolis:5454.
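The source of `port-forward.py` is not shown on this page. To illustrate what such a forwarder does, here is a minimal Python 3 sketch of a TCP relay; it is not the script used above (which runs under python2), and all function names are made up. It accepts connections on one address and copies bytes in both directions to a target address.

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reports end-of-stream."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # one side went away; fall through to cleanup
    finally:
        try:
            # Tell the receiving side that no more data will follow.
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client: socket.socket, target: tuple) -> None:
    """Connect to the target and relay traffic in both directions."""
    try:
        upstream = socket.create_connection(target)
    except OSError:
        client.close()
        return
    back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
    back.start()
    pipe(client, upstream)
    back.join()
    client.close()
    upstream.close()


def serve(listen_addr: tuple, target: tuple) -> socket.socket:
    """Accept connections in a background thread; return the listening socket."""
    server = socket.create_server(listen_addr)

    def accept_loop() -> None:
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            threading.Thread(
                target=handle, args=(client, target), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

For the setup described here one would call something like `serve(("0.0.0.0", 5454), ("titan", 5454))` on metropolis and then keep the main thread alive, for example with `threading.Event().wait()`.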
How to update (any) Virtuoso with custom data
Grant access rights via the ISQL tool as follows:
    data/virtuoso/install/bin/isql localhost:1112 dba dba
    grant execute on SPARQL_INSERT_DICT_CONTENT to SPARQL_UPDATE;
    grant execute on SPARQL_INSERT_DICT_CONTENT to "SPARQL";
    grant execute on SPARQL_DELETE_DICT_CONTENT to SPARQL_UPDATE;
    grant execute on SPARQL_DELETE_DICT_CONTENT to "SPARQL";
Complex example SPARQL query from "Programmieren in C++", SS 2016, Ü10 (all action or animation movies with their release date, genre, director, production company, and rating):
    PREFIX fb: <http://rdf.freebase.com/ns/>
    SELECT DISTINCT ?fn, ?y, ?gn, ?dn, ?pn, ?rn where {
      ?f fb:type.object.type fb:film.film .
      ?f fb:film.film.initial_release_date ?y .
      ?f fb:film.film.genre ?g .
      ?f fb:film.film.directed_by ?d .
      ?f fb:film.film.production_companies ?p .
      ?f fb:film.film.rating ?r .
      ?f fb:type.object.name ?fn .
      ?g fb:type.object.name ?gn .
      ?d fb:type.object.name ?dn .
      ?p fb:type.object.name ?pn .
      ?r fb:type.object.name ?rn
      FILTER(lang(?fn)='en')
      FILTER(lang(?gn)='en')
      FILTER(lang(?dn)='en')
      FILTER(lang(?pn)='en')
      FILTER(lang(?rn)='en')
      FILTER(?gn='Action Film'@en OR ?gn='Animation'@en)
    }
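A query like this returns one binding per result row. The sketch below shows one way to flatten such a result into plain Python dicts, assuming the result was requested in the W3C SPARQL 1.1 Query Results JSON Format. The helper name `bindings_to_rows` and the sample document are made up for illustration; the sample is not real Freebase data.

```python
import json


def bindings_to_rows(result: dict) -> list:
    """Turn a SPARQL JSON result document into a list of plain dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None.
        rows.append({
            var: (binding[var]["value"] if var in binding else None)
            for var in variables
        })
    return rows


# Hypothetical sample response, invented for this example only.
SAMPLE = json.loads("""
{
  "head": {"vars": ["fn", "y"]},
  "results": {"bindings": [
    {"fn": {"type": "literal", "xml:lang": "en", "value": "Example Film"},
     "y": {"type": "literal", "value": "2000-01-01"}}
  ]}
}
""")

print(bindings_to_rows(SAMPLE))
# prints [{'fn': 'Example Film', 'y': '2000-01-01'}]
```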
Data
All of the required data to run Aqqu is part of the materials (see above). It is located in the data subfolder. The scripts to create this data are part of the (old) keyword-translation repository: https://bitbucket.org/onekonek/keyword-translation