Everything about the Broccoli project. To restart the latest instance, only the very first section is relevant.
Note that (re-)building a (fresh) Wikipedia+FBEasy index is different from setting up an index for different data. This is because our UIMA chain contains steps that use data specific to that input and that would have to be emulated for other collections. For different collections, it's usually better to bypass the whole UIMA chain and produce a words- and docs-file by hand. Both approaches are described below.
Current Broccoli version
Built by Björn beginning of August 2016, Wikipedia version from August 2016 (2.8B postings), latest Freebase dump (freebase-rdf-latest, 372M statements extracted a la Freebase Easy).
Start the service on elba
Accessible under http://broccoli.informatik.uni-freiburg.de
ssh bast@elba
tmux attach -t broccoli || tmux new -s broccoli
Ctrl+B :setw -g mode-mouse on
cd /local/data2/broccoli/wikipedia+freebase.26-09-2016
./start.sh
Ctrl+B+D
The script contains all files as variables. Changing something (e.g. other custom scores or triple scores) should be self-explanatory.
For reference (btw: I have no idea how to write proper shell scripts, don't judge). NOTE: The snippet below is only a reference for how to change some files. The most up-to-date version is the one on elba.
#!/bin/bash
dir='/local/data2/broccoli/16-09-26/'
binary='ServerMain.2017-02-13'
index='semantic-wikipedia-full-jul16'
kbsuffix='-ontology'
kb=$index$kbsuffix
port='6002'
stopwords='semantic-wikipedia.stop-words'
triplescores='triple-scores.14oct'
mapping='semantic-wikipedia-full-jul16-ontology.url-mapping'
customscores='custom-scores.txt'
date=`date +%Y-%m-%d`
log="server-log.$date"
call=$dir$binary
args=(-p $port -o $dir$kb -m $dir$mapping -t $dir$triplescores -s $dir$stopwords -c $dir$customscores $dir$index)
echo "Starting broccoli instance and writing output into $dir$log"
echo "Starting now will be available within the minute."
$call "${args[@]}" > $dir$log &
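Before launching the server, it can help to verify that the files named in the script actually exist, since a wrong file name otherwise only shows up in the server log. The following is a minimal sketch, not part of the script on elba; the function name check_files is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report every argument that is not an existing
# regular file on stderr; return 1 if at least one is missing, else 0.
check_files() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the variable names from the script above, a call such as check_files "$dir$binary" "$dir$stopwords" "$dir$triplescores" "$dir$mapping" "$dir$customscores" || exit 1 could be placed right before the line that starts the server. Note that $dir$kb and $dir$index are name prefixes rather than single files, so they are left out here.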
Start Broccoli YAGO on elba
Similar to the "normal" Broccoli version, just start like:
cd /local/data1/broccoli/broccoli-yago
./start.sh
Building a new version is complicated. Best go back to svn revision r1438.
Deploy a user interface
You need a tomcat installed on the server where the UI should be running. On the machine from where you want to compile and deploy the UI, you need several jar files and their paths set correctly. This can be done by hand but is easiest if ant and also tomcat are installed. Also /nfs/raid5 has to be mounted there. Filicudi always works well for me.
Then go to
<broccoli-code-dir>/userinterface
Edit two files: build.properties and war/server_properties.xml
In build.properties, adjust the properties "warfile.name=" (name of the UI instance) and "manager.url=" (the server on which it should be deployed). In server_properties.xml, just set the correct backend host and port.
Then run
ant war deploy
or, if the UI instance previously existed,
ant war redeploy
To deploy a UI for Broccoli Yago, go back to svn revision 1438 and do the same.
Data
The intermediate files of the August 2016 version can be found in /nfs/raid5/buchholb/semantic-wikipedia-data
Code
Code is in https://ad-websvn.informatik.uni-freiburg.de/broccoli/. The Code for CSD is in the subfolder: https://ad-websvn.informatik.uni-freiburg.de/broccoli/nlp/
Preparation
Install the required dependencies. On an Ubuntu system, this should be all that is required:
sudo apt install libboost-dev libboost-regex-dev libstxxl-dev libsparsehash-dev
Compilation
make all -j
Ignore possible lint problems.
Known functionality issues
- Compression for the text-index lists is not optimal. QLever has several improvements; most importantly, frequency encoding is currently done the wrong way round: the (word, freq) pairs are sorted by freq in ascending instead of descending order before ids are assigned!
- The helper class globals/Socket.h is not optimal. In particular, receive uses a small buffer on the method's stack, which can be too small for very large queries. Also, there's no reason not to use a buffer on the heap per socket instance. Again, see QLever.
- There are more things that have been improved in QLever that could be considered bugs in Broccoli. Sadly, I do not recall more of them right now; this list is to be extended.
- The server sometimes crashes (SEGFAULT). This usually happens after a list is read from disk, and thus when decompressing a list, but cannot be reproduced by relaunching the same query or series of queries from before. Needs further investigation and can probably be fixed when the text compression is optimized.
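To illustrate the intended direction of the frequency encoding from the first item: the most frequent word should receive the smallest id, so that frequent words get the ids that compress best. The following is a minimal sketch of that ordering only, not Broccoli's or QLever's implementation; the function name assign_frequency_ids and the one-word-per-line input are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helper: read one word per line from stdin and print
# word<TAB>freq<TAB>id, with ids assigned in DESCENDING order of frequency
# (id 0 = most frequent word). Ties are broken by byte order of the word.
assign_frequency_ids() (
  export LC_ALL=C
  sort | uniq -c | sort -k1,1nr -k2,2 |
    awk '{ printf "%s\t%d\t%d\n", $2, $1, NR - 1 }'
)
```

For example, printf 'b\na\nb\n' | assign_frequency_ids lists b (frequency 2) with id 0 and a (frequency 1) with id 1. Sorting ascending instead, as described in the item above, would hand the smallest ids to the rarest words.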
Known installation issues
STXXL: Currently broccoli depends on STXXL version <= 1.3.1. You can get version 1.3.1 via
git clone --branch 1.3.1 https://github.com/stxxl/stxxl.git
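To make STXXL compile on UNIX systems, edit /utils/mlock.cpp in the STXXL root and add
#include <ctime>
#include <cerrno>
and replace sleep(864000); in /utils/mlock.cpp:34 by
nanosleep((const struct timespec[]){{0, 864000L}}, NULL);
Note that the fields of struct timespec are {seconds, nanoseconds}, so this call sleeps for 864000 nanoseconds rather than the original 864000 seconds; use {{864000, 0}} if the original duration matters.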
Now run
make config_gnu
and, if necessary, adjust the generated make.settings.local file. Now you can compile STXXL with
make library_g++
Adjust the path to the stxxl.mk file in broccoli/paths.mak
STXXL_CONFIG = <path to your local stxxl root>/stxxl.mk
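The two manual edits to mlock.cpp described above can also be scripted. The following is a minimal sketch under stated assumptions: the helper name patch_mlock is hypothetical, the replacement statement is copied verbatim from the instructions above, and the helper is not idempotent (run it once per fresh checkout).

```shell
#!/bin/bash
# Hypothetical helper: prepend the two missing headers to the given file
# and replace the sleep(864000); call by the nanosleep(...) statement
# quoted in the instructions. Usage: patch_mlock <path to utils/mlock.cpp>
patch_mlock() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  {
    printf '#include <ctime>\n#include <cerrno>\n'
    sed 's/sleep(864000);/nanosleep((const struct timespec[]){{0, 864000L}}, NULL);/' "$file"
  } > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through cat so that the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

It would be called as patch_mlock utils/mlock.cpp from the STXXL root before running make config_gnu.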
Docker setup
Axel has started working on a Docker setup. There are three parts: (1) use Broccoli to build an index from given words, docs and knowledge base files; (2) start a Broccoli backend using such an index and deploy a Broccoli UI via Tomcat; (3) run the extensive pipeline to produce the various files needed to build an index. Axel has managed to enable a Docker setup for (1) and (2). This required the identification and change of many hard-coded paths in the files from the Broccoli SVN.
Files, building the images, running the containers
The current version of Axel's files is available under vulcano:/local/raid/ad/broccoli-docker. There you find the following files and sub-directories:
docker-compose.yml # For building and running everything easily
build # The various Dockerfiles and the Broccoli code
data # The input files for the index builder and the built index files
Use the following commands for building and running:
docker-compose build # Builds broccoli:index-builder, broccoli:backend, broccoli:frontend
docker-compose up # Creates the containers for all three images and starts them in concert
docker-compose up index-builder # Same, for only the index builder.
docker-compose up backend # Same, for only the backend.
docker-compose up frontend # Same, for only the frontend.
docker-compose up -d # Same, but running in the background (as a "daemon")
docker-compose down # The opposite of docker-compose up
docker-compose [start|stop] # Start or stop the containers
The containers can also be run explicitly with docker run, for example:
docker run -v /local/raid/ad/broccoli-docker/data:/data -e DBTAIL=example broccoli:index-builder
Build a Wikipedia+FBEasy Index from scratch
First, obtain a Wikipedia XML and a Freebase dump (usually stored in /nfs/raid5/broccoli/...) and make sure the correct files are referenced in broccoli/Makefile and broccoli/freebase/Makefile. Give a proper name to your index using the variable DBTAIL in broccoli/Makefile.
Create ontology.txt
If you use an existing RDF3X DB (as is usually the case), make sure it is referenced correctly in broccoli/freebase/Makefile and only run (inside the broccoli folder):
make get-freebase-ontology
Otherwise run:
make -C freebase/ build-db
make get-freebase-ontology
Create cas0.zip
This is the first of two parts of our UIMA chain. For an explanation see Broccoli Uima. In this first step, the Wikipedia XML is parsed (from XML to a UIMA model), text is tokenized and parse trees are constructed. Since running an off-the-shelf parser is computationally expensive, this part uses the asynchronous scale-out and is run on many of our machines.
Make sure all paths are set correctly in paths.mak. Then, on FILICUDI (if you want to start the broker elsewhere, the Makefile has to be adjusted), run in the broccoli/ folder:
make deploy-broker
Then, on the server where you want to have the reader and writer (the "main" part), run:
make deploy-reader
To get things going, run the following on as many PCs (and servers) as possible:
make deploy-senna
Create a broccoli index
Make sure to copy/move or reference the cas0.zip you built in broccoli/Makefile, then run:
make build-txt build-index
The first target calls the second part of our UIMA chain. It performs entity recognition and our NLP (CSD) and then writes a words- and a docs-file. The second target builds all necessary binary indices (and files like vocabularies) from that words- and docs-file.
Start the server
make start PORT=<PORT>
Set up a Broccoli instance for different data
With parts of our chain
Understand the process above and the UIMA framework (see official documentation) and modify accordingly.
From scratch
There is a special folder broccoli/example-data that has files that can be used as a blueprint for your own data.
Produce the following files:
1. A Knowledge Base file, name: $(DBTAIL)-ontology.txt
Tab-separated, one line per triple: subject<TAB>predicate<TAB>object<TAB>. For example broccoli/example-data/example-ontology.txt (columns are separated by tabs in the actual file):
Björn Buchhold is-a PhD Student .
Björn Buchhold is-a Person .
Björn Buchhold Country of nationality Germany .
Values use XML Schema notation, e.g.:
Some Song Length "19.0"^^<http://www.w3.org/2001/XMLSchema#float> .
IMPORTANT: The name has to end with -ontology.txt and there has to be at least one triple with a relation is-a and one triple with some other relation (for historical, technical reasons).
2. A file with text postings, named: $(DBTAIL).words-by-contexts.txt
Tab-separated, one line per posting: word<TAB>contextId<TAB>score<TAB>position. Entities use underscores for spaces and are prefixed with :e:, e.g.:
this 1 1 0
is 1 1 1
just 1 1 2
an 1 1 3
example 1 1 4
I 2 5 0
:e:Björn_Buchhold 2 5 0
hope 2 1 1
it 2 1 2
helps 2 1 3
3. A file with info to display for hits, named: $(DBTAIL).docs-by-contexts.txt
Tab-separated, one line per context: contextId<TAB>URL<TAB>Title<TAB>Text to display (positions from wordsfile are separated by @@)<TAB>Which positions belong to the context (used for grey vs black highlighting in the UI)<TAB>context-range (in terms of position for grey/black highlighting in the UI)<TAB>parse-tree<TAB>all-context-boundaries-in-sentence. The last three columns can be filled with dummy/empty information (contexts 0-maxpos, empty parse tree). The number of tabs must remain the same, though. Maybe the UI needs to be adjusted with an empty parse tree.
See the example.docs-by-contexts.txt for something that is rather easy to understand (context 1 has positions 0-5, context 2 has positions 0-3):
1 Example_Document http://example.com This@@ is@@ just@@ an@@ example@@. 0-5 NoParseTree 0-5
1 Example_Document http://example.com I@@ hope@@ it@@ helps@@. 0-3 NoParseTree 0-3
For real-world data with parse-tree information, this can get quite complex (see first sentence in the normal Broccoli data):
1 http://en.wikipedia.org/wiki/Alain_Connes Alain Connes Alain Connes@@ (@@;@@ born@@ 1@@ April@@ 1947@@)@@ is@@ a@@ French@@ mathematician@@,@@ currently@@ Professor@@ at@@ the@@ Collège de France@@,@@ IHÉS@@,@@ The Ohio State University@@ and@@ Vanderbilt University@@. 0-1,7-11,13-17 0_Alain_NNP_*_(S1,(S,(NP_(ENUM,(C,(CH 0_Connes_NNP_*_)_) 1_(_*_*_*_* 2_;_:_*_(S1,(NP_(C* 3_born_VBN_*_(NP_* 4_1_CD_*_*_* 5_April_NNP_*_*_* 6_1947_NN_*_),),)_) 7_)_*_*_*_* 8_is_VBZ_*_(VP_* 9_a_DT_*_(NP,(NP_* 10_French_JJ_*_*_* 11_mathematician_NN_*_)_* 12_,_,_*_*_* 13_currently_RB_*_(ADVP,)_* 14_Professor_NNP_*_(NP,(NP,)_* 15_at_IN_*_(PP_* 16_the_DT_*_(NP,(NP_(ENUM,(C 17_Collège_NNP_*_*_* 17_de_IN_*_*_* 17_France_NNP_*_)_) 18_,_,_*_*_* 19_IHÉS_NNP_*_(NP,)_(C,) 20_,_,_*_*_* 21_The_DT_*_(NP_(C 21_Ohio_NNP_*_*_* 21_State_NNP_*_*_* 21_University_NNP_*_)_) 22_and_CC_*_*_* 23_Vanderbilt_NNP_*_(NP_(C 23_University_NNP_*_),),),),),)_),) 24_._._*_),)_),) 0-1,7-11,13-17;0-1,7-11,13-15,19-19;0-1,7-11,13-15,21-21;0-1,7-11,13-15,23-23;0-0,3-6
4. Create the following empty files
touch $(DBTAIL)-ontology.entity-scores.noabs
touch $(DBTAIL)-ontology.name-mapping
touch $(DBTAIL)-ontology.reverse-relations
Fill them with actual data if you have it available. It works with empty files (see example-data).
Build an index
Call make build-index and set the variables DATA_DIRECTORY and DBTAIL:
make build-index DATA_DIRECTORY=/home/buchholb/broccoli/example-data DBTAIL=example
Start a server instance
Call make start and set the variables PORT, DATA_DIRECTORY and DBTAIL:
make start PORT=6001 DATA_DIRECTORY=/home/buchholb/broccoli/example-data DBTAIL=example
Deploy a user interface (to a tomcat webserver)
Go to
cd broccoli/userinterface
edit the files with your data (adjust: instance name, Tomcat location, maybe username+password, server, port)
vim build.properties
vim war/server_properties.xml
compile and deploy
ant build war deploy
Testing if everything works
If you used the files in example-data, a query for PhD-Student occurs-with helps should yield a result.
Images in the UI
Image Cache
The image cache service of the current instance (on elba) runs on filicudi. It is located under /var/www/freebase-imgsvc. It is simply a checkout of http://ad-svn.informatik.uni-freiburg.de/broccoli/freebase-imgsvc/. The actual directory where the images are cached is located at /nfs/raid5/broccoli/freebase-thumb-cache.
NEW 22-03-2017: Robin Krahl has written a new version of the script wpthumbsvc.php that asks the Mediawiki API of the English Wikipedia (https://en.wikipedia.org/w/api.php). The script greps the file /nfs/raid5/broccoli/freebase-thumb-cache.mid-to-wikipedia.unique-mids, which contains the last mid from the file freebase-thumb-cache.mid-to-wikipedia and the corresponding Wikipedia name (this is usually the canonical Wikipedia name of the entity).
To add individual images (for demos; needs access to raid so it can write to the cache folder, and img has to have a file extension for convert to work; tested on filicudi, does not work on stromboli because the code requires Python version >= 3.3):
python3 ~/broccoli/img-hack/image_to_cache.py --mid <MID> --img 'http://...'
Image Service
Florian's code has a mechanism for removing outdated images, which also removes images in the cache which now return a 404 not found (which effectively removes all images from the cache after the shutdown of the Freebase API). This should be corrected. Here is the guilty piece of code from https://ad-websvn.informatik.uni-freiburg.de/broccoli/freebase-imgsvc/fbthumbsvc.php:
// If no image could be found (404 error) create a 404 cache file for the
// current id, return a 404 error and end the script.
if ($return_status_code == 404)
{
  // If there still was an expired cache file then remove it now!
  if ($cachefile_exists)
  {
    unlink($cachefile_path);
  }
  // Create a 404 cache file for the current id.
  touch($cachefile_path . '_404');
  returnMissingError();
}
Mediator Only Index (CIKM)
An index that contains mediators (used for the CIKM presentation) is available in /nfs/raid5/haussmae/demos/broccoli_mediators_no_text. To start it (on filicudi, port 7099, should work as any user that can read the files):
/home/haussmae/demos/broccoli_mediators_no_text/ServerMain -p 7099 -o /home/haussmae/demos/broccoli_mediators_no_text/semantic-wikipedia-scientists-ontology -s /home/haussmae/demos/broccoli_mediators_no_text/semantic-wikipedia.stop-words /home/haussmae/demos/broccoli_mediators_no_text/semantic-wikipedia-scientists -m /home/haussmae/demos/broccoli_mediators_no_text/semantic-wikipedia-scientists-ontology.url-mapping
The user interface for backend filicudi:7099 is available at http://filicudi.informatik.uni-freiburg.de:6222/BroccoliCIKM (no UI hack) and http://filicudi.informatik.uni-freiburg.de:6222/BroccoliCIKM2 (UI hack). The UI hack makes specific mediator names readable in the query graph (and only there). The hack adjusts the nameLabel variable in the file src/de/uni/freiburg/broccoli/client/ui/BreadcrumbLabel.java of userinterface (in the broccoli repository).
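As a supplement to the words-file format described under "Set up a Broccoli instance for different data" (word<TAB>contextId<TAB>score<TAB>position), the following minimal sketch turns plain text into postings of that shape. It is an illustration under assumptions, not part of the Broccoli code: the function name text_to_postings is hypothetical, each non-empty input line is treated as one context, tokens are split on whitespace only, every posting gets the same score, and entity postings (prefix :e:) are not produced.

```shell
#!/bin/bash
# Hypothetical helper: read one context per line from stdin and print one
# posting per token as word<TAB>contextId<TAB>score<TAB>position.
# contextId is the input line number, positions start at 0 within a context,
# and the score is the optional first argument (default 1).
text_to_postings() {
  awk -v score="${1:-1}" 'BEGIN { OFS = "\t" }
    { for (i = 1; i <= NF; i++) print $i, NR, score, i - 1 }'
}
```

For example, printf 'this is just an example\n' | text_to_postings yields five postings for context 1 with positions 0 to 4. A matching docs file still has to be written separately, with one line per context.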