= Ambiverse =

== Set Up ==

'''Note:''' The database backend occupies almost 400GB of storage in a Docker volume. Make sure you run Ambiverse on a system with a sufficiently large `/var/lib/docker` partition, and remove the Docker volume once it is no longer needed.

1) Download the code from [[https://github.com/ambiverse-nlu/ambiverse-nlu|GitHub]].

2) Start the database backend. It can take several hours until the database is fully loaded.

{{{
docker run -d --name nlu-db-postgres -p 5432:5432 \
  -e POSTGRES_DB=aida_20180120_cs_de_en_es_ru_zh_v18 \
  -e POSTGRES_USER=ambiversenlu \
  -e POSTGRES_PASSWORD=ambiversenlu \
  ambiverse/nlu-db-postgres
}}}

3) Adapt the database configuration: edit `src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties` so that the property `dataSource.serverName` points to the host that runs the database.

== Run ==

Run the pipeline with

{{{
export AIDA_CONF=aida_20180120_cs_de_en_es_ru_zh_v18_db
mkdir nlu-input
echo "Jack founded Alibaba with investments from SoftBank and Goldman." > nlu-input/doc.txt
./scripts/driver/run_pipeline.sh -d nlu-input -i TEXT -l en -pip ENTITY_SALIENCE
}}}

The output is written to `nlu-input/disambiguationOutput/runs//doc.txt.json`. See below for an example output.

'''Note:''' You can put several document files into the `nlu-input` directory, and Ambiverse will disambiguate all of them. However, Ambiverse writes the results only after all documents have been disambiguated, so disambiguating many documents in one run can exhaust the available RAM.
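One way to limit memory use for a large collection is to run the pipeline on smaller batches, one invocation per batch. The following Python sketch illustrates the idea under several assumptions: the helper names `make_batches` and `run_in_batches`, the `batch-NNNN` directory layout, and the batch size are hypothetical and not part of Ambiverse. It also assumes that `AIDA_CONF` is already exported, that the script is started from the repository root, and that separate invocations release memory between batches, which has not been verified here.

```python
import shutil
import subprocess
from pathlib import Path


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_in_batches(input_dir, work_dir, batch_size, language="en"):
    """Copy the documents into per-batch directories and run the pipeline on each.

    Hypothetical helper: the directory layout is an assumption. The command
    line mirrors the invocation shown in the Run section.
    """
    files = sorted(p for p in Path(input_dir).iterdir() if p.is_file())
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    for number, batch in enumerate(make_batches(files, batch_size)):
        batch_dir = work_dir / f"batch-{number:04d}"
        batch_dir.mkdir()
        for source in batch:
            shutil.copy2(source, batch_dir / source.name)
        # Each batch is a separate process; AIDA_CONF is inherited from the environment.
        subprocess.run(
            ["./scripts/driver/run_pipeline.sh", "-d", str(batch_dir),
             "-i", "TEXT", "-l", language, "-pip", "ENTITY_SALIENCE"],
            check=True,
        )
```

A call such as `run_in_batches("nlu-input", "nlu-batches", 100)` would then process the documents 100 at a time. A suitable batch size depends on document length and available RAM.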
== Example Output ==

The example will produce the following output:

{{{
{
  "docId": "doc.txt",
  "language": "en",
  "matches": [
    {
      "charLength": 4,
      "charOffset": 0,
      "text": "Jack",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q1137062",
        "confidence": 0.8223449105622849
      },
      "type": "PER"
    },
    {
      "charLength": 7,
      "charOffset": 13,
      "text": "Alibaba",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q1359568",
        "confidence": 0.898317571182365
      },
      "type": "ORG"
    },
    {
      "charLength": 8,
      "charOffset": 43,
      "text": "SoftBank",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q201653",
        "confidence": 0.9477598497286538
      },
      "type": "ORG"
    },
    {
      "charLength": 7,
      "charOffset": 56,
      "text": "Goldman",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q193326",
        "confidence": 0.21759451076620498
      },
      "type": "PER"
    }
  ],
  "entities": [
    {
      "id": "http://www.wikidata.org/entity/Q1137062",
      "name": "Jack Ma",
      "url": "http://en.wikipedia.org/wiki/Jack%20Ma",
      "type": "PERSON",
      "salience": 0.8495625716691926
    },
    {
      "id": "http://www.wikidata.org/entity/Q1359568",
      "name": "Alibaba Group",
      "url": "http://en.wikipedia.org/wiki/Alibaba%20Group",
      "type": "ORGANIZATION",
      "salience": 0.48413245371823244
    },
    {
      "id": "http://www.wikidata.org/entity/Q201653",
      "name": "SoftBank Group",
      "url": "http://en.wikipedia.org/wiki/SoftBank%20Group",
      "type": "ORGANIZATION",
      "salience": 0.20925363664207905
    },
    {
      "id": "http://www.wikidata.org/entity/Q193326",
      "name": "Goldman Sachs",
      "url": "http://en.wikipedia.org/wiki/Goldman%20Sachs",
      "type": "ORGANIZATION",
      "salience": 0.19459704180588466
    }
  ]
}
}}}
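In the output, each entry in `matches` refers to an entry in `entities` through the shared entity `id`, and `charOffset` plus `charLength` locate the mention in the input text. A minimal Python sketch of joining the two lists could look as follows. The function name `link_mentions` and the row layout are hypothetical, and the embedded sample is an excerpt of the output above, shortened to two mentions.

```python
import json

SENTENCE = "Jack founded Alibaba with investments from SoftBank and Goldman."

# Excerpt of the example output, reduced to two mentions for brevity.
sample = json.loads("""
{
  "docId": "doc.txt",
  "language": "en",
  "matches": [
    {"charLength": 4, "charOffset": 0, "text": "Jack",
     "entity": {"id": "http://www.wikidata.org/entity/Q1137062",
                "confidence": 0.8223449105622849},
     "type": "PER"},
    {"charLength": 7, "charOffset": 13, "text": "Alibaba",
     "entity": {"id": "http://www.wikidata.org/entity/Q1359568",
                "confidence": 0.898317571182365},
     "type": "ORG"}
  ],
  "entities": [
    {"id": "http://www.wikidata.org/entity/Q1137062", "name": "Jack Ma",
     "type": "PERSON", "salience": 0.8495625716691926},
    {"id": "http://www.wikidata.org/entity/Q1359568", "name": "Alibaba Group",
     "type": "ORGANIZATION", "salience": 0.48413245371823244}
  ]
}
""")


def link_mentions(result):
    """Join every match with its entity record via the shared entity id."""
    entities = {entity["id"]: entity for entity in result.get("entities", [])}
    rows = []
    for match in result.get("matches", []):
        linked = match.get("entity", {})
        entity = entities.get(linked.get("id"), {})
        rows.append({
            "text": match["text"],
            "start": match["charOffset"],
            "end": match["charOffset"] + match["charLength"],
            "name": entity.get("name"),
            "confidence": linked.get("confidence"),
            "salience": entity.get("salience"),
        })
    return rows


for row in link_mentions(sample):
    print(row["text"], "->", row["name"], f"(confidence {row['confidence']:.2f})")
```

To process a real run, load the JSON file written by the pipeline with `json.load` instead of using the embedded sample. Matches without a linked entity, if they occur, yield `None` for `name` and `salience` in this sketch.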