Ambiverse
Set Up
Note: The database backend takes up almost 400 GB of disk space in a Docker volume. Make sure you are running Ambiverse on a system with a sufficiently large /var/lib/docker partition, and clean up the Docker volume once it is no longer needed.
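The space check and the cleanup mentioned in the note can be sketched as two small shell functions. This is a minimal sketch, not part of Ambiverse: the function names free_gb and remove_db_container are made up for illustration, the container name nlu-db-postgres is the one used in the docker run command of step 2, and the cleanup assumes the database data lives in an anonymous volume attached to that container.

```shell
# Print the free space, in whole GiB, of the filesystem that contains path $1.
# df -P forces the POSIX output format, -k reports 1024-byte blocks,
# and column 4 of the second line is the available space.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Stop the database container and remove it together with its
# anonymous volumes (docker rm -v). Assumption: the data is stored in an
# anonymous volume created for the container named nlu-db-postgres.
remove_db_container() {
  docker stop nlu-db-postgres &&
    docker rm -v nlu-db-postgres
}

# Example use (not executed here):
#   if [ "$(free_gb /var/lib/docker)" -lt 400 ]; then
#     echo "Less than 400 GiB free under /var/lib/docker" >&2
#   fi
```

If the check reports too little space, it is probably easier to move Docker's data directory to a larger partition before starting the container than afterwards.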
1) Download the code from GitHub
2) Start the database backend
docker run -d --name nlu-db-postgres -p 5432:5432 -e POSTGRES_DB=aida_20180120_cs_de_en_es_ru_zh_v18 -e POSTGRES_USER=ambiversenlu -e POSTGRES_PASSWORD=ambiversenlu ambiverse/nlu-db-postgres
3) Adapt the database configuration: in src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties, set the property dataSource.serverName to the host name of the machine that runs the database.
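Step 3 can also be scripted. The following is a sketch under assumptions: set_db_host is a hypothetical helper, the properties file is assumed to use the plain key=value form with the key at the start of the line, and the host name is assumed not to contain the characters | or &.

```shell
# Rewrite the dataSource.serverName line of a properties file.
# $1 = path to the properties file, $2 = host name or IP of the database machine.
set_db_host() {
  props_file=$1
  db_host=$2
  tmp_file=$(mktemp) || return 1
  # Replace the whole line that starts with dataSource.serverName;
  # all other lines are copied unchanged.
  sed "s|^dataSource\.serverName[[:space:]]*=.*|dataSource.serverName=${db_host}|" \
    "$props_file" > "$tmp_file" &&
    cat "$tmp_file" > "$props_file"   # write back in place, keeping file permissions
  status=$?
  rm -f "$tmp_file"
  return $status
}

# Example use (not executed here; db.example.org is a placeholder host):
#   set_db_host src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties db.example.org
```

If the database runs on the same machine as the pipeline, localhost is the likely value.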
Run
Run the pipeline with:
export AIDA_CONF=aida_20180120_cs_de_en_es_ru_zh_v18_db
mkdir nlu-input
echo "Jack founded Alibaba with investments from SoftBank and Goldman." > nlu-input/doc.txt
./scripts/driver/run_pipeline.sh -d nlu-input -i TEXT -l en -pip ENTITY_SALIENCE
The output will be in nlu-input/disambiguationOutput/runs/<run_id>/doc.txt.json. See below for an example output.
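Because the run_id part of the path is not known in advance, a small helper can pick the most recently modified run directory. This is only a sketch: latest_run_dir is a hypothetical name, and it assumes that the newest directory under runs/ belongs to the latest run.

```shell
# Print the most recently modified run directory (with a trailing slash)
# below <input dir>/disambiguationOutput/runs/.
# $1 = the input directory that was passed to run_pipeline.sh via -d.
latest_run_dir() {
  # ls -t sorts by modification time, newest first; -d lists the
  # directories themselves rather than their contents.
  ls -td "$1"/disambiguationOutput/runs/*/ 2>/dev/null | head -n 1
}

# Example use (not executed here):
#   cat "$(latest_run_dir nlu-input)doc.txt.json"
```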
Note: You can put several document files into the nlu-input directory and Ambiverse will disambiguate them all. However, Ambiverse writes the results only after all documents have been disambiguated, so disambiguating a large number of documents in a single run may exhaust the available RAM.
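One way to work around the memory limit is to split a large collection into smaller directories and run the pipeline once per directory. The sketch below is an assumption-laden illustration, not an Ambiverse feature: make_batches is a hypothetical helper, the documents are assumed to be .txt files, and a suitable batch size has to be found by experiment.

```shell
# Copy the .txt files of one directory into numbered batch directories.
# $1 = directory with the documents, $2 = directory that receives the batches,
# $3 = maximum number of documents per batch.
make_batches() {
  src_dir=$1
  out_dir=$2
  batch_size=$3
  count=0
  for doc in "$src_dir"/*.txt; do
    [ -e "$doc" ] || continue          # the glob matched nothing
    batch_dir=$(printf '%s/batch_%04d' "$out_dir" $((count / batch_size)))
    mkdir -p "$batch_dir"
    cp "$doc" "$batch_dir/"
    count=$((count + 1))
  done
}

# Example use (not executed here), with the pipeline options from the Run section:
#   make_batches all-docs batches 100
#   for b in batches/batch_*; do
#     ./scripts/driver/run_pipeline.sh -d "$b" -i TEXT -l en -pip ENTITY_SALIENCE
#   done
```

Each batch directory then gets its own disambiguationOutput directory, so the results of finished batches are on disk even if a later batch fails.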
Example Output
The example will produce the following output:
{
  "docId":"doc.txt",
  "language":"en",
  "matches":[
    {
      "charLength":4,
      "charOffset":0,
      "text":"Jack",
      "entity":{
        "id":"http://www.wikidata.org/entity/Q1137062",
        "confidence":0.8223449105622849
      },
      "type":"PER"
    },
    {
      "charLength":7,
      "charOffset":13,
      "text":"Alibaba",
      "entity":{
        "id":"http://www.wikidata.org/entity/Q1359568",
        "confidence":0.898317571182365
      },
      "type":"ORG"
    },
    {
      "charLength":8,
      "charOffset":43,
      "text":"SoftBank",
      "entity":{
        "id":"http://www.wikidata.org/entity/Q201653",
        "confidence":0.9477598497286538
      },
      "type":"ORG"
    },
    {
      "charLength":7,
      "charOffset":56,
      "text":"Goldman",
      "entity":{
        "id":"http://www.wikidata.org/entity/Q193326",
        "confidence":0.21759451076620498
      },
      "type":"PER"
    }
  ],
  "entities":[
    {
      "id":"http://www.wikidata.org/entity/Q1137062",
      "name":"Jack Ma",
      "url":"http://en.wikipedia.org/wiki/Jack%20Ma",
      "type":"PERSON",
      "salience":0.8495625716691926
    },
    {
      "id":"http://www.wikidata.org/entity/Q1359568",
      "name":"Alibaba Group",
      "url":"http://en.wikipedia.org/wiki/Alibaba%20Group",
      "type":"ORGANIZATION",
      "salience":0.48413245371823244
    },
    {
      "id":"http://www.wikidata.org/entity/Q201653",
      "name":"SoftBank Group",
      "url":"http://en.wikipedia.org/wiki/SoftBank%20Group",
      "type":"ORGANIZATION",
      "salience":0.20925363664207905
    },
    {
      "id":"http://www.wikidata.org/entity/Q193326",
      "name":"Goldman Sachs",
      "url":"http://en.wikipedia.org/wiki/Goldman%20Sachs",
      "type":"ORGANIZATION",
      "salience":0.19459704180588466
    }
  ]
}
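In the output, charOffset is a 0-based position in the input text and charLength is the length of the mention, so each entry in matches can be mapped back to the document. The sketch below illustrates this with a hypothetical helper, mention_at. It assumes plain ASCII input, since awk implementations differ in whether they count bytes or characters.

```shell
# Print the substring of a text that a match refers to.
# $1 = document text, $2 = charOffset (0-based), $3 = charLength.
mention_at() {
  # The text is passed through the environment so that awk does not
  # interpret backslashes in it; awk's substr() is 1-based, hence off + 1.
  DOC_TEXT=$1 awk -v off="$2" -v len="$3" \
    'BEGIN { print substr(ENVIRON["DOC_TEXT"], off + 1, len) }'
}

# Example use with the document and two offsets from the example output:
#   text="Jack founded Alibaba with investments from SoftBank and Goldman."
#   mention_at "$text" 13 7    # prints Alibaba
#   mention_at "$text" 43 8    # prints SoftBank
```

The id of a match's entity is the key that links it to the corresponding entry in the entities list, which carries the name, URL, type and salience.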