Ambiverse
Set Up
Note: The database backend takes almost 400 GB of disk space in a Docker volume. Make sure you run Ambiverse on a system with a sufficiently large /var/lib/docker partition, and clean up the Docker volume when it is no longer needed.
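Before starting the backend, it can help to check how much space is free on the Docker partition. The following is a minimal sketch, assuming a POSIX `df` and Docker's default data directory /var/lib/docker. The helper name `avail_gb` and the threshold of 400 (rounded up from the figure in the note) are illustrative and not part of Ambiverse.

```shell
#!/bin/sh
# Print the free space (in whole GiB) of the filesystem that holds the given path.
# avail_gb is a hypothetical helper, not part of Ambiverse.
avail_gb() {
    # -P: POSIX single-line output, -k: 1024-byte blocks; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Assumption: Docker uses its default data directory.
docker_dir=/var/lib/docker
required_gb=400

if [ -d "$docker_dir" ]; then
    free_gb=$(avail_gb "$docker_dir")
    if [ "$free_gb" -lt "$required_gb" ]; then
        echo "Only ${free_gb} GiB free on ${docker_dir}; about ${required_gb} GB are needed." >&2
    fi
fi
```

When the database is no longer needed, `docker rm -f -v nlu-db-postgres` removes the container together with its anonymous volumes. This assumes the volume was created implicitly by the image rather than as a named volume.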
1) Download the code from GitHub
2) Start the database backend (it can take several hours until the database is fully loaded):
docker run -d --name nlu-db-postgres -p 5432:5432 -e POSTGRES_DB=aida_20180120_cs_de_en_es_ru_zh_v18 -e POSTGRES_USER=ambiversenlu -e POSTGRES_PASSWORD=ambiversenlu ambiverse/nlu-db-postgres
3) Adapt the database configuration: in src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties, set the property dataSource.serverName to the host name of the machine that runs the database.
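Step 3 can also be scripted. The following is a minimal sketch, assuming the file uses plain `key=value` properties syntax and already contains a `dataSource.serverName` line. `set_server_name` is a hypothetical helper and `db.example.org` is a placeholder host name.

```shell
#!/bin/sh
# Rewrite the dataSource.serverName property in a properties file.
# set_server_name is a hypothetical helper, not part of Ambiverse.
# Usage: set_server_name <properties-file> <database-host>
set_server_name() {
    file=$1
    host=$2
    # Write to a temporary file first, because in-place editing with sed -i is not portable.
    sed "s|^dataSource\.serverName[[:space:]]*=.*|dataSource.serverName=${host}|" "$file" > "${file}.tmp" &&
        mv "${file}.tmp" "$file"
}

# Example call (placeholder host name):
# set_server_name src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties db.example.org
```

All other lines of the file are left unchanged. If the property line is missing, the file is rewritten without modification, so it is worth checking the result by hand.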
Run
Run the pipeline with:
export AIDA_CONF=aida_20180120_cs_de_en_es_ru_zh_v18_db
mkdir nlu-input
echo "Jack founded Alibaba with investments from SoftBank and Goldman." > nlu-input/doc.txt
./scripts/driver/run_pipeline.sh -d nlu-input -i TEXT -l en -pip ENTITY_SALIENCE
The output will be in nlu-input/disambiguationOutput/runs/<run_id>/doc.txt.json. See below for an example output.
Note: You can put several document files into the nlu-input directory and Ambiverse will disambiguate them all. However, Ambiverse writes the results only after all documents have been disambiguated, so disambiguating many documents at once can exhaust the available RAM.
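One way to limit memory use is to split a large collection into smaller batches and invoke the pipeline once per batch. The following is a minimal sketch, assuming the input files sit directly in one directory. `make_batches`, the directory name nlu-batches, and the batch size of 100 are illustrative choices, not part of Ambiverse.

```shell
#!/bin/sh
# Distribute the files of a directory into numbered batch directories.
# make_batches is a hypothetical helper, not part of Ambiverse.
# Usage: make_batches <source-dir> <target-dir> <batch-size>
make_batches() {
    src=$1
    dst=$2
    size=$3
    count=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue            # skip subdirectories and an unmatched glob
        batch="$dst/batch-$((count / size))"
        mkdir -p "$batch"
        cp "$f" "$batch/"
        count=$((count + 1))
    done
}

# Example: run the pipeline once per batch of 100 documents.
# make_batches nlu-input nlu-batches 100
# for dir in nlu-batches/batch-*; do
#     ./scripts/driver/run_pipeline.sh -d "$dir" -i TEXT -l en -pip ENTITY_SALIENCE
# done
```

With this layout each batch directory would presumably receive its own disambiguationOutput folder, following the output path pattern described above.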
Example Output
The example will produce the following output:
{
  "docId": "doc.txt",
  "language": "en",
  "matches": [
    {
      "charLength": 4,
      "charOffset": 0,
      "text": "Jack",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q1137062",
        "confidence": 0.8223449105622849
      },
      "type": "PER"
    },
    {
      "charLength": 7,
      "charOffset": 13,
      "text": "Alibaba",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q1359568",
        "confidence": 0.898317571182365
      },
      "type": "ORG"
    },
    {
      "charLength": 8,
      "charOffset": 43,
      "text": "SoftBank",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q201653",
        "confidence": 0.9477598497286538
      },
      "type": "ORG"
    },
    {
      "charLength": 7,
      "charOffset": 56,
      "text": "Goldman",
      "entity": {
        "id": "http://www.wikidata.org/entity/Q193326",
        "confidence": 0.21759451076620498
      },
      "type": "PER"
    }
  ],
  "entities": [
    {
      "id": "http://www.wikidata.org/entity/Q1137062",
      "name": "Jack Ma",
      "url": "http://en.wikipedia.org/wiki/Jack%20Ma",
      "type": "PERSON",
      "salience": 0.8495625716691926
    },
    {
      "id": "http://www.wikidata.org/entity/Q1359568",
      "name": "Alibaba Group",
      "url": "http://en.wikipedia.org/wiki/Alibaba%20Group",
      "type": "ORGANIZATION",
      "salience": 0.48413245371823244
    },
    {
      "id": "http://www.wikidata.org/entity/Q201653",
      "name": "SoftBank Group",
      "url": "http://en.wikipedia.org/wiki/SoftBank%20Group",
      "type": "ORGANIZATION",
      "salience": 0.20925363664207905
    },
    {
      "id": "http://www.wikidata.org/entity/Q193326",
      "name": "Goldman Sachs",
      "url": "http://en.wikipedia.org/wiki/Goldman%20Sachs",
      "type": "ORGANIZATION",
      "salience": 0.19459704180588466
    }
  ]
}
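To work with the result in a script, the JSON can be filtered with a tool such as jq. The following is a minimal sketch, assuming jq is installed (it is not an Ambiverse dependency). The field names are taken from the example output above, and `list_entities` is a hypothetical helper.

```shell
#!/bin/sh
# Print one tab-separated line per entity: salience, type, name, Wikidata ID.
# Entities are listed from most to least salient.
# list_entities is a hypothetical helper; it requires jq to be installed.
# Usage: list_entities <output.json>
list_entities() {
    jq -r '.entities | sort_by(-.salience)[] | [.salience, .type, .name, .id] | @tsv' "$1"
}

# Example call; <run_id> stays a placeholder for the actual run directory:
# list_entities "nlu-input/disambiguationOutput/runs/<run_id>/doc.txt.json"
```

For the example document this lists Jack Ma first, since it has the highest salience of the four entities.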