AD Research Wiki:

Ambiverse

Set Up

Note: Starting the database backend takes almost 400 GB of disk space in a Docker volume. Make sure you run Ambiverse on a system with a sufficiently large /var/lib/docker partition, and clean up the Docker volume once it is no longer needed.
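
Before starting the backend, it can help to check that the partition really has enough room. The following is only a minimal sketch and not part of Ambiverse: the helper names has_free_gb and remove_nlu_db are made up for illustration, the 400 GB threshold comes from the note above, and the cleanup assumes that the container is named nlu-db-postgres (as in the docker run command in step 2) and keeps its data in an anonymous volume attached to that container.

```shell
# Hypothetical helper (not part of Ambiverse): succeeds if the file system
# that holds directory $1 has at least $2 GB of free space.
has_free_gb() {
    dir=$1
    min_gb=$2
    # df -Pk prints one line per file system; column 4 is the free space in KiB.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Hypothetical helper: removes the database container together with the
# anonymous volumes attached to it (assumes the data lives in such a volume).
remove_nlu_db() {
    docker rm --force --volumes nlu-db-postgres
}

# Example:
#   has_free_gb /var/lib/docker 400 || echo "Not enough space for the database" >&2
#   remove_nlu_db    # once the database is no longer needed
```

If the data turns out to live in a named volume instead, docker volume ls shows it and docker volume rm removes it.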

1) Download the code from GitHub

2) Start the database backend (loading the database fully can take several hours):

docker run -d --name nlu-db-postgres -p 5432:5432 -e POSTGRES_DB=aida_20180120_cs_de_en_es_ru_zh_v18 -e POSTGRES_USER=ambiversenlu -e POSTGRES_PASSWORD=ambiversenlu ambiverse/nlu-db-postgres

3) Adapt the database configuration: in src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties, set the property dataSource.serverName to the host name of the machine that runs the database.
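
The configuration change in step 3 can also be scripted. This is a sketch under assumptions, not something Ambiverse ships: set_db_host is a made-up helper name, db.example.org is a placeholder host, and the file is assumed to use the usual "key = value" properties syntax with dataSource.serverName on a line of its own.

```shell
# Hypothetical helper (not part of Ambiverse): sets dataSource.serverName in
# the properties file $1 to the host name $2 and keeps all other lines as is.
set_db_host() {
    props=$1
    host=$2
    tmp=$(mktemp) || return 1
    # Keep the key and the "=" separator, replace only the value after it.
    sed "s|^\(dataSource\.serverName[[:space:]]*=[[:space:]]*\).*|\1$host|" "$props" > "$tmp" &&
        cat "$tmp" > "$props"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (db.example.org is a placeholder host name):
#   set_db_host src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties db.example.org
```

Writing through a temporary file and copying the result back keeps the original file's permissions and avoids relying on the non-portable sed -i option.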

Run

Run the pipeline with:

export AIDA_CONF=aida_20180120_cs_de_en_es_ru_zh_v18_db
mkdir nlu-input
echo "Jack founded Alibaba with investments from SoftBank and Goldman." > nlu-input/doc.txt
./scripts/driver/run_pipeline.sh -d nlu-input -i TEXT -l en -pip ENTITY_SALIENCE

The output will be in nlu-input/disambiguationOutput/runs/<run_id>/doc.txt.json. See below for an example output.
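
Since the run_id part of the path changes with every run, a small helper can pick the newest run directory. This is a sketch only: latest_run_dir is a made-up name, and it assumes that each run gets its own sub-directory under runs/ and that the most recently modified one belongs to the latest run.

```shell
# Hypothetical helper: prints the most recently modified sub-directory of $1.
latest_run_dir() {
    # ls -t sorts by modification time, newest first; -d lists the directories
    # themselves instead of their contents. sed strips the trailing slash.
    ls -td "$1"/*/ 2>/dev/null | head -n 1 | sed 's|/$||'
}

# Example:
#   run_dir=$(latest_run_dir nlu-input/disambiguationOutput/runs)
#   cat "$run_dir/doc.txt.json"
```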

Note: You can put several document files into the nlu-input directory, and Ambiverse will disambiguate them all. However, Ambiverse writes the results only after all documents have been disambiguated, so processing many documents at once can exhaust the available RAM.
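
One possible way around the memory limit is to split a large collection into smaller input directories and run the pipeline once per directory. The sketch below is an illustration, not an Ambiverse feature: split_into_batches, the directory names all-docs and nlu-batches, and the batch size of 100 are all made up, and the documents are assumed to be *.txt files.

```shell
# Hypothetical helper: copies the *.txt files from directory $1 into
# sub-directories $2/batch_0, $2/batch_1, ... with at most $3 files each.
split_into_batches() {
    src=$1
    dest=$2
    size=$3
    i=0
    for f in "$src"/*.txt; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        batch="$dest/batch_$((i / size))"
        mkdir -p "$batch"
        cp "$f" "$batch/"
        i=$((i + 1))
    done
}

# Example: disambiguate at most 100 documents per pipeline run.
#   split_into_batches all-docs nlu-batches 100
#   for batch in nlu-batches/batch_*; do
#       ./scripts/driver/run_pipeline.sh -d "$batch" -i TEXT -l en -pip ENTITY_SALIENCE
#   done
```

Each batch directory then gets its own disambiguationOutput folder, following the output path pattern described above.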

Example Output

The example from the Run section produces the following output:

{
   "docId":"doc.txt",
   "language":"en",
   "matches":[
      {
         "charLength":4,
         "charOffset":0,
         "text":"Jack",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q1137062",
            "confidence":0.8223449105622849
         },
         "type":"PER"
      },
      {
         "charLength":7,
         "charOffset":13,
         "text":"Alibaba",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q1359568",
            "confidence":0.898317571182365
         },
         "type":"ORG"
      },
      {
         "charLength":8,
         "charOffset":43,
         "text":"SoftBank",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q201653",
            "confidence":0.9477598497286538
         },
         "type":"ORG"
      },
      {
         "charLength":7,
         "charOffset":56,
         "text":"Goldman",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q193326",
            "confidence":0.21759451076620498
         },
         "type":"PER"
      }
   ],
   "entities":[
      {
         "id":"http://www.wikidata.org/entity/Q1137062",
         "name":"Jack Ma",
         "url":"http://en.wikipedia.org/wiki/Jack%20Ma",
         "type":"PERSON",
         "salience":0.8495625716691926
      },
      {
         "id":"http://www.wikidata.org/entity/Q1359568",
         "name":"Alibaba Group",
         "url":"http://en.wikipedia.org/wiki/Alibaba%20Group",
         "type":"ORGANIZATION",
         "salience":0.48413245371823244
      },
      {
         "id":"http://www.wikidata.org/entity/Q201653",
         "name":"SoftBank Group",
         "url":"http://en.wikipedia.org/wiki/SoftBank%20Group",
         "type":"ORGANIZATION",
         "salience":0.20925363664207905
      },
      {
         "id":"http://www.wikidata.org/entity/Q193326",
         "name":"Goldman Sachs",
         "url":"http://en.wikipedia.org/wiki/Goldman%20Sachs",
         "type":"ORGANIZATION",
         "salience":0.19459704180588466
      }
   ]
}
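
To get a quick overview of such a file, the "entities" array can be printed as a table. This sketch assumes that the jq command-line tool is installed (it is not part of Ambiverse) and that the file has the structure shown above. list_entities is a made-up helper name.

```shell
# Hypothetical helper: prints name, type and salience of every entity in the
# output file $1 as tab-separated lines, most salient entity first.
# Requires jq.
list_entities() {
    jq -r '.entities | sort_by(-.salience) | .[] | [.name, .type, .salience] | @tsv' "$1"
}

# Example:
#   list_entities nlu-input/disambiguationOutput/runs/<run_id>/doc.txt.json
```

The mentions in "matches" can be linked to these entries through the shared Wikidata id in entity.id.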

AD Research Wiki: Projects/Ambiverse (last edited 2021-03-24 09:37:24 by Natalie Prange)