AD Research Wiki

Revision 2 as of 2020-07-31 16:09:45

Ambiverse

Set Up

Note: The database backend needs almost 400 GB of disk space in a Docker volume. Make sure you run Ambiverse on a system with a sufficiently large /var/lib/docker partition, and remove the Docker volume once it is no longer needed.
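A quick check of the partition before starting the container can be sketched as follows. This is a minimal sketch that assumes Python 3 is available on the host; the function names are hypothetical, and the 400 GB default is simply the figure from the note above, not a value read from Ambiverse.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes, matching the "400GB" figure in the note


def free_gb(path):
    """Return the free space, in GB, of the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GB


def has_room(path, required_gb=400):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free.

    The default of 400 is an assumption taken from the note above.
    """
    return free_gb(path) >= required_gb
```

On the Docker host, `has_room("/var/lib/docker")` should return True before the database container is started.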

1) Download the code from GitHub

2) Start the database backend

docker run -d --name nlu-db-postgres -p 5432:5432 -e POSTGRES_DB=aida_20180120_cs_de_en_es_ru_zh_v18 -e POSTGRES_USER=ambiversenlu -e POSTGRES_PASSWORD=ambiversenlu ambiverse/nlu-db-postgres

3) Adapt the database configuration: in src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties, set the property dataSource.serverName to the hostname of the machine that runs the database.
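Step 3 can also be scripted. The sketch below rewrites a single key in the text of a properties file; it assumes the file uses plain key=value lines, which has not been checked against the actual database_aida.properties. The helper name set_property and the host name db.example.org are hypothetical.

```python
import re


def set_property(text, key, value):
    """Return `text` with `key` set to `value`.

    An existing `key=...` (or `key: ...`) line is rewritten in place;
    if the key is missing, a new line is appended.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*[=:][ \t]*).*$",
        re.MULTILINE,
    )
    if pattern.search(text):
        # A function is used as the replacement so that backslashes
        # in `value` are inserted literally.
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + key + "=" + value + "\n"


# Usage (hypothetical host name):
# from pathlib import Path
# path = Path("src/main/config/aida_20180120_cs_de_en_es_ru_zh_v18_db/database_aida.properties")
# path.write_text(set_property(path.read_text(), "dataSource.serverName", "db.example.org"))
```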

Run

Run the pipeline with:

export AIDA_CONF=aida_20180120_cs_de_en_es_ru_zh_v18_db
mkdir nlu-input
echo "Jack founded Alibaba with investments from SoftBank and Goldman." > nlu-input/doc.txt
./scripts/driver/run_pipeline.sh -d nlu-input -i TEXT -l en -pip ENTITY_SALIENCE

The output will be in nlu-input/disambiguationOutput/runs/<run_id>/doc.txt.json. See below for an example output.
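Because the run id is not known in advance, the output files can be collected with a glob over the runs directory. This is a minimal sketch that assumes every run writes its JSON files directly into its own <run_id> directory, as the path above suggests; find_outputs is a hypothetical helper name.

```python
from pathlib import Path


def find_outputs(input_dir):
    """Return all JSON result files below <input_dir>/disambiguationOutput/runs/<run_id>/."""
    runs = Path(input_dir) / "disambiguationOutput" / "runs"
    # One directory level for the run id, then the per-document JSON files.
    return sorted(runs.glob("*/*.json"))
```

For the example run, `find_outputs("nlu-input")` should list a single doc.txt.json path.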

Note: You can put several document files into the nlu-input directory, and Ambiverse will disambiguate all of them. However, Ambiverse writes the results only after all documents have been disambiguated, so disambiguating a large number of documents in one run may exhaust the available RAM.
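One way to limit memory use is to process the documents in smaller batches, with one pipeline run per batch. The sketch below assumes that memory use grows with the number of documents per run, which the note suggests but which has not been measured here. The helper names and the "-batch-NNN" directory naming are hypothetical; the pipeline flags are copied from the command above.

```python
import shutil
from pathlib import Path


def split_into_batches(input_dir, batch_size):
    """Copy the files of `input_dir` into sibling directories of at most `batch_size` files.

    Returns the list of batch directories that were created.
    """
    input_dir = Path(input_dir)
    files = sorted(p for p in input_dir.iterdir() if p.is_file())
    batch_dirs = []
    for start in range(0, len(files), batch_size):
        # Hypothetical naming scheme: nlu-input-batch-000, nlu-input-batch-001, ...
        name = f"{input_dir.name}-batch-{start // batch_size:03d}"
        batch_dir = input_dir.parent / name
        batch_dir.mkdir(exist_ok=True)
        for path in files[start:start + batch_size]:
            shutil.copy2(path, batch_dir / path.name)
        batch_dirs.append(batch_dir)
    return batch_dirs


def pipeline_command(batch_dir):
    """Build the pipeline invocation for one batch, using the flags from the Run section."""
    return ["./scripts/driver/run_pipeline.sh", "-d", str(batch_dir),
            "-i", "TEXT", "-l", "en", "-pip", "ENTITY_SALIENCE"]


# Usage, from the repository root with AIDA_CONF exported as above:
# import subprocess
# for batch_dir in split_into_batches("nlu-input", 100):
#     subprocess.run(pipeline_command(batch_dir), check=True)
```

Each batch then gets its own disambiguationOutput directory, so the results of finished batches are on disk even if a later batch fails.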

Example Output

The example will produce the following output:

{
   "docId":"doc.txt",
   "language":"en",
   "matches":[
      {
         "charLength":4,
         "charOffset":0,
         "text":"Jack",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q1137062",
            "confidence":0.8223449105622849
         },
         "type":"PER"
      },
      {
         "charLength":7,
         "charOffset":13,
         "text":"Alibaba",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q1359568",
            "confidence":0.898317571182365
         },
         "type":"ORG"
      },
      {
         "charLength":8,
         "charOffset":43,
         "text":"SoftBank",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q201653",
            "confidence":0.9477598497286538
         },
         "type":"ORG"
      },
      {
         "charLength":7,
         "charOffset":56,
         "text":"Goldman",
         "entity":{
            "id":"http://www.wikidata.org/entity/Q193326",
            "confidence":0.21759451076620498
         },
         "type":"PER"
      }
   ],
   "entities":[
      {
         "id":"http://www.wikidata.org/entity/Q1137062",
         "name":"Jack Ma",
         "url":"http://en.wikipedia.org/wiki/Jack%20Ma",
         "type":"PERSON",
         "salience":0.8495625716691926
      },
      {
         "id":"http://www.wikidata.org/entity/Q1359568",
         "name":"Alibaba Group",
         "url":"http://en.wikipedia.org/wiki/Alibaba%20Group",
         "type":"ORGANIZATION",
         "salience":0.48413245371823244
      },
      {
         "id":"http://www.wikidata.org/entity/Q201653",
         "name":"SoftBank Group",
         "url":"http://en.wikipedia.org/wiki/SoftBank%20Group",
         "type":"ORGANIZATION",
         "salience":0.20925363664207905
      },
      {
         "id":"http://www.wikidata.org/entity/Q193326",
         "name":"Goldman Sachs",
         "url":"http://en.wikipedia.org/wiki/Goldman%20Sachs",
         "type":"ORGANIZATION",
         "salience":0.19459704180588466
      }
   ]
}
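In the output above, each item in "matches" refers to an item in "entities" through the entity id. The following sketch joins the two lists so that every mention is shown with the name and salience of its entity. It relies only on the field names visible in the example output; the helper name link_mentions is hypothetical, and the handling of matches without a linked entity is a defensive assumption, since the example does not show such a case.

```python
import json


def link_mentions(result):
    """Pair each match with the metadata of the entity it was linked to."""
    entities = {entity["id"]: entity for entity in result.get("entities", [])}
    rows = []
    for match in result.get("matches", []):
        linked = match.get("entity", {})        # may be empty (assumption)
        info = entities.get(linked.get("id"), {})
        rows.append({
            "text": match["text"],
            "name": info.get("name"),
            "confidence": linked.get("confidence"),
            "salience": info.get("salience"),
        })
    return rows


def load_result(path):
    """Read one output file such as doc.txt.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Usage, with <run_id> replaced by the actual run directory:
# for row in link_mentions(load_result("nlu-input/disambiguationOutput/runs/<run_id>/doc.txt.json")):
#     print(row["text"], "->", row["name"], row["confidence"], row["salience"])
```

The mention text can also be recovered from the input document as document_text[charOffset:charOffset + charLength]; in the example, offset 13 and length 7 yield "Alibaba".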