AD Research Wiki:

Log of a joint meeting of Johannes and Hannah to install Virtuoso and build an index for Freebase Easy.

With docker

On galera, pull the latest docker image, then run Virtuoso in the background and isql in the foreground as follows.

cd /local/data/virtuoso
docker pull openlink/virtuoso-opensource-7
docker run -dt -e DBA_PASSWORD=dba -p 1111:1111 -p 8890:8890 -v $(pwd):/database --name virtuoso openlink/virtuoso-opensource-7
vim virtuoso.ini [uncomment the high-memory settings for NumberOfBuffers and MaxDirtyBuffers]
docker exec -i virtuoso isql 1111
SQL> ld_dir('.', 'fbeasy.clean.ttl', 'https://fbeasy.cs.uni-freiburg.de');
SQL> rdf_loader_run();

The instructions for bulk loading follow http://vos.openlinksw.com/owiki/wiki/VOS/VirtBulkRDFLoader . The docker image is described under https://hub.docker.com/r/openlink/virtuoso-opensource-7 .

Bulk load started on 19-01-2021 at 01:00 CET. At 03:30 CET it was at 150M triples, that is, about 1M triples per minute -> estimated total loading time 5 hours.
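The estimate above is a simple linear extrapolation, which can be sketched as follows. This is only an illustration: the function name is made up, and the total of 300M triples is an assumption (it is the value that a 5-hour estimate implies at a constant rate; the log does not state the total).

```python
def estimate_total_hours(triples_loaded, hours_elapsed, total_triples):
    """Linearly extrapolate the total loading time from the progress so far.

    Assumes a constant loading rate, which real bulk loads only approximate.
    """
    if triples_loaded <= 0 or hours_elapsed <= 0:
        raise ValueError("need positive progress and elapsed time")
    rate_per_hour = triples_loaded / hours_elapsed
    return total_triples / rate_per_hour


if __name__ == "__main__":
    # 150M triples after 2.5 hours (01:00 to 03:30 CET, from the log above).
    # total_triples=300e6 is an ASSUMED value, not taken from the log.
    hours = estimate_total_hours(150e6, 2.5, total_triples=300e6)
    print(f"estimated total loading time: {hours:.1f} hours")
```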

The Virtuoso SPARQL editor is then available under http://galera:8890/sparql . A good check whether everything works is the following query, which returns the number of triples. Note that this also works while the bulk load is running and can be used to track its progress.

SELECT COUNT(*) WHERE { ?s ?p ?o }
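To track the progress without opening the editor, the same query can be sent to the endpoint over HTTP. The following is a minimal sketch using only the Python standard library. The helper names are made up for illustration, and it assumes that the endpoint accepts the query and format parameters in the URL and returns CSV with one header row followed by the count.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

COUNT_QUERY = "SELECT COUNT(*) WHERE { ?s ?p ?o }"


def build_count_url(endpoint):
    """Build the GET URL that asks the endpoint for the triple count as CSV."""
    params = urlencode({"query": COUNT_QUERY, "format": "text/csv"})
    return endpoint + "?" + params


def parse_count(csv_text):
    """Extract the count from a CSV result (header row, then one value)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return int(rows[1][0])


def fetch_triple_count(endpoint="http://galera:8890/sparql"):
    """Send the count query and return the number of triples (needs network)."""
    with urlopen(build_count_url(endpoint), timeout=30) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling fetch_triple_count() every few minutes during the load gives the data points needed for a progress estimate.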

Without docker (first trial, which eventually failed)

Installation

We followed the instructions on http://vos.openlinksw.com/owiki/wiki/VOS/VOSUbuntuNotes . The following command worked on galera (Ubuntu 18.04). During the installation I entered a password, as strongly recommended in the instructions (which say that it will not work otherwise). A message then reported that something went wrong with storing the password, but the subsequent steps worked anyway.

sudo apt install virtuoso-opensource

After the installation, the web page is immediately live under http://galera:8890 .

Index Build

On the console, just type the following (after copying the TTL file to the respective location):

isql-vt
SQL> DB.DBA.TTLP_MT (file_to_string_output ('/local/data/virtuoso/fbeasy.clean.ttl'), '', 'http://freebase-easy.cs.uni-freiburg.de');

The index build ran fine for a while, at a rate of 0.5M to 1M triples per minute, but then it stalled: galera:8890 became unresponsive, and I could not stop the server from the command line either. Maybe the reason was simply that I did not set NumberOfBuffers and MaxDirtyBuffers to higher values in virtuoso.ini. However, I then found the docker image on Docker Hub and switched to that, see above.
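For orientation, here is a rough sketch of how values for the two buffer settings could be derived from the available memory. It assumes a commonly cited rule of thumb (about two thirds of the free RAM for buffers of roughly 8 KB each, and MaxDirtyBuffers at about three quarters of NumberOfBuffers). These ratios and the function name are assumptions for illustration, not values taken from this log; the commented presets in virtuoso.ini remain the authoritative choice.

```python
def suggest_buffer_settings(free_bytes, bytes_per_buffer=8192):
    """Suggest (NumberOfBuffers, MaxDirtyBuffers) for a given amount of free RAM.

    ASSUMED rule of thumb: give about 2/3 of the free memory to buffers,
    and allow about 3/4 of those buffers to be dirty.
    """
    if free_bytes <= 0 or bytes_per_buffer <= 0:
        raise ValueError("memory and buffer size must be positive")
    number_of_buffers = (free_bytes * 2) // (3 * bytes_per_buffer)
    max_dirty_buffers = (number_of_buffers * 3) // 4
    return number_of_buffers, max_dirty_buffers


if __name__ == "__main__":
    # Hypothetical example: 8 GiB of free memory.
    buffers, dirty = suggest_buffer_settings(8 * 1024**3)
    print(f"NumberOfBuffers = {buffers}")
    print(f"MaxDirtyBuffers = {dirty}")
```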

NOTE 1: The TTL file was produced from fbeasy.ttl with the command below, which drops every line that contains a control character (other than tab) or is longer than 1000 characters. Control characters give the error message Error 37000: [Virtuoso Driver][Virtuoso Server]SP029: TURTLE RDF loader, line 5684442: Invalid characters in angle-bracketed name. URIs that are longer than 1900 bytes give the error message Error 23000: [Virtuoso Driver][Virtuoso Server]SR133: Can not set NULL to not nullable column.

grep -Pv "[\x00-\x08\x0a-\x1f]" fbeasy.ttl | awk 'length($0) <= 1000' > fbeasy.clean.ttl
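To make explicit which lines this filter keeps, here is a rough Python equivalent. It is a sketch for illustration, not the command that was actually run, and the function names are made up. One caveat: it counts characters, whereas awk may count bytes or characters depending on the implementation and locale.

```python
import re

# Same character class as in the grep command: all control characters
# from \x00 to \x1f except tab (\x09).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0a-\x1f]")


def is_clean(line, max_length=1000):
    """Return True if a line (without its trailing newline) passes the filter."""
    return CONTROL_CHARS.search(line) is None and len(line) <= max_length


def clean_lines(lines, max_length=1000):
    """Yield only the lines that the grep/awk pipeline would keep."""
    for line in lines:
        stripped = line.rstrip("\n")
        if is_clean(stripped, max_length):
            yield stripped


if __name__ == "__main__":
    sample = [
        "<a> <b> <c> .\n",      # kept
        "<a\x01> <b> <c> .\n",  # dropped: control character
        "<a>\t<b>\t<c> .\n",    # kept: tabs are allowed
        "x" * 1001 + "\n",      # dropped: longer than 1000 characters
    ]
    for kept in clean_lines(sample):
        print(kept)
```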

NOTE 2: We first tried isqlw-vt (the Unicode-enabled variant), but that always returned an obscure error about a hostname that could not be resolved.

AD Research Wiki: Projects/Virtuoso (last edited 2021-01-19 03:35:56 by Hannah Bast)