Differences between revisions 1 and 2

Context Decomposition of a Web Corpus

Goal: Decompose the sentences of a given web-size corpus into their semantic components. Requirements:

1. It should work on ClueWeb12 [1].

2. The decomposition should be based on CSD-IE, developed in our group [2].

3. Our current implementation is based on a rather slow parser. It should be switched to the much faster spaCy parser [3].

4. The output format should be compatible with QLever, our own SPARQL+Text search engine [4]. This part is easy.

[1] http://lemurproject.org/clueweb12/ We have purchased this dataset and it's available on our file system.
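
To make requirement 4 more concrete, here is a minimal sketch of how decomposed contexts could be written out as two tab-separated files, one line per word posting and one line per context text. The column layout (word, entity flag, context id, score), the constant score of 1, the function name `write_words_and_docs` and the hand-written sample contexts are all assumptions made for illustration; the authoritative format is the one described in the QLever README [4], and the real contexts would come from CSD-IE, not from a hard-coded list.

```python
import io


def write_words_and_docs(contexts, words_out, docs_out):
    """Write one posting line per token and one text line per context.

    `contexts` is a list of contexts; each context is a list of
    (token, is_entity) pairs. The column layout is an assumption,
    not a confirmed QLever specification.
    """
    for context_id, tokens in enumerate(contexts):
        for token, is_entity in tokens:
            # Assumed columns: word, entity flag (0 or 1), context id, score.
            words_out.write(f"{token}\t{int(is_entity)}\t{context_id}\t1\n")
        text = " ".join(token for token, _ in tokens)
        # Assumed columns: context id, full text of the context.
        docs_out.write(f"{context_id}\t{text}\n")


# Hypothetical decomposition of one sentence into two contexts.
sample_contexts = [
    [("<Ruth_Gabriel>", True), ("is", False), ("an", False), ("actress", False)],
    [("<Ruth_Gabriel>", True), ("was", False), ("born", False), ("in", False),
     ("<San_Fernando>", True)],
]

words_file = io.StringIO()
docs_file = io.StringIO()
write_words_and_docs(sample_contexts, words_file, docs_file)
```

Keeping each context as its own record means that words from different semantic components of a sentence do not co-occur in the text index, which is the point of the decomposition.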

- Revision 1 as of 2018-05-11 17:13:29
  Size: 716
  Editor: Hannah Bast
  Comment:
+ Revision 2 as of 2018-05-11 17:16:47
  Size: 923
  Editor: Hannah Bast
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-''Goal'': Decompose the sentences of a given web-size corpus (namely !ClueWeb12 [1]) into their semantic components. Requirements:
+''Goal'': Decompose the sentences of a given web-size corpus into their semantic components. Requirements:
 Line 5:
-. The decomposition should be based on CSD-IE, developed in our group [2]. <<BR>>
+. It should work on !ClueWeb12 [1]
 Line 7:
-. Our current implmentation is based on a rather slow parser. This should be switched to the much faster spaCy parser [3]. <<BR>>
+. The decomposition should be based on CSD-IE, developed in our group [2]. <<BR>>
 Line 9:
+. Our current implmentation is based on a rather slow parser. This should be switched to the much faster spaCy parser [3]. <<BR>>
-Line 10:
+Line 11:
+. The output format should be compatible with !QLever (easy), our own SPARQL+Text search engine [4].
-Line 20:
+Line 21:
+[4] https://github.com/ad-freiburg/QLever#51-input-data (README, Section 5.1 Input Data)