Revision 1 as of 2018-05-11 17:13:29
Context Decomposition of a Web Corpus
Goal: Decompose the sentences of a given web-size corpus into their semantic components. Requirements:
1. It should work on ClueWeb12 [1].
2. The decomposition should be based on CSD-IE, developed in our group [2].
3. Our current implementation is based on a rather slow parser. This should be switched to the much faster spaCy parser [3].
4. The output format should be compatible with QLever, our own SPARQL+Text search engine [4] (this part is easy).
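To give an idea of what requirement 3 involves, here is a minimal sketch of how a spaCy dependency parse might be turned into sentence components. It is not CSD-IE itself, only a crude stand-in: the `Token` record, the `SPLIT_DEPS` label set, and the functions `tokens_from_spacy` and `decompose` are hypothetical names introduced for illustration, and the model name `en_core_web_sm` is an assumption. The sketch also assumes that each input is a single sentence, so that token indices match list positions.

```python
from collections import namedtuple

# Minimal token record: index, surface text, index of the syntactic head,
# and dependency label. For the root, head equals the token's own index.
Token = namedtuple("Token", ["i", "text", "head", "dep"])

# Assumption: subtrees attached by these labels become separate components.
# The real CSD-IE rules are more involved; see [2].
SPLIT_DEPS = {"relcl", "appos", "advcl"}


def tokens_from_spacy(text, model="en_core_web_sm"):
    """Parse a single sentence with spaCy and convert it to Token records."""
    import spacy  # imported lazily so the rest of the sketch runs without spaCy

    nlp = spacy.load(model)
    return [Token(t.i, t.text, t.head.i, t.dep_) for t in nlp(text)]


def decompose(tokens):
    """Split one parsed sentence into lists of words, one list per component."""
    owner = {}
    for tok in tokens:
        # Walk up the tree to the nearest ancestor (or the token itself)
        # that starts a split; None marks the main component.
        node = tok
        while True:
            if node.dep in SPLIT_DEPS:
                owner[tok.i] = node.i
                break
            if node.head == node.i:  # reached the root
                owner[tok.i] = None
                break
            node = tokens[node.head]
    groups = {}
    for tok in tokens:
        groups.setdefault(owner[tok.i], []).append(tok.text)
    return list(groups.values())


if __name__ == "__main__":
    # Hand-built parse of "Ruth Gabriel daughter of Ana is an actress"
    # (punctuation omitted), so the demo runs without spaCy installed.
    sentence = [
        Token(0, "Ruth", 1, "compound"),
        Token(1, "Gabriel", 5, "nsubj"),
        Token(2, "daughter", 1, "appos"),
        Token(3, "of", 2, "prep"),
        Token(4, "Ana", 3, "pobj"),
        Token(5, "is", 5, "ROOT"),
        Token(6, "an", 7, "det"),
        Token(7, "actress", 5, "attr"),
    ]
    for component in decompose(sentence):
        print(" ".join(component))
```

With spaCy and a model installed, `decompose(tokens_from_spacy("..."))` would take the place of the hand-built parse. Note that the split-off component here lacks its subject ("Ruth Gabriel"); a real decomposition would have to reattach it, and would then write the components in the input format described in [4].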
[1] http://lemurproject.org/clueweb12/ We have purchased this dataset and it's available on our file system.
[2] CSD-IE Paper: http://filicudi.informatik.uni-freiburg.de:6543/publications , CSD-IE Demo: http://filicudi.informatik.uni-freiburg.de:6543 (university-internal ID)
[4] https://github.com/ad-freiburg/QLever#51-input-data (README, Section 5.1 Input Data)