= Context Decomposition of a Web Corpus =
''Goal'': Decompose the sentences of a web-scale corpus (namely !ClueWeb12 [1]) into their semantic components. Requirements:
1. The decomposition should be based on CSD-IE, which was developed in our group [2].
2. Our current implementation relies on a rather slow parser, which should be replaced by the much faster spaCy parser [3].
[1] ClueWeb12: http://lemurproject.org/clueweb12/ (we have purchased this dataset; it is available on our file system)
[2] CSD-IE paper: http://filicudi.informatik.uni-freiburg.de:6543/publications ; CSD-IE demo: http://filicudi.informatik.uni-freiburg.de:6543 (access with university-internal ID)
[3] spaCy: https://spacy.io/