'''Goal''': Decompose the sentences of a given web-scale corpus into their semantic components.

Requirements:

 1. It should work on !ClueWeb12 [1].
 2. The decomposition should be based on CSD-IE, developed in our group [2].
 3. Our current implementation is based on a rather slow parser. This should be replaced by the much faster spaCy parser [3].
 4. The output format should be compatible with !QLever (easy), our own SPARQL+Text search engine [4].

[1] http://lemurproject.org/clueweb12/ We have purchased this dataset and it is available on our file system.

[2] CSD-IE Paper: http://filicudi.informatik.uni-freiburg.de:6543/publications , CSD-IE Demo: http://filicudi.informatik.uni-freiburg.de:6543 (university-internal ID)

[3] https://spacy.io/

[4] https://github.com/ad-freiburg/QLever#51-input-data (README, Section 5.1 Input Data)
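
As a rough illustration of requirement 4, the following minimal sketch turns already-decomposed contexts into two tab-separated line lists. It is only a sketch under assumptions: the function name `to_qlever_records` is hypothetical, and the assumed column layout (word, entity flag, context id, score for the words file; context id and text for the docs file) should be checked against Section 5.1 of the !QLever README [4] before use. The decomposition itself (CSD-IE on top of spaCy) is not shown; the input strings stand in for its output.

```python
import re
from typing import Iterable, List, Tuple


def to_qlever_records(
    contexts: Iterable[str], first_id: int = 0
) -> Tuple[List[str], List[str]]:
    """Turn decomposed contexts into (words file lines, docs file lines).

    Assumed layout, to be verified against the QLever README:
      words file: word <TAB> is_entity <TAB> context_id <TAB> score
      docs file:  context_id <TAB> text
    """
    words_lines: List[str] = []
    docs_lines: List[str] = []
    for context_id, text in enumerate(contexts, start=first_id):
        docs_lines.append(f"{context_id}\t{text}")
        # Naive tokenization; entity recognition is omitted, so the
        # entity flag is always 0 and every posting gets score 1.
        for token in re.findall(r"\w+", text.lower()):
            words_lines.append(f"{token}\t0\t{context_id}\t1")
    return words_lines, docs_lines


if __name__ == "__main__":
    # Hypothetical output of the decomposition step for one sentence.
    example_contexts = [
        "Ruth Gabriel is an actress",
        "Ruth Gabriel is the daughter of Ismael Merlo",
    ]
    words, docs = to_qlever_records(example_contexts)
    print("\n".join(words))
    print("\n".join(docs))
```

Each semantic component becomes its own context id, so that a text search in !QLever only matches words that belong together semantically rather than words that merely co-occur in the same sentence.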