AD Teaching Wiki


Revision 1 as of 2018-05-11 17:13:29

Context Decomposition of a Web Corpus

Goal: Decompose the sentences of a given web-size corpus (namely ClueWeb12 [1]) into their semantic components. Requirements:

1. The decomposition should be based on CSD-IE, developed in our group [2].

2. Our current implementation is based on a rather slow parser. It should be switched to the much faster spaCy parser [3].
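To give a rough idea of what decomposing a sentence on top of a spaCy dependency parse could look like, here is a minimal sketch. It is not the CSD-IE algorithm from [2]: the function names, the choice of split labels (`relcl`, `advcl`) and the model name `en_core_web_sm` are assumptions made only for illustration. The sketch cuts the dependency tree at the chosen edge labels and collects the tokens of each resulting subtree as one component.

```python
from collections import defaultdict

# Dependency labels at which a clause is split off into its own component.
# This choice is an illustrative assumption, not the CSD-IE rule set.
SPLIT_LABELS = {"relcl", "advcl"}


def split_components(tokens, heads, deps, split_labels=SPLIT_LABELS):
    """Group tokens into components by cutting the dependency tree
    at every edge whose label is in split_labels.

    heads[i] is the index of the head of token i; a root token points
    to itself (the convention spaCy uses for token.head).
    """
    def component_root(i):
        # Walk up until we reach a sentence root or the top of a cut edge.
        while heads[i] != i and deps[i] not in split_labels:
            i = heads[i]
        return i

    groups = defaultdict(list)
    for i in range(len(tokens)):
        groups[component_root(i)].append(tokens[i])
    # Order the components by the position of their root token.
    return [groups[root] for root in sorted(groups)]


def parse_with_spacy(text, model="en_core_web_sm"):
    """Parse text with spaCy and return (tokens, heads, deps).

    Assumes spaCy and the given model are installed; the model name
    is an assumption and can be replaced by any parser-enabled model.
    """
    import spacy  # imported lazily so split_components works without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    return (
        [t.text for t in doc],
        [t.head.i for t in doc],
        [t.dep_ for t in doc],
    )
```

With spaCy installed, `split_components(*parse_with_spacy("The man who laughed left."))` would run the whole pipeline on one sentence; the exact components depend on the parse the model produces. For ClueWeb12-sized input, loading the model once and streaming documents through `nlp.pipe` would be the natural next step.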

[1] http://lemurproject.org/clueweb12/ We have purchased this dataset and it's available on our file system.

[2] CSD-IE Paper: http://filicudi.informatik.uni-freiburg.de:6543/publications , CSD-IE Demo: http://filicudi.informatik.uni-freiburg.de:6543 (university-internal ID)

[3] https://spacy.io/
