AD Teaching Wiki

Revision 4 as of 2020-08-24 09:50:55
AD Teaching Wiki:
  • BachelorAndMasterProjectsAndTheses
  • ClueWebEntityRecognition

Goal: Design and implement a simple but effective named-entity recognizer for a web-size corpus, namely ClueWeb12 [1]. The following features should be supported:

1. Recognize literal mentions (trivial). For example, recognize "Angela Merkel" as https://en.wikipedia.org/wiki/Angela_Merkel .
2. Recognize partial mentions of entities that have been mentioned literally before. For example, recognize "Merkel" as https://en.wikipedia.org/wiki/Angela_Merkel once she has been mentioned by her full name.
3. Recognize pronoun mentions (he, she, it, ...) of entities that have been mentioned literally before. For example, recognize "she" as https://en.wikipedia.org/wiki/Angela_Merkel if she has been mentioned before. Take gender into account: "she" should be resolved to the most recently mentioned female entity.
4. Recognize mentions of the form "the <TYPE>" that follow a mention with the full entity name. For example, recognize "the film" as https://en.wikipedia.org/wiki/The_Matrix if that film has been mentioned by its full name before.
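
The four features above could be prototyped with a small rule-based pass over the text. The following is only a minimal sketch under several assumptions: the toy knowledge base ENTITIES, the Entity fields gender and type, the PRONOUNS and STOPWORDS tables, and the functions resolve and recognize are all hypothetical names made up for illustration. A real system would need an entity list with gender and type information from some external source, and would have to scale to ClueWeb12.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str    # full literal name
    uri: str
    gender: str  # "female", "male" or "neutral" (assumed attribute)
    type: str    # e.g. "film" (assumed attribute)


# Hypothetical toy knowledge base, for illustration only.
ENTITIES = [
    Entity("Angela Merkel", "https://en.wikipedia.org/wiki/Angela_Merkel",
           "female", "politician"),
    Entity("The Matrix", "https://en.wikipedia.org/wiki/The_Matrix",
           "neutral", "film"),
]
PRONOUNS = {"she": "female", "her": "female", "he": "male",
            "him": "male", "his": "male", "it": "neutral"}
STOPWORDS = {"the", "a", "an", "of"}  # never treated as partial names


def resolve(token, prev, seen):
    """Map a non-literal token to a previously seen entity, or None."""
    recent = list(reversed(seen))  # most recent mention first
    gender = PRONOUNS.get(token.lower())
    if gender:  # feature 3: pronoun, resolved by gender
        return next((e for e in recent if e.gender == gender), None)
    if prev == "the":  # feature 4: "the <TYPE>"
        hit = next((e for e in recent if e.type == token.lower()), None)
        if hit:
            return hit
    if token.lower() not in STOPWORDS:  # feature 2: partial name
        return next((e for e in recent if token in e.name.split()), None)
    return None


def recognize(text, entities=ENTITIES):
    """Return (surface form, URI) pairs in order of appearance."""
    by_name = {e.name: e for e in entities}
    # Try full names first (longest first), then fall back to single words.
    names = "|".join(re.escape(n) for n in sorted(by_name, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{names})\b|\w+")
    seen, mentions, prev = [], [], ""
    for match in pattern.finditer(text):
        token = surface = match.group()
        entity = by_name.get(token)  # feature 1: literal mention
        if entity is None:
            entity = resolve(token, prev.lower(), seen)
            if entity and prev.lower() == "the" and entity.type == token.lower():
                surface = f"{prev} {token}"
        if entity:
            seen.append(entity)  # every mention refreshes recency
            mentions.append((surface, entity.uri))
        prev = token
    return mentions
```

Calling recognize on a short text returns the mentions in reading order, each paired with the URI it was resolved to. Note that the sketch only resolves pronouns, partial names, and "the <TYPE>" phrases to entities that occurred earlier in the same text, which matches the "mentioned before" condition in the list above.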

[1] http://lemurproject.org/clueweb12/ We have purchased this dataset and it's available on our file system.
