'''Goal:''' Design and implement a simple but effective named-entity recognizer for a web-size corpus, namely !ClueWeb12 [1]. The following features should be supported:
1. Recognize literal mentions (trivial). For example, recognize "Angela Merkel" as https://en.wikipedia.org/wiki/Angela_Merkel . <<BR>>
2. Recognize partial mentions of entities that have been mentioned literally before. For example, recognize "Merkel" as https://en.wikipedia.org/wiki/Angela_Merkel after she has been mentioned by her full name. <<BR>>
3. Recognize pronoun mentions (he, she, it, ...) of entities that have been mentioned literally before. For example, recognize "she" as https://en.wikipedia.org/wiki/Angela_Merkel if she has been mentioned before. Take gender into account: "she" should be resolved to the most recently mentioned female entity. <<BR>>
4. Recognize mentions of the form "the <type>" that follow a mention with the full entity name. For example, "the film" should be recognized as https://en.wikipedia.org/wiki/The_Matrix if that film has been mentioned by its full name before. <<BR>>
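
The four features above can be illustrated with a minimal sketch. It is not a web-scale implementation for !ClueWeb12, only a single-document toy that shows one possible resolution order. All names in it (`Entity`, `recognize`, `latest`, `PRONOUNS`, `STOPWORDS`) and the two-entry knowledge base are hypothetical, and it assumes that each entity comes with a known gender and a type noun such as "film".

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str       # full literal name, e.g. "Angela Merkel"
    url: str        # identifier the mention is resolved to
    gender: str     # "female", "male" or "neuter" (assumed to be known)
    type_noun: str  # noun used in "the <type>" mentions (assumed to be known)

PRONOUNS = {"she": "female", "her": "female", "he": "male", "him": "male",
            "his": "male", "it": "neuter", "its": "neuter"}
STOPWORDS = {"the", "of", "a", "an"}  # never treated as partial names

def latest(mentioned, predicate):
    """Return the most recently mentioned entity satisfying predicate, or None."""
    for entity in reversed(mentioned):
        if predicate(entity):
            return entity
    return None

def recognize(text, kb):
    """Return (start, end, url) for every recognized mention, in reading order."""
    if not kb:
        return []
    by_name = {e.name: e for e in kb}
    # Longest names first, so a full name wins over a shorter overlapping one.
    names = sorted(by_name, key=len, reverse=True)
    literal = "|".join(re.escape(n) for n in names)
    token_re = re.compile(rf"\b(?P<lit>{literal})\b|(?P<word>\w+)")
    mentioned = []  # entities already introduced, most recent last
    mentions = []
    prev = None     # previous token, used to detect "the <type>"
    for m in token_re.finditer(text):
        start, entity = m.start(), None
        if m.group("lit"):                      # feature 1: literal mention
            entity = by_name[m.group("lit")]
        else:
            word = m.group("word")
            lower = word.lower()
            after_the = (prev is not None and prev.group().lower() == "the"
                         and text[prev.end():m.start()].isspace())
            if lower in PRONOUNS:               # feature 3: pronoun, by gender
                entity = latest(mentioned, lambda e: e.gender == PRONOUNS[lower])
            elif after_the and (hit := latest(mentioned,
                                              lambda e: e.type_noun == lower)):
                entity, start = hit, prev.start()   # feature 4: "the <type>"
            elif lower not in STOPWORDS:        # feature 2: partial name
                entity = latest(mentioned, lambda e: word in e.name.split())
        if entity is not None:
            mentions.append((start, m.end(), entity.url))
            # Every resolved mention makes its entity the most recent one.
            if entity in mentioned:
                mentioned.remove(entity)
            mentioned.append(entity)
        prev = m
    return mentions

kb = [
    Entity("Angela Merkel", "https://en.wikipedia.org/wiki/Angela_Merkel",
           "female", "politician"),
    Entity("The Matrix", "https://en.wikipedia.org/wiki/The_Matrix",
           "neuter", "film"),
]
text = ("Angela Merkel visited Paris. Merkel said she liked The Matrix. "
        "She watched the film twice.")
mentions = recognize(text, kb)
for start, end, url in mentions:
    print(text[start:end], "->", url)
```

Partial names, pronouns and "the <type>" phrases are only resolved to entities that have already appeared with their full name, which matches the "mentioned literally before" condition in items 2 to 4. A real system would additionally need tokenization, sentence handling and entity data that scale to the whole corpus.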
[1] http://lemurproject.org/clueweb12/ We have purchased this dataset and it's available on our file system.