AD Teaching Wiki
HomepageAnalyzer

Type: Interesting and well-defined problem. It will be relatively straightforward to get results of medium quality. The challenge is to achieve results of high quality. Machine learning will be crucial to achieve that. Exploring suitable methods is part of the challenge.

Goal: Extract a large number of scientists' homepages from the CommonCrawl web crawl. Extract the central information from these pages, including name, profession, gender, and affiliation.

Step 1: Download a large chunk of university webpages (e.g. the pages of all universities in Europe, in the US, or even in the whole world).
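One way to narrow a crawl down to university pages is to keep only URLs whose host belongs to a known university domain. The following is a minimal sketch of that filter, not a prescribed approach: the domain list and the function names `is_university_url` and `filter_university_urls` are assumptions made for illustration, and a real run would need a much larger domain list and would read URLs from the crawl itself.

```python
from urllib.parse import urlsplit

# Hypothetical seed list; a real run would need a far more complete
# list of university domains.
UNIVERSITY_DOMAINS = {"uni-freiburg.de", "ox.ac.uk", "mit.edu"}


def is_university_url(url, domains=UNIVERSITY_DOMAINS):
    """Return True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def filter_university_urls(urls, domains=UNIVERSITY_DOMAINS):
    """Keep only the URLs that belong to one of the given university domains."""
    return [u for u in urls if is_university_url(u, domains)]
```

Matching on the host (rather than on the whole URL string) avoids false hits such as a university domain appearing only in the path of an unrelated site.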

Step 2: Design, implement and run a classifier that identifies people's homepages. A first version can be rule-based. For a high-quality classifier, machine learning will eventually be needed.
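A rule-based first version could, for example, add up a few hand-picked cues from the URL, the page title, and the page text, and accept a page once the score passes a threshold. The sketch below only illustrates the idea: the URL patterns, the cue words, the weights, the threshold, and the names `homepage_score` and `is_homepage` are all assumptions, not rules taken from the project description.

```python
import re

# URL endings that often indicate a personal page (illustrative assumption).
URL_HINT = re.compile(
    r"/(~[\w.-]+|people/[\w.-]+|staff/[\w.-]+|members/[\w.-]+)/?$",
    re.IGNORECASE,
)
# Academic titles that often appear in the title of a personal page.
TITLE_HINT = re.compile(r"\b(prof\.|professor|dr\.|ph\.?d\.?)", re.IGNORECASE)
# Phrases that often appear in the body of a scientist's homepage.
TEXT_CUES = ("publications", "curriculum vitae", "research interests",
             "office hours", "teaching")


def homepage_score(url, title, text):
    """Sum the hand-crafted cues: 2 for the URL, 1 for the title, 1 per text cue."""
    score = 0
    if URL_HINT.search(url):
        score += 2
    if TITLE_HINT.search(title):
        score += 1
    lowered = text.lower()
    score += sum(1 for cue in TEXT_CUES if cue in lowered)
    return score


def is_homepage(url, title, text, threshold=3):
    """Classify a page as a homepage if its score reaches the threshold."""
    return homepage_score(url, title, text) >= threshold
```

The same cues could later serve as features for a learned classifier, with the hand-set weights and threshold replaced by values fitted on labeled pages.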

Step 3: Extract basic information, like the name, the gender, the profession, and the affiliation. A first version can be rule-based. For results with high precision and recall (which is the goal), machine learning will eventually be needed.
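As a rough illustration of a rule-based extractor and of how its precision and recall could be measured, the sketch below splits a page heading into a name and a crude profession guess, and scores a set of extracted facts against a hand-labeled gold set. The regular expression, the rule that a "Prof" title implies the profession "professor", the `(page, field, value)` fact format, and the function names are all assumptions made for this example. Gender and affiliation are left out because they would need additional resources (e.g. a first-name list or a domain-to-institution table).

```python
import re

# Academic titles that may precede a name in a heading (illustrative, not exhaustive).
TITLE_PREFIX = re.compile(r"^\s*(?:(?:Prof\.|Professor|Dr\.)\s*)+", re.IGNORECASE)


def extract_name_and_profession(heading):
    """Split a heading such as 'Prof. Dr. Jane Doe' into (name, profession)."""
    match = TITLE_PREFIX.match(heading)
    prefix = match.group(0) if match else ""
    name = heading[len(prefix):].strip()
    # Crude assumption: a 'Prof' title implies the profession 'professor'.
    profession = "professor" if "prof" in prefix.lower() else None
    return name, profession


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted facts against gold facts.

    Both arguments are sets of (page_id, field, value) tuples.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Precision is the fraction of extracted facts that are correct, and recall is the fraction of gold facts that were found, so a cautious extractor tends to score high on the former and low on the latter.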
