Internal note for our chair: each of us should offer / provide 12 months of supervision every year. Each thesis or project "counts" just like it counts for the students, where #months = #ECTS / 4. That is, B.Sc. project = 6 ECTS = 1.5 months, M.Sc. project = 16 ECTS = 4 months, B.Sc. thesis = 12 ECTS = 3 months, M.Sc. thesis = 25 ECTS = 6 months.
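The counting rule above can be written down as a small calculation. This is only a sketch: the ECTS values are the ones stated in the note, while the function names and the example workload are made up for illustration. Note that the rule gives 6.25 months for an M.Sc. thesis, which the note rounds to 6.

```python
# ECTS values as stated in the note; the rule is months = ECTS / 4.
ECTS = {
    "B.Sc. project": 6,
    "M.Sc. project": 16,
    "B.Sc. thesis": 12,
    "M.Sc. thesis": 25,
}


def supervision_months(kind: str) -> float:
    """Return the supervision months credited for one project or thesis."""
    return ECTS[kind] / 4


def yearly_load(items: list) -> float:
    """Sum the credited months over all supervised items (hypothetical helper)."""
    return sum(supervision_months(kind) for kind in items)


if __name__ == "__main__":
    # Hypothetical workload of one supervisor in one year.
    workload = ["M.Sc. thesis", "B.Sc. thesis", "B.Sc. project", "B.Sc. project"]
    for kind in ECTS:
        print(f"{kind}: {supervision_months(kind)} months")
    print("total:", yearly_load(workload), "months (target: 12)")
```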
Internal note to thesis supervisors: you need to make sure that each supervised thesis appears on our chair homepage. For this you need to: 1) put the final PDFs of the thesis and the presentation into /nfs/raid3/publications/theses (please follow the existing naming convention); the PDFs will then be available at http://ad-publications.informatik.uni-freiburg.de/theses/<PDF> 2) send a mail to Sabine with the following information: title of thesis, name of student, thesis abstract, link to thesis PDF, link to presentation PDF.
List of available and ongoing topics for current B.Sc. and M.Sc. projects and theses
Note to interested students: if all projects in this list are ongoing, this means that our current capacity for supervising projects and theses is reached. Maybe, if you come back later, there will be an offer again. You can also propose a topic on your own.
Deep Dive (project, available): DeepDive is a framework from Stanford University that can construct a knowledge base (a database of facts like "Barack Obama" has-spouse "Michelle Obama") by analyzing large text corpora. It applies a large variety of state-of-the-art NLP techniques from distant supervision and probabilistic inference. The project's goal would be 1) to evaluate the framework by applying it to a large text corpus and 2) to incorporate advanced methods or features to improve extraction quality. You should be confident working in a Linux environment and you need practical programming experience. supervised by Elmar Haussmann
Domain-Specific Semantic Full-Text Search (thesis, available): Our semantic search engine Broccoli answers structured, semantic queries. Queries can, in theory, be very complex and powerful. However, it is almost impossible for inexperienced users to build appropriate structured queries. There is an experimental feature to interpret queries in natural language and translate them to structured queries. Again, this is a difficult task we cannot solve perfectly (yet). With a limited vocabulary, this becomes much easier. Think of Facebook's Search and queries like "restaurants my friends like" or "friends of my friends": interpreting such queries is much easier than interpreting arbitrary questions on Wikipedia. We have created versions of Broccoli that work with medical texts, news articles, or patents. Usually, those texts introduce new challenges and make it even harder to provide a really good query experience. The goal of this thesis is to use such a domain or find a new one, solve the associated challenges as usual (with our support), but this time also significantly reduce the features of our search, find a simple use case, and do that one thing really well. Ideally, the result is a system that correctly answers a limited set of natural language questions. - supervised by Björn Buchhold
High Level Tests and Performance Tracking for Broccoli (project, available): Our semantic search engine Broccoli currently features unit tests that are automatically built and run on commits to a central repository. However, there is no way to track the effect of changes on the system's performance, both in terms of query execution time and result quality. The task of the project is to come up (in discussions with the people involved with Broccoli) with a set of queries that can be run on a regular basis and a way to visualize the results. A major challenge arises because both changes to Broccoli and downloads of new input data require new indexes to be built. This task is computationally expensive and cannot be performed as often as building and running unit tests. Apart from that, variations in response times and results are tolerated and even expected every time. Still, it would be highly beneficial to be presented with an overview of changes between builds so that unexpected differences can be examined manually if necessary. - supervised by Björn Buchhold
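To illustrate what an overview of changes between builds could look like, here is a minimal sketch. It is not part of Broccoli: the data shape (query mapped to execution time and result count), the function name, and the 20% tolerance are all assumptions chosen for the example. Small timing variations are ignored, since the description above says they are expected.

```python
from typing import Dict, List, Tuple

# Hypothetical data shape: query string -> (execution time in ms, number of results).
BuildStats = Dict[str, Tuple[float, int]]


def compare_builds(old: BuildStats, new: BuildStats,
                   time_tolerance: float = 0.2) -> List[Tuple[str, str, float, float]]:
    """List (query, metric, old value, new value) for every notable change."""
    changes = []
    # Only queries measured in both builds can be compared.
    for query in sorted(old.keys() & new.keys()):
        old_ms, old_hits = old[query]
        new_ms, new_hits = new[query]
        # Flag a timing change only if it exceeds the relative tolerance.
        if old_ms > 0 and abs(new_ms - old_ms) / old_ms > time_tolerance:
            changes.append((query, "time_ms", old_ms, new_ms))
        # Any change in the number of results is reported for manual inspection.
        if old_hits != new_hits:
            changes.append((query, "num_results", old_hits, new_hits))
    return changes


if __name__ == "__main__":
    old_build = {"q1": (100.0, 10), "q2": (50.0, 5)}
    new_build = {"q1": (150.0, 10), "q2": (52.0, 7)}
    for change in compare_builds(old_build, new_build):
        print(change)
```

A real solution would additionally need a measure of result quality and a visualization, both of which are left open here.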
Value Recognition in Full-Text (thesis, ONGOING): Text documents contain various values of different types. Dates, weights, lengths, heights and many more can be expressed obviously, e.g., 13kg, less obviously, e.g., ten tons, and in a highly obfuscated way, e.g., for the first half of the year. For our semantic search engine Broccoli, we do not make use of this information yet. The task of the thesis is to produce a (UIMA) component that extracts such values from text documents; the extracted values serve as input for further steps that enable semantic search matching such values. Additionally, the findings should be evaluated. Knowing which kinds of values are truly hard to extract and which are relatively easy is very valuable as well. - supervised by Björn Buchhold
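The easy end of this spectrum can be illustrated with a small pattern-based extractor. This is only a sketch and not the UIMA component the thesis asks for: the unit table, the number words, and the function name are hypothetical. It handles cases like 13kg and ten tons, but deliberately not obfuscated expressions such as for the first half of the year.

```python
import re

# Hypothetical unit table: surface form -> (value type, factor to the base unit).
UNITS = {
    "kg": ("weight", 1.0),
    "g": ("weight", 0.001),
    "tons": ("weight", 1000.0),
    "ton": ("weight", 1000.0),
    "km": ("length", 1000.0),
    "m": ("length", 1.0),
}
# Hypothetical, very small list of spelled-out numbers.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "ten": 10}

# Longer unit names come first so that "tons" is preferred over "ton".
_unit_alt = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))
_num_alt = "|".join(NUMBER_WORDS)
PATTERN = re.compile(
    rf"\b(?P<num>\d+(?:\.\d+)?|{_num_alt})\s*(?P<unit>{_unit_alt})\b",
    re.IGNORECASE,
)


def extract_values(text):
    """Return (matched text, value type, value in base unit) for each match."""
    results = []
    for match in PATTERN.finditer(text):
        raw = match.group("num").lower()
        number = NUMBER_WORDS[raw] if raw in NUMBER_WORDS else float(raw)
        kind, factor = UNITS[match.group("unit").lower()]
        results.append((match.group(0), kind, number * factor))
    return results


if __name__ == "__main__":
    print(extract_values("The box weighs 13kg and the truck ten tons."))
```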
Structured extraction of text from scientific PDF documents (master thesis, ONGOING) A lot of documents are only available in the PDF format, which was originally intended for platform-independent uniform display and printing. To process the contained information, one has to extract the contained plain text. To do this correctly, one has to consider the formatting structure of the documents in order to identify the parts of the text that belong together. The goal is to create such a non-trivial structured text extraction on top of an available PDF library like, e.g., PDFBox so that the contained texts can be further processed by our search engine ... supervised by Claudius Korzen
A mobile app for kitchen management (project, ONGOING) Our comfortable kitchen provides a lot of drinks and food. Whenever an employee consumes something, he is asked to mark it on our tally sheet and to pay the accumulated amount whenever he wants. In general, the noted amount is considerably smaller than the actual amount to pay, due to some uncertainties about the prices and due to the forgetfulness of our staff. Moreover, the payment behavior could be better in general. The goal of this project is to implement a mobile app that can be used to manage the kitchen accounts of our staff. At a minimum, it should be possible to scan the product before consuming it and to debit the employee's account with the corresponding price. At certain intervals, a notice to pay should be sent to each employee. Can be expanded by any number of additional features ... supervised by Claudius Korzen
Learning from OSM data (theses or larger projects, available) OpenStreetMap (OSM) contains a lot of data that geo-search and render engines cannot use sufficiently at the moment. A variety of topics is available in that area, e.g. landscape classification based on street network data, learning classifiers for regions of interest (e.g. industrial areas), developing data structures to answer queries with specific geo-relations ('next to', 'south of', 'along'), automated tagging and classification of points of interest (e.g. lake -> place to swim), etc. ... supervised by Sabine Storandt
Setting up a browser for DBPedia with Broccoli (project, available) We derived a simplified version of the Freebase ontology and set up a web application, called Freebase Easy, to browse the data based on our Broccoli search engine: http://freebase-easy.cs.uni-freiburg.de . The goal of this project would be to do something similar with the DBPedia ontology. This means modifying the freely downloadable dataset so that it 1) can be processed by our pipeline and 2) has a form that allows for comfortable browsing with our user interface (e.g. providing unique and proper human-readable names for instances and types) ... supervised by Florian Bäurle
A keyword query translator for Broccoli (bachelor thesis, ONGOING) Our Broccoli search engine has its own special query language (SPARQL-like trees). Though the user interface guides users through the incremental query creation process, many users are accustomed to simple keyword queries such as those used, e.g., when searching with Google. To make the interface more attractive for these users and for the general convenience of quick query creation, it would be nice to have a mechanism that translates normal keyword queries into equivalent structured Broccoli queries. As a proof of concept, we already implemented an experimental mechanism that can make such translations (e.g., just try entering "mafia films by Francis Coppola" into the input field of the user interface). The work of this thesis would be to implement a better, more powerful query translator ... supervised by Florian Bäurle
Usability study for the Broccoli user interface (bachelor thesis, available) We developed a special user interface for the prototype of our semantic full-text search engine Broccoli: http://broccoli.cs.uni-freiburg.de . An open task is to conduct a thorough user study to evaluate the usability of the interface, compare it to other search interfaces and identify weak points that could be improved ... supervised by Florian Bäurle
Question Answering using Semantic Full-Text Search (thesis, available): Use our semantic full-text search engine Broccoli for the FACTOID and LIST questions from one of the TREC Question Answering benchmarks, e.g. http://trec.nist.gov/data/qa/2006_qadata/QA2006_testset.xml . What is the performance of manually constructed queries, and what can be done to improve it? It helps if you have attended the information retrieval lecture, but this is not absolutely necessary ... supervised by Hannah Bast and [Elmar or Björn]
An easy-to-use web app for the CompleteSearch engine (project, available): The CompleteSearch engine offers complex search capabilities (including prefix search and faceted search) on semi-structured data (text and databases) with very fast query times. Our current software is powerful and flexible, but has quite a learning curve before it can be used. The goal of this project would be to set up a web app where one can upload any CSV dataset and then have a convenient search (with meaningful default settings), without having to set up anything oneself. This would be an extremely useful web application ... supervised by Hannah Bast
Completed B.Sc. or M.Sc. projects and theses
[The list is still incomplete; in particular, all the projects are currently missing. The title of each work should also be listed here, along with a link to the respective web page or to the thesis and presentation, and the start and end dates.]
B.Sc. thesis Manuel Ruder (Elmar)
M.Sc. thesis Niklas Meinzer (Sabine)
B.Sc. thesis Christian Ehrenfeld (Hannah)
B.Sc. thesis Philipp Bausch (Elmar)
M.Sc. thesis Eugen Sawin (Hannah)
M.Sc. thesis Patrick Brosi (Hannah + Sabine)
B.Sc. thesis Marius Bethge (Björn)
M.Sc. thesis Cynthia Jimenez (Sabine)
M.Sc. thesis Jonas Sternisko (Hannah)
M.Sc. thesis Ragavan Natarajan (Florian)
B.Sc. thesis Benjamin Meier (Claudius)
M.Sc. thesis Susanne Eichel (Hannah)
B.Sc. thesis Axel Lehmann (Hannah)
B.Sc. thesis Adrian Batzill (Hannah)
B.Sc. thesis Anton Stepan (Björn)
M.Sc. thesis Mirko Brodesser (Hannah + Sabine)
M.Sc. thesis Manuel Braun (Hannah + Sabine)
B.Sc. thesis Robin Schirrmeister (Hannah)
B.Sc. thesis Simon Skilevic (Hannah)
M.Sc. thesis Ilinca Tudose (Hannah + Elmar)
B.Sc. thesis Christiane Schaffer (Florian)
M.Sc. thesis Dirk Kienle (Hannah)
B.Sc. thesis Ina Baumgarten (Hannah + Björn)
B.Sc. thesis Niklas Meinzer (Hannah + Björn)
M.Sc. thesis Claudius Korzen (Hannah)
M.Sc. thesis Florian Bäurle (Hannah)
M.Sc. thesis Elmar Haußmann (Hannah)
M.Sc. thesis Oliver Mitevski (Hannah + Marjan)
M.Sc. thesis Björn Buchhold (Hannah)
Diploma thesis Johannes Schwenk (Hannah)
B.Sc. thesis Mirko Brodesser (Hannah + Marjan)