Internal note for our chair: each of us should offer / provide 12 months of supervision every year. Each thesis or project "counts" just like it counts for the students, where #months = #ECTS / 4. That is, B.Sc. project = 6 ECTS = 1.5 months, M.Sc. project = 16 ECTS = 4 months, B.Sc. thesis = 12 ECTS = 3 months, M.Sc. thesis = 25 ECTS = about 6 months.
Internal note to thesis supervisors: you need to make sure that each supervised thesis appears on our chair homepage. For this you need to: 1) put the final PDF of the thesis and the presentation into /nfs/raid3/publications/theses on atlantis - please follow the existing naming convention!; the PDF will be available at http://ad-publications.informatik.uni-freiburg.de/theses/<PDF> 2) send a mail to Sabine with the following information: title of thesis, name of student, thesis abstract, link to thesis PDF, link to presentation PDF.
List of available and ongoing topics for current B.Sc. and M.Sc. projects and theses
Note to interested students: if all projects in this list are ongoing, this means that our current capacity for supervising projects is reached. Maybe, if you come back later, there will be an offer again. You can also propose a topic on your own.
TODO: project + thesis offered/supervised by Björn
Information Extraction from Noun Compounds (thesis, available): Noun compounds like US president Barack Obama or 3rd century Roman writer Censorinus can carry a lot of valuable information. Here: Barack Obama is president of the US and Censorinus is a Roman writer from the 3rd century. Noun compounds have been studied from a linguistic point of view, e.g. http://people.ischool.berkeley.edu/~nakov/selected_papers_list/TSLP2013.pdf , but so far they have largely been ignored in information extraction tasks (i.e. generating triples of the expressed facts like (Barack Obama) (is president of) (the US)). The task of this thesis will be to research current state-of-the-art approaches for understanding noun compounds and implement/adapt/improve them to extract information in triple form. Good if you know something about machine learning, but not absolutely necessary ... supervised by Elmar Haussmann
Semantic Search Engine Prototype utilizing OpenIE (thesis or larger project, available): Open Information Extraction (OpenIE) extracts triples like (Barack Obama) (is president of) (the US) from natural language text. Recent OpenIE systems perform well, but it has yet to be shown that these triples can be used effectively for information retrieval tasks. First systems searching on the extracted triples already exist, e.g. http://openie.cs.washington.edu/. The task will be to investigate how best to search on OpenIE triples, implement a search engine prototype similar to the existing one, and evaluate it. You should be proficient in programming in C++ or Java, and it is good if you have attended the information retrieval lecture ... supervised by Elmar Haussmann and TBA
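To make the idea of searching on triples concrete, here is a minimal sketch in Python. It is not the prototype asked for in the topic and not how the existing system works; it only assumes that triples are available as (subject, predicate, object) strings and answers keyword queries with a simple inverted index and AND semantics. The names Triple, build_index and search are hypothetical, and the split of the second example triple is chosen for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact. The field names are an assumption of this sketch."""
    subject: str
    predicate: str
    obj: str


def build_index(triples):
    """Map each lowercased token to the ids of all triples containing it."""
    index = defaultdict(set)
    for i, triple in enumerate(triples):
        for field in triple:
            for token in field.lower().split():
                index[token].add(i)
    return index


def search(query, triples, index):
    """Return the triples that contain every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return []
    ids = set.intersection(*(index.get(token, set()) for token in tokens))
    return [triples[i] for i in sorted(ids)]


# Illustrative triples based on the examples in the topic descriptions.
triples = [
    Triple("Barack Obama", "is president of", "the US"),
    Triple("Censorinus", "is a Roman writer from", "the 3rd century"),
]
index = build_index(triples)
results = search("president US", triples, index)
```

A real prototype would additionally have to deal with ranking, synonyms, and the noise in automatically extracted triples, which is exactly what the topic proposes to investigate.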
Distributional Semantics for a Semantic Search Engine Prototype (project, available): Distributional semantics is a hot topic in natural language processing right now. It allows representing words (and phrases) as vectors in a high-dimensional space such that words with similar meaning have low distance, e.g. "Friday" and "Saturday", "king" and "queen", or "Freiburg" and "Stuttgart". Tools to generate these word space models are freely available, e.g. from Google: https://code.google.com/p/word2vec/. The task will be to generate word vectors, assess their quality, and (depending on quality) implement a prototype search engine utilizing them, e.g. for resolving synonyms. Good if you have attended the information retrieval lecture, but not absolutely necessary ... supervised by Elmar Haussmann
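As a small illustration of "similar meaning = low distance", the following Python sketch compares word vectors by cosine similarity (a high similarity corresponds to a low cosine distance). The three-dimensional vectors are hand-made toy values, not the output of word2vec or any other tool, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors):
    """Return the other word whose vector is closest to that of `word`."""
    candidates = (w for w in vectors if w != word)
    return max(candidates,
               key=lambda w: cosine_similarity(vectors[word], vectors[w]))


# Toy vectors made up for this example (NOT real word2vec output).
vectors = {
    "Friday": [0.9, 0.1, 0.0],
    "Saturday": [0.8, 0.2, 0.1],
    "Freiburg": [0.0, 0.2, 0.9],
}

nearest = most_similar("Friday", vectors)
```

With real word vectors, the same nearest-neighbor lookup could serve as a first, naive way to find candidate synonyms for a query word.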
TODO: project + thesis offered/supervised by Florian
An HTML5-based PDF annotation and extraction tool (theses or larger projects, available) The goal is to implement an easy-to-use PDF tool for Icecite that runs client-side in the browser and has the following features: (1) Text extraction (along with meta-information like font sizes, font names, text positions, etc.) from PDF documents, similar to the (Java-based) extraction tool of PDFBox. (2) Annotating (i.e. highlighting and commenting) PDF documents, as provided by Adobe Professional. However, the Adobe browser plugin isn't available in all browsers, and annotating PDFs in the browser with it is not very comfortable. For both tasks, you are allowed to use existing PDF parsers and renderers, e.g. PDF.js. ... supervised by Claudius Korzen
Rounding on Paths (bachelor thesis, available) Given a graph and a cost function on the edges (e.g. height difference in mm), we want to reduce the precision of the cost values by rounding (saving space and computation time for algorithms to be run on that graph). The goal is to develop rounding techniques such that the rounding error on every (optimal) path can be bounded. supervised by Sabine Storandt
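The following Python sketch shows the naive baseline for this problem, not the technique the thesis is meant to develop: every edge cost is rounded independently to the nearest multiple of a step size. Each edge then contributes an error of at most step / 2, so a path with k edges has a rounding error of at most k * step / 2. The graph, the costs, and the step size are made up for illustration.

```python
def round_cost(cost, step):
    """Round a cost to the nearest multiple of `step`."""
    return step * round(cost / step)


def path_cost(path, costs):
    """Sum the costs of the consecutive edges along `path`."""
    return sum(costs[(u, v)] for u, v in zip(path, path[1:]))


# Made-up edge costs, e.g. height differences in mm.
costs = {("a", "b"): 1234, ("b", "c"): 5678, ("c", "d"): 910}
step = 100  # reduce the precision from mm to multiples of 100 mm

rounded = {edge: round_cost(cost, step) for edge, cost in costs.items()}

path = ["a", "b", "c", "d"]
exact = path_cost(path, costs)
approx = path_cost(path, rounded)

# Worst case for independent rounding: step / 2 per edge on the path.
bound = (len(path) - 1) * step / 2
```

Note that this bound grows with the number of edges on the path, which gives a sense of why rounding each edge in isolation may not be good enough.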
Learning from OSM data (theses or larger projects, available) OpenStreetMap (OSM) contains a lot of data that geo-search and render engines cannot use sufficiently at the moment. A variety of topics is available in that area, e.g. landscape classification based on street network data, learning classifiers for regions of interest (e.g. industrial areas), developing data structures to answer queries with specific geo-relations ('next to', 'south of', 'along'), automated tagging and classification of points of interest (e.g. lake -> place to swim), etc. supervised by Sabine Storandt
Local Zoom (thesis, available) We are given a street graph projected onto a map and a general zoom-in functionality (think of a renderer like Google Maps). We want to allow specifying subregions of the current view where more detail is required. The goal is to enable local zoom such that the basic connectivity of important streets is preserved and no overlap of non-crossing streets is introduced. This functionality should be integrated into our existing render framework. supervised by Sabine Storandt
Dynamic Ambulance Allocation (master thesis, ongoing) We are given a street graph, an embedded set of hospitals, an ambulance fleet and an online stream of incoming patient requests. The goal is to develop strategies for dynamically allocating ambulances to hospitals and emergencies, such that the number of saved patients is maximized. supervised by Sabine Storandt
Compression Techniques for Shortest Path Sets (bachelor thesis, ongoing) Given a large collection of shortest paths in a graph, develop techniques to store them as space-efficiently as possible, while allowing fast access to single paths. supervised by Sabine Storandt
Question Answering using Semantic Full-Text Search (thesis, available): Use our semantic full-text search engine Broccoli for the FACTOID and LIST questions from one of the TREC Question Answering benchmarks, e.g. http://trec.nist.gov/data/qa/2006_qadata/QA2006_testset.xml . What is the performance of manually constructed queries, and what can be done to improve it? Good if you have attended the information retrieval lecture before, but not absolutely necessary ... supervised by Hannah Bast and TBA
Improved Transfer Patterns Routing (thesis or larger project, available): Public transit routing at Google currently implements the transfer patterns algorithm from this paper: http://ad-publications.informatik.uni-freiburg.de/ESA_transferpatterns_BCEGHRV_2010.pdf . The evaluation in this paper is outdated, and we now know of several possible major improvements. In particular, we know how to reduce the pre-processing time by a factor of about 10. The goal of this thesis would be to implement a much improved version of transfer patterns and properly benchmark it ... supervised by Hannah Bast and Sabine Storandt
An easy-to-use web app for the CompleteSearch engine (project, available): The CompleteSearch engine offers complex search capabilities (including prefix search and faceted search) on semi-structured data (text and databases) with very fast query times. Our current software is powerful and flexible, but has quite a learning curve before it can be used. The goal of this project would be to set up a web app where one can upload any CSV dataset and then have a convenient search (with meaningful default settings), without having to set up anything oneself. This would be an extremely useful web application ... supervised by Hannah Bast
Completed B.Sc. or M.Sc. projects and theses
[The list is still incomplete; in particular, all the projects are currently missing. Each entry should also give the title of the work, a link to the respective web page or to the thesis and presentation, and the start and end dates.]
B.Sc. thesis Philipp Bausch (Elmar)
M.Sc. thesis Eugen Sawin (Hannah)
M.Sc. thesis Patrick Brosi (Hannah + Sabine)
B.Sc. thesis Marius Bethge (Björn)
M.Sc. thesis Cynthia Jimenez (Sabine)
M.Sc. thesis Jonas Sternisko (Hannah)
M.Sc. thesis Ragavan Natarajan (Florian)
B.Sc. thesis Benjamin Meier (Claudius)
M.Sc. thesis Susanne Eichel (Hannah)
B.Sc. thesis Axel Lehmann (Hannah)
B.Sc. thesis Adrian Batzill (Hannah)
B.Sc. thesis Anton Stepan (Björn)
M.Sc. thesis Mirko Brodesser (Hannah + Sabine)
M.Sc. thesis Manuel Braun (Hannah + Sabine)
B.Sc. thesis Robin Schirrmeister (Hannah)
B.Sc. thesis Simon Skilevic (Hannah)
M.Sc. thesis Ilinca Tudose (Hannah + Elmar)
B.Sc. thesis Christiane Schaffer (Florian)
M.Sc. thesis Dirk Kienle (Hannah)
B.Sc. thesis Ina Baumgarten (Hannah + Björn)
B.Sc. thesis Niklas Meinzer (Hannah + Björn)
M.Sc. thesis Claudius Korzen (Hannah)
M.Sc. thesis Florian Bäurle (Hannah)
M.Sc. thesis Elmar Haußmann (Hannah)
M.Sc. thesis Oliver Mitevski (Hannah + Marjan)
M.Sc. thesis Björn Buchhold (Hannah)
Diploma thesis Johannes Schwenk (Hannah)
B.Sc. thesis Mirko Brodesser (Hannah + Marjan)