This page describes how Bachelor's and Master's Projects and Theses work at the Chair for Algorithms and Data Structures.
A list of available, ongoing and completed projects and theses can be found here.
Application for a project or thesis
If you are interested in doing a project or thesis with us, please carefully read this page first and then send an e-mail to the prospective supervisor with the following information:
- An acknowledgement that you have carefully read this whole page and the pages linked from it.
- A list of the courses you have already taken with us.
- A transcript of the grades of the courses you have taken so far.
- A very short description of your interests and your strengths (concerning work on a project/thesis).
- If you have your own project in mind (not necessary): a very short description of the goal and the scientific merit.
If you want to do your thesis with a company or another department, please provide the following *additional* information in a concise format. Concise means that you should not write more than a paragraph for each of the items below and that your text should be concrete and understandable for a non-expert. The purpose of this information is so that we can check whether the planned work has sufficient scientific merit with respect to the field of computer science.
- What is the expected outcome that you are aiming for?
- How do your approach and the expected result differ from the state of the art? See the section on "Related Work" in our guidelines.
- How do you plan to evaluate your work? See the sections on "Theoretical Analysis" and "Empirical Analysis" in our guidelines.
- Supervision must be provided by the company or the other department and the supervisor should provide an evaluation report at the end of the thesis, in a format to be discussed with us.
Deliverables of a Bachelor's or Master's PROJECT
1. Fulfill the reproducibility requirements described in the section "Reproducibility" below
2. Create a project website:
2.1 The pages should be in a subfolder www in your folder in our SVN
2.2 There should be a single MD file in blog post format in www/content/post
2.3 Any images or other files included in the MD file should be in www/static
2.4 Note that this does not exclude interactive elements (e.g. via JavaScript)
2.5 You can test the final appearance of the website as described here
2.6 You can look at websites of previous projects in this format
2.7 Here is a long list of previous projects in a format predating the MD format
3. There is usually no presentation needed for a project
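The folder layout from items 2.1 to 2.3 can be checked before submission with a few lines of Python. This is only a minimal sketch under our own assumptions: the function name `check_website_layout` is hypothetical, the post is assumed to have the file extension `.md`, and a missing `www/static` folder is reported as a problem even though it is only needed if the post includes images or other files.

```python
from pathlib import Path


def check_website_layout(project_dir):
    """Return a list of problems found in the www subfolder (empty if none)."""
    www = Path(project_dir) / "www"
    post_dir = www / "content" / "post"
    static_dir = www / "static"
    problems = []

    # Item 2.2: a single MD file in www/content/post
    # (the ".md" extension is an assumption).
    md_files = sorted(post_dir.glob("*.md")) if post_dir.is_dir() else []
    if len(md_files) != 1:
        problems.append(
            f"expected exactly one .md file in {post_dir}, found {len(md_files)}"
        )

    # Item 2.3: images and other included files belong in www/static
    # (assumption: we require the folder to exist).
    if not static_dir.is_dir():
        problems.append(f"missing directory {static_dir}")

    return problems


if __name__ == "__main__":
    # Run from the root of your folder in the SVN.
    for problem in check_website_layout("."):
        print("PROBLEM:", problem)
```

An empty result only says that the files are in the expected places. It says nothing about the appearance of the page, which still has to be tested as described in item 2.5.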
Deliverables of a Bachelor's or Master's THESIS
1. Fulfill the reproducibility requirements described in the section "Reproducibility" below
2. A written thesis
2.1 Upload a PDF of the thesis to our SVN, in a separate subfolder thesis
2.2 In that same subfolder, also provide all the sources (tex files, bib files, figures, etc.)
2.3 Check our Guidelines for how to write a proper thesis
2.4 Here is a long list of example theses which have already been completed at our chair
3. An oral presentation
3.1 The oral presentation takes place after you have officially submitted your thesis
3.2 Upload a PDF of the slides to our SVN, in a separate subfolder presentation
3.3 In that same subfolder, also provide all the sources (if you use tex: like for the thesis, or the PPTX file)
3.4 The maximal duration of the presentation is 20 minutes
3.5 There is ample time for questions afterwards
3.6 The presentations take place in Building 51, 2nd Floor, Room 024 (our "Küche")
3.7 You should be there 15 minutes earlier for setup and testing
4. There is no need for a website for a thesis
Reproducibility
For both projects and theses, your results must be easily reproducible. We have very specific requirements for this. They are explained in detail on our Reproducibility page, together with a simple working example.
Note the word "easily" in the previous paragraph. It is important that we can reproduce your results not just in principle, but easily. That is, there should be no need for lengthy explanations or expert knowledge, but once the software runs, everything should be "self-explanatory". From our perspective this should look like this:
1. We build the docker image (the comments at the end of the Dockerfile should say how)
2. We start a docker container (the comments at the end of the Dockerfile should say how)
3. There is information on the console on how to proceed further
4. Whatever components you provide from here on, they should be self-explanatory
Concerning Item 3, this will look different depending on your type of work. If your docker container starts a web service, there should at least be output on the console which provides the URL of the service. Further explanations on the service can then be provided on the web page. If your docker container starts an interactive shell, there should at least be instructions on how to proceed further. In the simplest case, this can be a message like "Type make all to get help on the available options of how to proceed further".
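The two cases for Item 3 can be illustrated with a small start-up script that the container runs first. This is a sketch and not part of our requirements: the function name `startup_message` and the URL in the usage example are made up for illustration, and the exact wording of the messages is up to you.

```python
def startup_message(service_url=None):
    """Build the text that the container prints on the console when it starts."""
    if service_url is not None:
        # Web service: at least print the URL of the service.
        return (
            f"The service is running at {service_url}\n"
            "Further explanations are provided on that page."
        )
    # Interactive shell: say how to proceed further.
    return (
        "Type make all to get help on the available options "
        "of how to proceed further."
    )


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(startup_message("http://localhost:8080"))
```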
Always keep in mind what the point of all this is. The point is that someone else, who is maybe not familiar with all or any of the details of your work, can see and reproduce what you have done easily, without assistance from you. And not just now, but also in six months or in a year or in five years. Also note that that "someone else" might be you in the future. It is amazing how quickly one forgets about one's own work, and you will be pleasantly surprised if you find that you can still run and understand everything months or years later.
The first meeting
In the first meeting with the supervisor, create a Google Doc where you copy and fill out the following template. The Google Doc must be named Firstname Lastname (<type of work>), where type of work is one of: Bachelorprojekt, Masterprojekt, Bachelorarbeit, Masterarbeit, Bachelorprojekt + Bachelorarbeit, Masterprojekt + Masterarbeit. The Google Doc should be shared with at least the following people: Student, Supervisor(s), Hannah Bast ( bast.hannah@gmail.com ), Frank Dal-Ri ( nirlad@gmail.com , our system administrator), Heike Hägle ( hhaegle@gmail.com , our secretary).
The document should contain a section for each meeting. As a minimum, each section header should contain the number of the meeting and the date and the time. The sections should be ordered in reverse chronological order, that is, with the most recent meeting at the TOP.
The section of the first meeting should contain at least the following information:
Short working title: [this may change later]
Uni-Account: [initials + number]
Informatik-Account: [usually first seven letters of last name + the initial of first name]
Primary e-mail address:
SVN: [subdirectory in student-projects or student-theses, named firstname-lastname]
Special RAM requirements:
Special Disk space requirements:
Actual beginning of work:
Planned end of work:
Goal of the thesis: [succinct description in one paragraph]
First step: [see text below]
Deadline for the first step:
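The naming conventions for the Google Doc and for the usual form of the Informatik account can be sketched as follows. Both function names are hypothetical and the name Erika Mustermann is a placeholder. Lowercasing the account name is our assumption, and names with umlauts or other special characters are not handled, so always check your actual account name.

```python
# The six allowed values for <type of work>.
WORK_TYPES = (
    "Bachelorprojekt",
    "Masterprojekt",
    "Bachelorarbeit",
    "Masterarbeit",
    "Bachelorprojekt + Bachelorarbeit",
    "Masterprojekt + Masterarbeit",
)


def google_doc_name(first_name, last_name, work_type):
    """Return the title 'Firstname Lastname (<type of work>)'."""
    if work_type not in WORK_TYPES:
        raise ValueError(f"unknown type of work: {work_type!r}")
    return f"{first_name} {last_name} ({work_type})"


def usual_informatik_account(first_name, last_name):
    """First seven letters of the last name plus the initial of the first name.

    This is only the usual pattern; lowercasing is an assumption.
    """
    return (last_name[:7] + first_name[:1]).lower()


if __name__ == "__main__":
    print(google_doc_name("Erika", "Mustermann", "Bachelorarbeit"))
    print(usual_informatik_account("Erika", "Mustermann"))
```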
The first step should be something where the whole problem is solved from beginning to end, but with a relatively simple approach (how simple is up to you). The more aspects are touched (even if just in a simplistic way) in this first step, the better. The resulting code should follow our reproducibility guidelines, just like for the final submission.
This first step is usually considerable work, but not very hard technically. It will give you a very good feeling for the challenges involved. Having completed this first step, it usually becomes very clear (from the shortcomings of the simple approach) what the next steps should be.
We urge you to start your work right after the first meeting and we will be very unhappy if you don't. If you are not quite finished by the deadline, just drop us a line and ask for an extension. But never come to a follow-up meeting unprepared or with half-finished code, see the next section.
Follow-up meetings
It is very important that you come well-prepared to all follow-up meetings. In particular:
- You must have working code and data ready that follows our reproducibility guidelines, just like for the final submission. In particular, all the relevant data should be there, either under /local or under /nfs/students. We should have the opportunity to try out your code before the meeting, so please send it early enough.
- Time-consuming precomputation should be done before the meeting. Many projects or theses involve some sort of preprocessing of (often large amounts of) data. For the intermediate meetings, we usually don't want to reproduce your precomputation, but we want to be able to reproduce whatever it is that can be done with the results of your precomputation. Store the results of your precomputation in your folder under /nfs/students or (if network latency is an issue) under /local/data. The docker container should then mount this data via the -v option.
- Don't expect us to lead the meeting; it is your project / thesis and you should be the driving force. If you have specific problems or questions, you should prepare something (ideally in the form of code or a demo or an example), so that we can quickly understand what the problem is. It's usually not efficient if you start by telling us about all the details of the current status quo.
- Always bring your laptop, in case there is uncommitted code or data. It will also allow you to make small fixes right in the meeting.
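Mounting the precomputed data via the -v option, as described above, amounts to a call of the following shape. The sketch builds the argument list in Python so that it can be printed or passed to `subprocess.run`. The image name, the folder under /nfs/students and the mount point /data inside the container are hypothetical. Mounting read-only (the `:ro` suffix) is our suggestion and not a requirement.

```python
import shlex


def docker_run_command(image, host_data_dir, container_data_dir="/data"):
    """Build a `docker run` call that mounts precomputed data into the container."""
    return [
        "docker", "run", "--rm", "-it",
        # -v HOST:CONTAINER:ro makes the host folder visible in the container,
        # read-only, so that a run cannot modify the precomputed results.
        "-v", f"{host_data_dir}:{container_data_dir}:ro",
        image,
    ]


if __name__ == "__main__":
    # Hypothetical image name and data folder, for illustration only.
    cmd = docker_run_command(
        "my-thesis", "/nfs/students/firstname-lastname/precomputed"
    )
    print(shlex.join(cmd))
```

The command that you actually use should be stated in the comments at the end of your Dockerfile, so that nobody has to reconstruct it.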
Forum for asking questions
We have a dedicated forum for asking questions related to the technical aspects of your project or thesis: https://daphne.informatik.uni-freiburg.de/forum/viewforum.php?f=1082
It's similar to the forums we have for our courses. The forum is organized into several subforums, including: Programming, Docker, Datasets, Writing and Presenting, Miscellaneous.
All members from the Algorithms and Data Structures group are subscribed and will be happy to help whenever they can. Please write us an email if you have asked a question on the forum and haven't received an answer for several days. Other students (including you) can also subscribe to the forum and they are very welcome to answer questions, too. Like in our courses, many questions are relevant and interesting for other students, too.
In principle, feel free to ask any question on the forum. But for questions that are strongly related to your project and that probably only your supervisor can answer, it is better to ask your supervisor directly.
Grading scheme for the theses
Your final grade for the thesis will be an average of four grades, one for each of the following four aspects:
- Quality of the conceptual/theoretical work. This includes aspects such as:
- How well were the ideas thought out / the details worked out
- How independent was the work
- How meaningful and interesting/useful were the results.
- Quality of the implementation work. This includes aspects such as:
- Does the Docker setup work (this is actually a hard requirement)
- How easily can we reproduce your work and your results
- Is the code well documented, did you adhere to a proper coding style, are there unit tests (these are all basic requirements in every course offered by our chair).
- Quality of the evaluation. This includes aspects such as:
- Is the experimental setup well described
- Is the selection of datasets reasonable (there should be at least two, with different characteristics)
- Is there a comparison with a reasonable baseline or competitor method
- Are the results correct
- Are the results properly discussed
- Quality of the write-up. This includes the aspects described in our guidelines for writing a thesis.
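As a worked example of the averaging, the following sketch computes a final grade from four aspect grades. It assumes that all four aspects count equally and it does not round to official grade steps, since neither is specified here. The function name `final_thesis_grade` and the grades in the usage example are made up.

```python
from statistics import mean

# The four aspects listed above.
ASPECTS = (
    "conceptual/theoretical work",
    "implementation work",
    "evaluation",
    "write-up",
)


def final_thesis_grade(grades):
    """Average the four aspect grades with equal weight (an assumption)."""
    missing = [aspect for aspect in ASPECTS if aspect not in grades]
    if missing:
        raise ValueError(f"missing grades for: {', '.join(missing)}")
    return mean(grades[aspect] for aspect in ASPECTS)


if __name__ == "__main__":
    # Made-up grades, for illustration only.
    example = {
        "conceptual/theoretical work": 1.3,
        "implementation work": 1.7,
        "evaluation": 2.0,
        "write-up": 1.0,
    }
    print(f"{final_thesis_grade(example):.2f}")  # prints 1.50
```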
List of available and ongoing projects and theses
A list of available, ongoing and completed projects and theses can be found here.