AD Teaching Wiki:

This page contains guidelines for writing a Master's or Bachelor's Thesis at the Chair of Algorithms and Data Structures. Here is a long list of example theses that have already been completed at our chair. Note that these theses represent the whole spectrum of grades, from 1,0 (very good) to 4,0 (sufficient). If you are looking for a thesis that is exemplary in a certain respect, ask your supervisor (we cannot disclose this information on a public website).

Five Golden Rules

In this section, we discuss five golden rules: "simple sentences", "well-defined terms", "consistency", "be concrete", and "clear context". If you follow all five, your paper will be easy to read and understand. If you neglect only one of these rules, reading your paper will be no fun and it will be hard or even impossible to understand what you wanted to say.

The subsections below give a short description of each of these rules. On this page, you find some example snippets from random publications. For each snippet, there is a short discussion of how understandable it is and, if it is hard to understand, which of the five rules has been neglected.

Golden Rule #1: Simple sentences

A very simple but important rule of thumb is: write simple sentences. A simple sentence has the form: subject, predicate, object. If your sentence has two or more verbs, consider splitting it into two simpler sentences. It is rarely necessary to have sentences with more than two verbs. The sentences in this document are good examples of this writing style.

As a counterexample, this sentence, which is rather convoluted due to various reasons, including, for example, these nested relative clauses, puts a rather heavy cognitive load on the reader and is also simply too long by way of making several statements all in one sentence which could have just as well been made in, you can guess it, multiple sentences, one sentence per statement.

You get the idea! Note that using simple sentences may not be the best style when writing a novel. But it is perfect for scientific text, because it is easy to read and easy to understand.

Golden Rule #2: Well-defined terms

Whenever you use a term that is not commonly understood, you have to define it first. You can assume that your reader has a basic education in computer science, but not more.

For example, you don't have to define what an "algorithm" is. But when you say something about the "shortest-path problem", you should first define what the shortest-path problem is. Even if your readers have heard about the shortest-path problem in general, it's important to clarify which particular variant you are talking about.

On rare occasions, it might make sense to explain a term in the sentence right after the one where you introduce it. If you do that, it should really be the very next sentence. And this construct should really be the exception, and not the rule. If you use a term many pages after its definition and after it has last been used, it's usually a good idea to remind the reader of the definition. For example, you could refer to the definition or remind the reader what the term means or both.

Golden Rule #3: Consistency

If you use a particular word to describe something, stick to that word. Variation is maybe nice in a novel. But in scientific work, every single inconsistency makes it harder for the reader. It is also fine to repeat the same word again in the next sentence. In scientific work, this is even good style.

For example, it is perfectly OK to write: "In this section, we describe the details of our algorithm. The algorithm proceeds in two phases". You might be tempted to write the second sentence as "It proceeds in two phases". This would actually make the text harder to read because a reader has to resolve what "It" refers to (it could refer to "our algorithm", but also to "this section"). It would be even worse to write "The procedure proceeds in two phases". Now you have introduced a new word and the reader will be confused about whether "The procedure" is referring to "our algorithm" or maybe something else which they have overlooked.

Visual consistency is also important. If you use different fonts, use them with a consistent meaning. If you have multiple tables or figures in your thesis, make sure they have a consistent look.

Golden Rule #4: Be Concrete

Be concrete, as opposed to vague. It is very easy to be vague with natural language. This is especially true if you have not fully understood something or if you don't know the details, but you want to make a statement nevertheless.

An example of a vague sentence is: "We show that our algorithm is much better than previous ones". First, the quality measure is not clear: better in exactly which way? Second, it is not clear how much better: 10% better, twice as good, ten times as good? Third, which previous algorithms: all of them, some of them, and if only some of them, which ones?

Concreteness is particularly important in the parts where you define your problem or describe your solution or your experimental setup. For example, consider: "The goal of this paper is to get text from PDF documents". This sentence is vague on so many levels. What does "get" mean? In which format should the text be "gotten"? What constitutes a good solution?

Golden Rule #5: Clear context

The first four rules are easy to understand and check. This fifth rule is a bit more subtle, but just as important as the previous four rules. Often a text is hard to read or understand and you can't really put your finger on why. That is usually because there are many sentences where the context is unclear or missing.

So what does context refer to? When you explain something non-trivial, you need multiple sentences. Each sentence of your explanation will refer to entities or statements from earlier sentences. It is important that it is 100% clear what these references refer to. Otherwise the reader has to pause and think or even guess, both of which are a major nuisance when reading.

For example, consider the sentence "We bridge the gap between entities and text using automatic information extraction". This sentence is impossible to understand in isolation. It speaks of "the gap between entities and text". To understand it, one needs to understand: (1) what exactly "entities" refers to, (2) what exactly "text" refers to, and (3) what exactly is meant by the "gap" between these two. If the immediately preceding sentences make this clear, the sentence is fine. If they don't make it clear, the sentence remains cryptic.

Note that another issue with the example sentence might be the term "automatic information extraction". That term would be OK if it has been defined before or if it is defined right in the next sentence. If only "information extraction" was defined, the "automatic" would be an example of a vague term that is not concrete enough (Golden Rule #4).

Frequently Asked Questions

English or German?

You can write your thesis in German or in English. If you write it in English, more people will read it. Plus, it's a great opportunity to practice writing in English. You will almost certainly need that ability in your later job, so why not start now. If you don't want anyone to read your thesis, write it in French.

American English or British English?

American English please. For example, write analyze (AE) and not analyse (BE). Write neighbor (AE) and not neighbour (BE). Write labeling (AE) and not labelling (BE). As a rule of thumb: if two variants of a word come to your mind, use the one with fewer letters and the one with a z instead of an s.

How much should I write?

A typical question is: how much should I write? My typical answer is: write as much as is necessary to understand what you did, how you did it, and all the aspects listed in the section Structure of a thesis below. No more, no less. The thesis should be self-contained. That is, for someone with a basic education in computer science, it should not be necessary to read anything other than your thesis in order to understand all the main aspects of your thesis.

How much related work should I look at?

Whatever you do, there is probably lots of previous work at least on the general topic of your project or thesis, and maybe even on the particular problem you are dealing with. Screening all this work (and first finding it, which can also be non-trivial) can take an arbitrary amount of time and easily several months. This raises the question of how much time you should spend on it and when.

The short answer is that you should avoid the two extremes. One extreme is to ignore previous work until you are done with the thesis and then do a few Google searches to hack together a related work section. The other extreme is to spend many months on finding and screening related work before even starting to work on the problem yourself.

A good compromise is as follows. Before you start with your work, spend a day or so on searching the web (in particular, Google Scholar) on the general topic of your project or thesis. If you don't find anything on the particular problem you are dealing with, widen your search. There certainly is something about the general topic. Often, other people use other terminology to describe very similar or even the same things. So be creative when searching.

One good strategy is to look for well-cited papers or surveys. Such papers serve three purposes. First, the paper itself will tell you something interesting about the topic and the state of the art. Second, the paper will provide references to other work that might be relevant. Third -- and this is relevant especially if the paper is already a bit older -- you can look for newer papers citing this paper (this is easy in, say, Google Scholar). Note that all three only work well for high-quality papers, hence the remark about the citation count. A low-quality paper might be unaware of other related work. It might also be ignored by the community and hence not be cited by other related work. (However, great papers are also sometimes ignored by later papers on a related topic because the authors of those papers did not do their homework.)

However, don't wait too long before starting your own work. But don't forget to look for related work either. While you work on the problem yourself, you should keep looking for related work as a background process. The more you understand about the problem yourself, the easier it will be to find other related work. Keep note of these works. In fact, it's best to write a paragraph or two about each paper you encounter right when you encounter it. That way, you will already have all the material you need when you have to write your Related Work section in the end.

For how to actually write the Related Work section, see the corresponding subsection of the section Structure of a thesis below.

"We" or "I" or passive voice?

There is no clear answer or recommendation for that one. Look at the many theses and publications on our web page and pick the style you like best. If you want to use "We", that is OK even if you are the only author. In many contexts, "we" can be interpreted as "the reader and I".

Spelling, Hyphens, Commas

As a final step, ALWAYS run a spell checker over your write-up. It is very embarrassing indeed if your write-up contains mistakes that any spell checker would have found.

Computer-science articles contain many multi-word noun phrases. A common question is when (not) to put a hyphen. There is actually a clear and simple rule for this: multi-word noun phrases have a hyphen only if you use them as an adjective. Here is an example: (1) This problem has a large scale. (2) This is a large-scale problem. Putting a hyphen in (1) would be a mistake. Not putting the hyphen in (2) would be a mistake. Understand the purpose of the hyphen. It says what belongs together. In sentence (2), without the hyphen, it would not be clear whether it is a "scale problem" that is large, or whether the problem has a "large scale".

Another common question is when (not) to put a comma. The rules in the English language are less strict than in the German language, but they are still pretty clear in most cases. For example, you should always put a comma after introductory words and phrases like "However", "In this section", "Therefore", etc. A non-restrictive relative clause should be separated by commas, for example: "Albert Einstein, who won a Nobel Prize, is well known". Relative clauses that restrict a noun should not be separated by commas, for example: "Physicists who won Nobel Prizes are usually well known". As a rule of thumb: when you have two statements in one sentence and the separation of the two statements is not 100% clear, put a comma to clarify this. That is, in fact, the main purpose of a comma.

Finally, commas can save lives. For example: Let's eat, Grandma.

Structure of a thesis

This section provides a list of the typical elements of a thesis, together with short descriptions. In fact, these are the typical elements of any scientific article or paper.

Every section and every subsection -- except the Abstract and the Introduction -- should start with a small introduction that tells the reader what comes next. The following four sentences are an example: "In this section, we will give a high-level description of the algorithm. The algorithm has two phases. In the first phase, ... (Section 3.1). In the second phase, ... (Section 3.2)." Note how simple the sentences are (Golden Rule #1).

A good way to start your write-up is to write down all the section and subsection headers. The names of the headers should be carefully chosen, and they should be consistent in their style.

Title

As a minimum, a title should be informative. Think of it as the shortest way to summarize your work in one short sentence. A secondary criterion is that it is catchy. This is not necessary, however, and sometimes hard to achieve. A catchy title should still be informative. If you build a system, a typical title is the name of the system, followed by a colon, followed by a short description of what the system does. Avoid titles that cannot be understood before reading any of the contents.

Abstract

An abstract should be self-contained and understandable to a non-expert. Think of it as the shortest way to summarize your work in one paragraph. As a minimum, it should clarify the problem dealt with in the thesis, and the main results that were obtained. If a short example can be given, it should be given, but this is not always possible. If space permits, add a sentence or two about the underlying techniques and how the thesis advances the state of the art.

Introduction

An introduction should be self-contained and understandable to a non-expert. Think of it as the shortest way to summarize your work in a couple of pages. As a minimum, it should clarify the problem dealt with, the motivation for dealing with that problem, the main results, the main challenge and the line of attack used to overcome it, and how it advances the state of the art. In the abstract, you have only one or two sentences (if any) for each of these aspects. In the introduction, you have more space. For the problem statement, it is almost always a good idea to provide a figure or screenshot with a (carefully chosen) example.

One common mistake (and nuisance) is to have relatively vague and informal statements in the introduction and then later a separate section with a more formal problem statement. The corresponding reader experience is that upon first reading one either does not understand the introduction or does not find it very useful or both. You can fix this as follows: whatever is relatively easy to understand and can already be defined in the introduction should already be defined in the introduction.

Related work

Also see the FAQ "How much related work should I look at?" above.

Most probably, other researchers have worked on the problem of your thesis, or a strongly related problem, before. Your thesis should include a section which summarizes the most relevant of these works. Typically this is Section 2, right after the Introduction. For each of these works, it should be explained in a nutshell what they do, what they achieve, and how this differs from what you do in your thesis. This should be understandable without the need to actually read the papers referred to. For each related work, think of the description as the shortest way to say this in one paragraph.

When there is a lot of work about a particular topic / problem, it is ok to focus on the most recent / most advanced approaches. Where there is no work on the exact problem from your thesis, the Related Work section should be about work on similar problems and it should explain how these related problems differ. Note that sometimes a problem looks different only on the surface, and the solutions can actually be applied to your problem as well.

Theoretical analysis

Most probably, your work will make use of some algorithms and data structures. These can be your own, taken from previous work, or a combination of the two. In any case, provide information about the basic complexity of your algorithms, in particular their running time. Do this even if the statement appears straightforward to you. For example, the running time of one of your algorithms may obviously be linear in the size of the input data. In any case, say it and provide an argument / proof for it!
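As a minimal sketch of what such a statement can look like, consider the following function. The function find_max and its input are hypothetical and only serve as an illustration. The point is that the running-time claim and the argument for it are written down explicitly, even though the claim looks obvious.

```python
def find_max(values):
    """Return the largest element of a non-empty list.

    Claim: the running time is linear in n = len(values).
    Argument: the loop body is executed exactly n - 1 times. Each
    iteration does a constant amount of work (one comparison and at
    most one assignment). All other statements run once.
    """
    if not values:
        raise ValueError("values must be non-empty")
    largest = values[0]
    # Indices 1, ..., n - 1 are visited once each: n - 1 iterations.
    for i in range(1, len(values)):
        if values[i] > largest:
            largest = values[i]
    return largest
```

In a thesis, the same claim and argument would appear in the text, for example as a short lemma with a two-sentence proof.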

Empirical analysis

Most probably, your work involves the implementation of an algorithm or data structure, or of a whole system. Whatever it is, your implementation should be thoroughly evaluated. The kind of evaluation depends on the nature of your problem. If the focus is on results of a particular quality, that should be evaluated. If the focus is on efficiency, running time and (if relevant) space consumption should be evaluated. Even if the focus is on quality, efficiency should be evaluated, too. One always wants to know the running time of a procedure and (if relevant) its space consumption. If there is a pre-processing phase, this should be evaluated separately. If the pre-processing consumes a lot of intermediate disk space or memory, that should also be evaluated. Think about the evaluation from the perspective of someone who wants to use your software in practice. What is it that you would want to know then?

Typically, there are other approaches which can be used (either directly or with small modifications / adjustments) to solve your problem. As a minimum, compare to the best one of these approaches. If there are several fundamentally different kinds of approaches, pick the best one of each kind. If there is no solution yet for your problem, think of a simple baseline algorithm (= the straightforward solution) and compare to that. Sometimes there are two or even three simple baseline algorithms. Do your evaluation on at least three different data sets of different kinds and sizes. If the amount of work needed per data set is very large, it is OK to use only two data sets.
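The following sketch shows one possible shape of such an evaluation. Everything in it is an assumption made for illustration: the problem (sorting), the baseline (insertion sort), the stand-in for "your" approach (Python's built-in sort), and the three synthetic data sets. It compares both approaches on three data sets of different kinds and sizes, and it checks that both produce the same result before measuring running times.

```python
import random
import time


def baseline_sort(values):
    """Straightforward baseline: insertion sort (quadratic in the worst case)."""
    result = list(values)
    for i in range(1, len(result)):
        item = result[i]
        j = i - 1
        while j >= 0 and result[j] > item:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = item
    return result


def our_sort(values):
    """Stand-in for "your" approach: here simply Python's built-in sort."""
    return sorted(values)


def measure_seconds(function, data, repetitions=3):
    """Return the best wall-clock time of several runs, in seconds."""
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        function(data)
        best = min(best, time.perf_counter() - start)
    return best


def make_data_sets(seed=42):
    """Three synthetic data sets of different kinds and sizes (assumed)."""
    rng = random.Random(seed)
    return {
        "random-200": [rng.randint(0, 10**6) for _ in range(200)],
        "sorted-400": list(range(400)),
        "reversed-800": list(range(800, 0, -1)),
    }


def run_evaluation():
    """Return one row (name, size, baseline time, our time) per data set."""
    rows = []
    for name, data in make_data_sets().items():
        # Check that both approaches agree before measuring speed.
        assert our_sort(data) == baseline_sort(data)
        rows.append((name, len(data),
                     measure_seconds(baseline_sort, data),
                     measure_seconds(our_sort, data)))
    return rows


if __name__ == "__main__":
    for name, size, t_base, t_ours in run_evaluation():
        print(f"{name:>14}  n={size:<5}  baseline={t_base:.6f}s  ours={t_ours:.6f}s")
```

The measured times depend on the machine, so a real evaluation should also report the hardware and the number of repetitions.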

Future work

Most probably, your work will leave various open ends. Make a list of what could be done next to improve on what you did. For each item, give a short description of the possible improvement and an idea for how, in principle, it could be achieved. Also give an estimate for the necessary time to realize that improvement (hours, days, weeks, months). Order your list by importance / significance. That is, the thing that should be improved next / gives the biggest improvement should come first.

Bibliography

Make sure that the entries in your bibliography have a consistent style. That is, abbreviate all conferences in the same way or use the full names for all, but do not mix the two styles. Same for author names. Same for capitalization of titles. Same for page numbers. It makes a very careless and untidy impression if the bibliography entries are inconsistent in their style. Note that it would be a mistake to put a comma before the "if" in the previous sentence.

AD Teaching Wiki: WritingGuidelines (last edited 2023-05-30 16:54:22 by Hannah Bast)