Differences between revisions 313 and 433 (spanning 120 versions)

Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.

Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 7, Lecture 8.

Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): Recording Lecture 1, Recording Lecture 3, Recording Lecture 4, Recording Lecture 5 (no audio), Recording Lecture 6 (with audio for a change), Recording Lecture 7 (AVI), Recording Lecture 8 (AVI).

Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6, Exercise Sheet 7, Exercise Sheet 8.

Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5, Solutions and Comments 6, Solutions and Comments 7.

Exercise Sheet 8

The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the Lecturnity Player, which you can download here. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend VLC.

Here are the rules for the exercises as explained in Lecture 2.

Here you can upload your solutions for Exercise Sheet 8.

Questions or comments below this line, most recent on top please

Most humbly I ask further hints for exercise 3. Johannes 14Dec09 19:53

Supplemental to Manuela + all: According to an (old) study, most of the words do not contain duplicate k-grams. Marjan 14Dec09 19:35

Hi Manuela + all: thanks for pointing out this problem. No, you don't have to consider this. You may either assume that the k-grams of each word are distinct, or consider A and B as multi-sets, that is, in case a k-gram occurs x > 1 times in a word it is counted x times in the set of k-grams for that word as well. In either case, the size of the set of k-grams of a word x is exactly |x| - k + 1. If that does not fully answer your question, please ask again. Hannah 14Dec09 19:20

I've got a problem with my formula for exercise 2. If I have the words x = bord and y = booo and I want to get the two-grams A = {bo,or,rd} and B = {bo,oo}, then in B there will be a two-gram "oo" lost, because of the set. My formula doesn't realize that and I don't know if I could fix it. Must we consider this problem? Manuela 14Dec09 18:58

To all again: if you want to get notification when someone added a comment to this page, just click on the Info / Subscribe link towards the top right. Then this Wiki page effectively becomes a mailing list for you. If you make only a trivial change to the page (like correcting a typo), then tick the box "Trivial change" before saving, then people will not be sent a notification for (almost) nothing. Hannah 14Dec09 14:49

To all: Exercise 3 is certainly the hardest on this sheet. Although again nobody has asked for a hint so far, here is one: You have to prove that ED(x, y) = EDm(x, y), where ED is the normal edit distance, and EDm is the edit distance where the sequence of transformations must be monotone, as defined in the lecture and on the sheet. ED(x, y) <= EDm(x, y) is trivial since every monotone transformation sequence is also a normal transformation sequence. To prove EDm(x, y) <= ED(x, y), the natural way is again to use induction over |x| + |y|, as done in the lecture for my proof of Lemma 2 (which didn't quite work out, but for other reasons). If you need more hints, ask. Hannah 14Dec09 14:46

Hi Björn, I don't understand why/how you would need that a transformation "knows" its position. I also don't see that there is any big formalism involved in what a transformation is. A transformation is one of insert / delete / replace, and it occurs at a particular position of the current string. That is a transformation, no more no less. Hannah 14Dec09 14:42

One question on exercise 3: Do we have to assume that each transformation is of a certain form (i.e. insert at i'th character of current word, where it matters how many deletes/inserts have been done previously)? Or can we assume that a transformation somehow "knows" its position? If we have to deal with it in an abstract way, should we create our own way of describing a transformation or is there a formalism we all should use. Björn 13:31

Here is another bonus point system: http://www.informatik.uni-freiburg.de:8081/swt/teaching/winter-term-2009/informatik-iii-theoretische-informatik. Additionally I would like to add that most of these bonus point systems have exercises which are far less work. Björn

In some other lectures the points from the sheets are used to increase the grade from the exam. If one got 50% then she/he got 0.3 grade better, 80% for 0.6 better, but min. 4.0 in exam. I.e. 83% of the exercise points and 2.3 in exam = 1.7 final grade. Data Mining & Machine Learning used a more complex scheme, there was theoretical sheets and practical sheets using Weka (the slides should be on electures server). Markus 14Dec09 9:23

Many exercise sheets contained two tasks and for each task we could get 1 point. That means that we could got 2 or even 3 points per sheet. The exercises were optional, but because of the bonus points most students did the exercises. There was another reason to do the exercises. In this lecture the exam was very similar to the exercise sheet tasks, so we had two considerable advantages. In two other lectures we had to achieve 30 and 50% of the exercise points to participate at the exam. Manuela 14Dec09 00:57

Now that I think about it, there were points for solving exercises in exercise-class. Also, there were more than 15 points achievable over the semester, but max. 15 were credited in the exam. alex 13Dec09 23:57

In that lecture we could reach max. 15 bonus points for the exercises. Each exercise sheet scored one point + one extra point for specially good/clever solutions. I don't think there was a constraint on admission to the exam. There might have been points for solving exercises in the exercise-class, but I'm not sure about that. (--> already one year ago) alex 13Dec09 23:48

Thanks for the link, Alex, can you please explain the thing with the "bonus point of the exercise"? Is it that there were 15 bonus points to reach in the exercises? How many tasks in the exercises got bonus points and how many didn't? And the non-bonus tasks than only counted for admission to the exam? Hannah 13Dec09 23:36

I quite liked this scheme, although it does not put that much weight on the exercises: http://cone.informatik.uni-freiburg.de/teaching/vorlesung/algorithmentheorie-w08/exam.html alex 13Dec09 23:17

Can someone please post the details of one of the existing bonus systems here? (Something like: for xyz % of the exercises, you can improve your exam mark by abc.) Or the links to a course site where these details are given. Thanks! Hannah 13Dec09 23:09

To Mirko + all: Yes, it's d, I don't know how I've missed this one. Marjan 12Dec09 13:16

About Exercise 1: in the second part, is it really delta? shouldn't it be d? Mirko 12Dec09, 12:43

-  ⇤ ← Revision 313 as of 2009-11-21 20:29:52 → 
  Size: 4500
  Editor: Hannah Bast
  Comment:
+   ← Revision 433 as of 2009-12-14 19:52:57 → ⇥
  Size: 9474
  Editor: p549FA314
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]].
+Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]], [[attachment:SearchEnginesWS0910/lecture-7.pdf|Lecture 7]], [[attachment:SearchEnginesWS0910/lecture-8.pdf|Lecture 8]].
 Line 5:
-Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (no audio)]].
+Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (no audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (with audio for a change)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-7.avi|Recording Lecture 7 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-8.avi|Recording Lecture 8 (AVI)]].
 Line 7:
-Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]].
+Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]], [[attachment:SearchEnginesWS0910/exercise-7.pdf|Exercise Sheet 7]], [[attachment:SearchEnginesWS0910/exercise-8.pdf|Exercise Sheet 8]].
 Line 9:
-Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]].
+Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]], [[SearchEnginesWS0910/ExerciseSheet6|Solutions and Comments 6]], [[SearchEnginesWS0910/ExerciseSheet7|Solutions and Comments 7]].
 Line 11:
-= Exercise Sheet 5 =
+= Exercise Sheet 8 =
 Line 13:
-The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the recordings (it's .lpd files) you need the Lecturnity Player. [[http://www.lecturnity.de/de/download/lecturnity-player|You can download the player for free here]].
+The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the [[http://www.lecturnity.de/de/download/lecturnity-player|Lecturnity Player, which you can download here]]. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend [[http://www.videolan.org/vlc|VLC]].
 Line 17:
-[[SearchEnginesWS0910/ExerciseSheet5|Here you can upload your solutions for Exercise Sheet 5]].
+[[SearchEnginesWS0910/ExerciseSheet8|Here you can upload your solutions for Exercise Sheet 8]].
 Line 21:
-By the way, whenever I talk about "lists" here or on the exercise sheets or in the lecture, I am not referring to a particular data structure (in particular I am NOT talking about a linked list), but "list of elements"  is just "series of elements". And well, "inverted list" is just common terminology. To implement a "list of doc ids" or anything like that you should of course always use an array or a vector or a data structure like that. '''Hannah 21Nov09 8:30pm'''
+Most humbly I ask further hints for exercise 3. '''Johannes 14Dec09 19:53'''
 Line 23:
-Hi Marius + all, let me explain it by an example. Your two input lists occupy a certain amount of memory. Every programming language has built-in functions for this. For example, if your list entries are ints, then for C++ you can use sizeof(int) to get the number of bytes occupied by one entry.  Multiply by the number of list elements to get the number of bytes occupied by one list. One Megabyte (MB) is 1024 * 1024 bytes. Now assume your two lists together occupy 10 MB. Assume your code takes 0.1 seconds to intersect these two lists. Then the "MB processed per second" is 100 MB / second. '''Hannah 21Nov09 8:26pm'''
+Supplemental to Manuela + all: According to an (old) study, most of the words do not contain duplicate k-grams. '''Marjan 14Dec09 19:35'''
 Line 25:
-Hi, in exercise 3, what do you mean by "MB processed per second"? Is a MB the equivalent to 4096 processed integers? And when is a MB to be considered as processed? When it's written to the intersected list or in the comparisons, already? '''Marius 21Nov09 7:33pm'''
+Hi Manuela + all: thanks for pointing out this problem. No, you don't have to consider this. You may either assume that the k-grams of each word are distinct, or consider A and B as multi-sets, that is, in case a k-gram occurs x > 1 times in a word it is counted x times in the set of k-grams for that word as well. In either case, the size of the set of k-grams of a word x is exactly |x| - k + 1. If that does not fully answer your question, please ask again. '''Hannah 14Dec09 19:20'''
 Line 27:
-The slides + all my hand-writing on it are now online, see the link ''Recording Lecture 5 (no audio)'' above. '''Hannah 20Nov09 3:24am'''
+I've got a problem with my formula for exercise 2. If I have the words x = bord and y = booo and I want to get the two-grams A = {bo,or,rd} and B = {bo,oo}, then in B there will be a two-gram "oo" lost, because of the set. My formula doesn't realize that and I don't know if I could fix it. Must we consider this problem?  '''Manuela 14Dec09 18:58'''
 Line 29:
-The recording of todays lecture again did not work. I am very sorry for that (and very angry that there are so many problems with this software). Anyway, the end result of the lecture, that is the slides with all the writing on it are available and I will put them online as soon as possible. '''Hannah 19Nov09 11:23pm'''
+To all again: if you want to get notification when someone added a comment to this page, just click on the Info / Subscribe link towards the top right. Then this Wiki page effectively becomes a mailing list for you. If you make only a trivial change to the page (like correcting a typo), then tick the box "Trivial change" before saving, then people will not be sent a notification for (almost) nothing. '''Hannah 14Dec09 14:49'''
 Line 31:
-There is a typo in Exercise 5 of the new sheet. The two occurrences of ''n'' should be ''m''. '''Hannah 19Nov09 11:22pm'''
+To all: Exercise 3 is certainly the hardest on this sheet. Although again nobody has asked for a hint so far, here is one: You have to prove that ED(x, y) = EDm(x, y), where ED is the normal edit distance, and EDm is the edit distance where the sequence of transformations must be monotone, as defined in the lecture and on the sheet. ED(x, y) <= EDm(x, y) is trivial since every monotone transformation sequence is also a normal transformation sequence. To prove EDm(x, y) <= ED(x, y), the natural way is again to use induction over |x| + |y|, as done in the lecture for my proof of Lemma 2 (which didn't quite work out, but for other reasons). If you need more hints, ask. '''Hannah 14Dec09 14:46'''

Hi Björn, I don't understand why/how you would need that a transformation "knows" its position. I also don't see that there is any big formalism involved in what a transformation is. A transformation is one of insert / delete / replace, and it occurs at a particular position of the current string. That is a transformation, no more no less. '''Hannah 14Dec09 14:42'''

One question on exercise 3: Do we have to assume that each transformation is of a certain form (i.e. insert at i'th character of current word, where it matters how many deletes/inserts have been done previously)? Or can we assume that a transformation somehow "knows" its position? If we have to deal with it in an abstract way, should we create our own way of describing a transformation or is there a formalism we all should use. '''Björn 13:31'''

Here is another bonus point system: http://www.informatik.uni-freiburg.de:8081/swt/teaching/winter-term-2009/informatik-iii-theoretische-informatik. Additionally I would like to add that most of these bonus point systems have exercises which are far less work. '''Björn'''

In some other lectures the points from the sheets are used to increase the grade from the exam. If one got 50% then she/he got 0.3 grade better, 80% for 0.6 better, but min. 4.0 in exam. I.e. 83% of the exercise points and 2.3 in exam = 1.7 final grade.
Data Mining & Machine Learning used a more complex scheme, there was theoretical sheets and practical sheets using Weka (the slides should be on electures server). '''Markus 14Dec09 9:23'''

Many exercise sheets contained two tasks and for each task we could get 1 point. That means that we could got 2 or  even 3 points per sheet. The exercises were optional, but because of the bonus points most students did the exercises. There was another reason to do the exercises. In this lecture the exam was very similar to the exercise sheet tasks, so we had two considerable advantages. In two other lectures we had to achieve 30 and 50% of the exercise points to participate at the exam.  '''Manuela 14Dec09 00:57'''

Now that I think about it, there were points for solving exercises in exercise-class. Also, there were more than 15 points achievable over the semester, but max. 15 were credited in the exam.  '''alex  13Dec09 23:57'''

In that lecture we could reach max. 15 bonus points for the exercises. Each exercise sheet scored one point + one extra point for specially good/clever solutions. I don't think there was a constraint on admission to the exam. There might have been points for solving exercises in the exercise-class, but I'm not sure about that. (--> already one year ago) '''alex  13Dec09 23:48'''

Thanks for the link, Alex, can you please explain the thing with the "bonus point of the exercise"? Is it that there were 15 bonus points to reach in the exercises? How many tasks in the exercises got bonus points and how many didn't? And the non-bonus tasks than only counted for admission to the exam? '''Hannah 13Dec09 23:36'''

I quite liked this scheme, although it does not put that much weight on the exercises: http://cone.informatik.uni-freiburg.de/teaching/vorlesung/algorithmentheorie-w08/exam.html '''alex  13Dec09 23:17'''

Can someone please post the details of one of the existing bonus systems here? (Something like: for xyz % of the exercises, you can improve your exam mark by abc.) Or the links to a course site where these details are given. Thanks! '''Hannah 13Dec09 23:09'''

To Mirko + all: Yes, it's d, I don't know how I've missed this one. '''Marjan  12Dec09 13:16'''

About Exercise 1: in the second part, is it really delta? shouldn't it be d? '''Mirko 12Dec09, 12:43'''