6586
Comment:
|
6097
|
Deletions are marked like this. | Additions are marked like this. |
Line 3: | Line 3: |
Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]]. | Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]], [[attachment:SearchEnginesWS0910/lecture-7.pdf|Lecture 7]], [[attachment:SearchEnginesWS0910/lecture-8.pdf|Lecture 8]], [[attachment:SearchEnginesWS0910/lecture-9.pdf|Lecture 9]], [[attachment:SearchEnginesWS0910/lecture-10.pdf|Lecture 10]]. |
Line 5: | Line 6: |
Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Lecture 1]] [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Lecture 3]] [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Lecture 4]]. | Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (LPD without audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-7.avi|Recording Lecture 7 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-8.avi|Recording Lecture 8 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-9.avi|Recording Lecture 9 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-10.avi|Recording Lecture 10 (AVI)]]. |
Line 7: | Line 8: |
Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]]. | Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]], [[attachment:SearchEnginesWS0910/exercise-7.pdf|Exercise Sheet 7]], [[attachment:SearchEnginesWS0910/exercise-8.pdf|Exercise Sheet 8]], [[attachment:SearchEnginesWS0910/exercise-9.pdf|Exercise Sheet 9]], [[attachment:SearchEnginesWS0910/exercise-10.pdf|Exercise Sheet 10]]. |
Line 9: | Line 10: |
Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]] | Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]], [[SearchEnginesWS0910/ExerciseSheet6|Solutions and Comments 6]], [[SearchEnginesWS0910/ExerciseSheet7|Solutions and Comments 7]], [[SearchEnginesWS0910/ExerciseSheet8|Solutions and Comments 8]], [[SearchEnginesWS0910/ExerciseSheet9|Solutions and Comments 9]]. |
Line 11: | Line 12: |
= Exercise Sheet 3 = The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the recordings (it's .lpd files) you need the Lecturnity Player. [[http://www.lecturnity.de/de/download/lecturnity-player|You can download the player for free here]]. |
The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the [[http://www.lecturnity.de/de/download/lecturnity-player|Lecturnity Player, which you can download here]]. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend [[http://www.videolan.org/vlc|VLC]]. |
Line 16: | Line 16: |
[[SearchEnginesWS0910/ExerciseSheet4|Here you can upload your solutions for Exercise Sheet 4]]. | [[SearchEnginesWS0910/MidTermExam|Here is everything about the mid-term exam]]. |
Line 18: | Line 18: |
== Questions or comments below this line, most recent on top please == | [[SearchEnginesWS0910/ExerciseSheet10|Here you can upload your solutions for Exercise Sheet 10]]. The deadline is Thursday 21Jan10 at 4 pm. |
Line 20: | Line 20: |
@Bit shifting: The syntax for that is actually the same, irrespectively of whether you use Java, C++, perl, python, or whatever. The >> operator shifts to the right, the << operator shifts to the left, the & operator ands the bits of the two operands and the | operator ors the bits of the two operands. Very simple. You will also find zillions of example programs on the web by typing something like ''java bit shifting'' into Google or whatever your favorite search engine is. '''Hannah 15Nov09 1:16''' | == Questions and comments about Exercise Sheet 10 below this line (most recent on top) == One Question: On slide 11 of lecture 10, shouldn't V be a kxn matrix instead of a nxk? '''Jonas 18.Jan10 20:39''' |
Line 22: | Line 23: |
Hi Marius + all: For Exercise 4, an inverted list of size m with doc ids from the range [1..n] is simply a sorted list of m numbers from the range [1..n]. I leave it to you, whether your lists potentially contain duplicates (as in 3, 5, 5, 8, 12, ...) or whether you generate them in a way that they don't contain duplicates (as in 3, 5, 8, 17, ...). It doesn't really matter for the exercise whether your list has duplicated or not. In any case, consider simple flat lists like in the two examples I gave (and like all the examples I gave in this and past lectures), not lists of lists or anything. '''Hannah 15Nov09 1:12am''' | Hi Jens + all: we applied this rule to your exercise sheets until the christmas break, that is, we simply took your worst sheet out of the counting. Given that there are only 4, at most 5, exercise sheets after the christmas break, the rule does not apply for those sheets. '''Hannah 18Jan10 18:19''' |
Line 24: | Line 25: |
@Mirko: Sure, but an inverted list is a list of words where the Doc-IDs are attached to each words in which the words occur. So for Example: If word no. 5 occurs in Doc1, Doc2 and Doc3 and word no. 2 occurs in Doc5, the list would look like: 5 -> Doc1, Doc2, Doc3; 2 -> Doc5. Or am I mistaken? My question then is, how long should these attached lists be in average case? I mean, one could imagine that we got 1mil. documents over 3 words, so these lists could get very large... | I have a question about skipping an exercise sheet: At the beginning of the semester you said that we would be allowed to do this once. Does it still hold? '''Jens 18Jan10 18:07''' |
Line 26: | Line 27: |
EDIT: Oh ok. Now, I see your point. It's not an index, it's a list. Okay. So, what is an inverted list with Doc-IDs, then? | Oh yes, sorry, I forgot, it's done now. '''Hannah 17Jan10 13:43''' |
Line 28: | Line 29: |
EDIT EDIT: And to your question, Mirko, take a look at [[http://snippets.dzone.com/posts/show/93]]. Especially at Comment no. 2. Maybe this helps... I think, Java supports StreamWriters/Readers that are able to write/read bytes. '''Marius 11/14/2009 08:46pm''' | '''Matthias''' - Can you please upload the Slides for the Lec 10 as well? |
Line 30: | Line 31: |
EDIT EDIT EDIT: Sorry, me again. Well, I bothered Wikipedia which redirects from [[http://en.wikipedia.org/wiki/Inverted_list]] to Inverted Index. So it seems to me, this is being used as a synonym. Actually, I think I'm confused enough, now. I'll better wait for any responses... ;-) '''Marius 11/14/2009 9:08 pm''' | Here is a major hint for Exercise 2. There are many ways to prove this, but one of the most natural is via these three steps, each of which is not too hard to prove. Note that, unlike in the lecture, in the hint below I use X' to denote the transpose of a matrix or vector X. This is a common notation in the numerics community (where as the pure algebra people prefer the T superscript). '''Hannah 14Jan10 23:37''' |
Line 32: | Line 33: |
@ Marius: i think we are supposed to generate one inverted __list__ of size m, with doc ids from 1..n (therefore n>=m, because no duplicates?). | {{{ (1) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x 1 vector x, the L2-norms of x and U*x are equal. The L2-norm of a vector x is defined as the square root of the sum of the squares of its components. It helps to observe that the square of the L2-norm of x can also be written as x' * x. |
Line 34: | Line 36: |
Now a question from my side: ex.4, programming the compression in __java__, is there any __good__ tutorial about how to handle the bit-stuff? (otherwise, i think, it would cost me too much time..) '''Mirko 14Nov09, 19:18''' | (2) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x n matrix A, the L2-norms of A and of U * A are equal. This is easy to prove using (1). |
Line 36: | Line 38: |
Hi, do you have any suggestions what the best numbers for m and n in exercise 4 should look like? Or are we supposed to mess around a bit with ints and longs? And: How long should the list of documents in the inverted index be? '''Marius 14Nov09 6:40pm''' And just to clarify what a single-cycle permutation is. Here is an example for an array of size 5 with a permutation that is a single cycle: 5 4 1 3 2. Why single cycle? Well, A[1] = 5, A[5] = 2, A[2] = 4, A[4] = 3, A[3] = 1. (My indices in this example are 1,...,5 and not 0,...,4.) Here is an example of a permutation with three cycles: 2 1 4 3 5. The first cycle is A[1] = 2, A[2] =1. The second cycle is A[3] = 4, A[4] = 3. The third cycle is A[5] = 5. '''Hannah 12Nov09 8:04pm''' Hi Daniel + all, I don't quite understand your question and your example (if your array is 1 5 3 4 2, why is A[1] = 3?). In case you refer to the requirement of the exercise that the permutation consists only of a single cycle. That is because your code should go over each element exactly once (it should, of course, stop after n iterations, where n is the size of the array). If your permutation has more than one cycle, it is hard to achieve that. Also note that for both (1) and (2), the sum of the array values should be sum_i=1,...,n i = n * (n+1) / 2. '''Hannah 12Nov09 7:54pm''' Hi, I just looked at the new exercise sheet 4, in exercise 1 we should generate a permutation and sum the resulting array up, am I wrong or doesn't iterating method two iterate throw the whole array in every situation. for ex.: n= 5 permutation: 1 5 3 4 2, then A[1] = 3, A[A[1]]= A[3] = 1, A[1] = 3 ... '''Daniel 12Nov09 19:44pm''' |
(3) Prove that if A has the singular value decomposition U * S * V', then the L2-norm of A is the same as the L2-norm of S. }}} |
Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.
Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 7, Lecture 8, Lecture 9, Lecture 10.
Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: Recording Lecture 1 (LPD), Recording Lecture 3 (LPD), Recording Lecture 4 (LPD), Recording Lecture 5 (LPD without audio), Recording Lecture 6 (LPD), Recording Lecture 7 (AVI), Recording Lecture 8 (AVI), Recording Lecture 9 (AVI), Recording Lecture 10 (AVI).
Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6, Exercise Sheet 7, Exercise Sheet 8, Exercise Sheet 9, Exercise Sheet 10.
Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5, Solutions and Comments 6, Solutions and Comments 7, Solutions and Comments 8, Solutions and Comments 9.
The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the Lecturnity Player, which you can download here. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend VLC.
Here are the rules for the exercises as explained in Lecture 2.
Here is everything about the mid-term exam.
Here you can upload your solutions for Exercise Sheet 10. The deadline is Thursday 21Jan10 at 4 pm.
Questions and comments about Exercise Sheet 10 below this line (most recent on top)
One Question: On slide 11 of lecture 10, shouldn't V be a kxn matrix instead of a nxk? Jonas 18.Jan10 20:39
Hi Jens + all: we applied this rule to your exercise sheets until the christmas break, that is, we simply took your worst sheet out of the counting. Given that there are only 4, at most 5, exercise sheets after the christmas break, the rule does not apply for those sheets. Hannah 18Jan10 18:19
I have a question about skipping an exercise sheet: At the beginning of the semester you said that we would be allowed to do this once. Does it still hold? Jens 18Jan10 18:07
Oh yes, sorry, I forgot, it's done now. Hannah 17Jan10 13:43
Matthias - Can you please upload the Slides for the Lec 10 as well?
Here is a major hint for Exercise 2. There are many ways to prove this, but one of the most natural is via these three steps, each of which is not too hard to prove. Note that, unlike in the lecture, in the hint below I use X' to denote the transpose of a matrix or vector X. This is a common notation in the numerics community (where as the pure algebra people prefer the T superscript). Hannah 14Jan10 23:37
(1) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x 1 vector x, the L2-norms of x and U*x are equal. The L2-norm of a vector x is defined as the square root of the sum of the squares of its components. It helps to observe that the square of the L2-norm of x can also be written as x' * x. (2) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x n matrix A, the L2-norms of A and of U * A are equal. This is easy to prove using (1). (3) Prove that if A has the singular value decomposition U * S * V', then the L2-norm of A is the same as the L2-norm of S.