Diff for "SearchEnginesWS0910"

Differences between revisions 239 and 589 (spanning 350 versions)

Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.

Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 7, Lecture 8, Lecture 9, Lecture 10,Lecture 11.

Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: Recording Lecture 1 (LPD), Recording Lecture 3 (LPD), Recording Lecture 4 (LPD), Recording Lecture 5 (LPD without audio), Recording Lecture 6 (LPD), Recording Lecture 7 (AVI), Recording Lecture 8 (AVI), Recording Lecture 9 (AVI), Recording Lecture 10 (AVI).

Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6, Exercise Sheet 7, Exercise Sheet 8, Exercise Sheet 9, Exercise Sheet 10, Exercise Sheet 11.

Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5, Solutions and Comments 6, Solutions and Comments 7, Solutions and Comments 8, Solutions and Comments 9.

The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the Lecturnity Player, which you can download here. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend VLC.

Here are the rules for the exercises as explained in Lecture 2.

Here is everything about the mid-term exam.

Here you can upload your solutions for Exercise Sheet 10. The deadline is Thursday 21Jan10 at 4 pm.

Questions and comments about Exercise Sheet 10 below this line (most recent on top)

Hi Manuela + all: I understand your point. I think that when one is familiar with basic linear algebra, then all the exercises (including Exercise 2, given my fairly strong and concrete hints) are something which you just sit down and do, no deep thinking required. But when one is not familiar, then yes, I can see that most of the time will be spend on understanding the meaning of basic things (which, I agree, is very important) like why can one write something like u * v', where u and v are vectors, and obtain a matrix. I guess I am constantly underestimating the mathematical background and exercise you received in you first semesters here in Freiburg. Anyway, I will take this into account when computing the marks from your points for the exercise sheets 9, 10, 11, etc. Note that also for the first 8 exercise sheets you could get a 1.0 without getting all the points, even after taking the worst sheet out of the counting. We will have something similar for the second half, too. So don't worry, it will be fair, and please continue to make an effort with the exercises, and continue to give me feedback when an exercise consumed way too much time, for whatever reason. Hannah 21Jan 17:48

Maybe it's only a problem for me that I can't sit down and start to prove f.e. exercise 2 or 3 immediately. I'm not familiar with linear algebra and it's difficult to understand the meaning of what we do. So before I can start I have to search for information and have to read what matrix norms and Frobenius norms and so on is. That's why it took much time for me to do exercise 2 and 3. Proving the hints (at the bottom of this page) is also nothing what I can do in five minutes. And for exercise 1 it was my own fault that I need much more time for it. I was confused and made some silly stuff. Of course it would be nice to have the bonus points for the exam, but it will be hard (and time consuming) to solve all tasks of all exercise sheets without gaps. Thanks for the hints and I think that the new bonus point system is much better than the old one. The only thing is that I'm not sure, if the "time calculation" is better than before. Maybe I'm just too slow. Manuela

To Björn at all: Yes, I see, I think the solution to an exercise like Exercise 1 is much faster to write on paper and then scan it in. Typesetting lots of matrices etc. in Latex is no fun and takes lots of time and shouldn't really be part of an exercise. Hannah 21Jan10 14:32

Yes, your last hint was very helpful. Thanks a lot. Sorry for the late response but I had to work for other courses first and it took me like 3 hours to put the other solutions into Latex (maybe this is also one reason why this sheet takes lots of time again. Especially Ex1 is okay to solve using applets/programs + copy&paste for all intermediate steps, but writing everything down, still takes ages). Now that I looked at exercise 2 again, your hint really helped. Björn 21Jan 13:03

It's there now. Hannah 21Jan10 00:59

The page to upload the solutions is still missing. Florian 21Jan10 00:02

To Florian: it's enough if you output the Eigenvector, but it would also be nice if you output an approximation of the eigenvalue. There is no unqiue procedure for doing this, however, and that's why I didn't include it formally in the exercise. The point is that what you will compute after a certain number of iterations is only almost an eigenvector, and how do you compute an approximate eigenvalue for something that is only almost an eigenvector. There are many ways to do it. One is to just take the quotient of each component of x_k and A*x_k, and then take the average of these components. Or the min and the max quotient and output both, that is probably more reasonable. Hannah 20Jan10 23:14

Please give me details on which exercise cost you how much time and why. Sorry, I couldn't respond earlier, I was working from home, and the internet connection to the university has been extremely bad all day long. Hannah 20Jan10 23:08

I think that I also can't complete the sheet this time (especially not in 6 hours) and it's a pity that not the worst sheet of all doesn't count. I know that some of us now have to priorize because in the beginning of February (and later) some exams will be written and also there will be seminars. That means that not the new bonus point system would be the reason for stopping after a certain time of solving tasks, but the general situation. Manuela 20Jan10 21:06

I used up my 6 hours for the exercise sheet and I couldn't complete it. What is the proceeding in this case? Johannes 2763-01-20T1934

Thanks for the additional hint for exercise 2, I think I finally managed to get a solution with it. Then I have another question to exercise 4: is it enough if the program outputs the eigenvector or should it also give the corresponding maximal eigenvalue? Florian 20Jan10 18:25

Hi Mirko: no, I meant what I wrote. You can easily check that it makes sense from the matrix / vector dimensions: u_i is an m x 1 vector, s_i is an 1 x 1 scalar, v_i is an n x 1 vector and hence v_i' (the transpose of v_i) is a 1 x n vector. Hence wrt to matrix dimensions u_i * s_i * v_i' is m x 1 * 1 x 1 * 1 x n, which matches as it should and gives an m x n matrix. Ok? Hannah 19Jan10 23:32

About the last post: with A = sum{i=1, .... you mean ||A|| = sum{i=1, ..., so the frobenius-norm of A? and by v_i' you mean the row of i of V'? If so, i don't get ||A|| = sum{i=1,...,r} u_i * s_i * v_i'. Or am i totally wrong? I'm just confused. Mirko, 19.01, 22:18

Hi Björn + all: let the SVD of A be U * S * V' and let u_1,...,u_r be the r columns of U, and let v_1,...,v_r be the r columns of V (or, equivalently, the r rows of V'), where r is the rank of the matrix. Let s_1,...,s_r be the r diagnonal entries of S. Then convince yourself that you can write A = sum_{i=1,...,r} u_i * s_i * v_i'. With that, you easily get the SVD of A_k and also of A - A_k. Tell me if this helped you. Hannah 19Jan10 17:32

Hey, is it possible that for exercise 2 it's ||A||-||A_k|| instead of ||A-A_k||? We've just discussed this because we both were stuck and the later seems more obvious to us and provable with the hints provided. Of course, it is possible that we're mistaken somewhere. We managed to proove the lemmas from the hint but still fail to proove the statement from the exercise. Otherwise, we might need a hint to show that ||A-A_k|| = ||S-S_k|| while S_k is S with the values s_ij where i,j>k set to 0. Bjoern 19.1. 17:18

Ok, I added the marks to your individual pages now. If a mark does not correspond to the assignment scheme in my post below, please tell us. Hannah 19Jan10 1:56am

Hi Florian + all: yes, you are right, a matrix-vector multiplication is all that is needed to implement the power method. Concerning you mark for the first eight exercise sheets, you can easily compute it from the following point range -> mark assigment: 28 - 30 points = 1.0, 26 - 27 points = 1.3, 24-25 points = 1.7, 22 - 23 points = 2.0, 20 - 21 points = 2.3, 18 - 19 points = 2.7, 16 - 17 points = 3.0, 14 - 15 points = 3.3, 12 - 13 points = 3.7, 10 - 11 points = 4.0. Your total number of points (with the worst exercise sheet taken out of the counting) is already on your page, I will add the corresponding marks now. Hannah 19Jan10 1:45am

Am I right with the assumption that we do not even need a matrix-matrix multiplication for exercise 4 but just a matrix-vector multiplication? And another question, where can I find the mark for the first 8 exercise sheets? Thanks. Florian 18.Jan10 22:38

Oh yes you're right. I forgot to transpose V. Sorry my mistake. Jonas 18.Jan10 21:49

Hi Jonas, no I think it's correct, V is n x k and then the transpose of V is k x n, and that is what appears in the product of the SVD. Hannah 18Jan10 21:39

One Question: On slide 11 of lecture 10, shouldn't V be a kxn matrix instead of a nxk? Jonas 18.Jan10 20:39

Hi Jens + all: we applied this rule to your exercise sheets until the christmas break, that is, we simply took your worst sheet out of the counting. Given that there are only 4, at most 5, exercise sheets after the christmas break, the rule does not apply for those sheets. Hannah 18Jan10 18:19

I have a question about skipping an exercise sheet: At the beginning of the semester you said that we would be allowed to do this once. Does it still hold? Jens 18Jan10 18:07

Oh yes, sorry, I forgot, it's done now. Hannah 17Jan10 13:43

Matthias - Can you please upload the Slides for the Lec 10 as well?

Here is a major hint for Exercise 2. There are many ways to prove this, but one of the most natural is via these three steps, each of which is not too hard to prove. Note that, unlike in the lecture, in the hint below I use X' to denote the transpose of a matrix or vector X. This is a common notation in the numerics community (where as the pure algebra people prefer the T superscript). Hannah 14Jan10 23:37

(1) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x 1 vector x, the L2-norms of x and U*x are equal. The L2-norm of a vector x is defined as the square root of the sum of the squares of its components. It helps to observe that the square of the L2-norm of x can also be written as x' * x.

(2) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x n matrix A, the L2-norms of A and of U * A are equal. This is easy to prove using (1).

(3) Prove that if A has the singular value decomposition U * S * V', then the L2-norm of A is the same as the L2-norm of S.

-  ⇤ ← Revision 239 as of 2009-11-12 20:05:06 → 
  Size: 10505
  Editor: Hannah Bast
  Comment:
+   ← Revision 589 as of 2010-01-21 17:51:25 → ⇥
  Size: 14221
  Editor: Hannah Bast
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]].
+Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]], [[attachment:SearchEnginesWS0910/lecture-7.pdf|Lecture 7]], [[attachment:SearchEnginesWS0910/lecture-8.pdf|Lecture 8]], [[attachment:SearchEnginesWS0910/lecture-9.pdf|Lecture 9]], [[attachment:SearchEnginesWS0910/lecture-10.pdf|Lecture 10]],[[attachment:SearchEnginesWS0910/lecture-11.pdf|Lecture 11]].
 Line 5:
-Here are the recordings of some of the lectures so far (Lecture 1 still missing, in Lecture 2 the microphone signal did not come through): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture1/Search_Engines,_Lecture_3,_5Nov09_1_05_11_2009_16_16_20.html|Lecture 3]]
-Line 7:
+Line 6:
-Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]].
+Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (LPD without audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-7.avi|Recording Lecture 7 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-8.avi|Recording Lecture 8 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-9.avi|Recording Lecture 9 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-10.avi|Recording Lecture 10 (AVI)]].
-Line 9:
+Line 8:
-Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Exercise Sheet 1]], [[SearchEnginesWS0910/ExerciseSheet2|Exercise Sheet 2]].
+Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]], [[attachment:SearchEnginesWS0910/exercise-7.pdf|Exercise Sheet 7]], [[attachment:SearchEnginesWS0910/exercise-8.pdf|Exercise Sheet 8]], [[attachment:SearchEnginesWS0910/exercise-9.pdf|Exercise Sheet 9]], [[attachment:SearchEnginesWS0910/exercise-10.pdf|Exercise Sheet 10]], [[attachment:SearchEnginesWS0910/exercise-11.pdf|Exercise Sheet 11]].

Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]], [[SearchEnginesWS0910/ExerciseSheet6|Solutions and Comments 6]], [[SearchEnginesWS0910/ExerciseSheet7|Solutions and Comments 7]], [[SearchEnginesWS0910/ExerciseSheet8|Solutions and Comments 8]], [[SearchEnginesWS0910/ExerciseSheet9|Solutions and Comments 9]].

The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the [[http://www.lecturnity.de/de/download/lecturnity-player|Lecturnity Player, which you can download here]]. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend [[http://www.videolan.org/vlc|VLC]].
-Line 13:
+Line 16:
-= Exercise Sheet 3 =
Above, you find a link to a published recording of Lecture 3. Please try if that works for you.
+[[SearchEnginesWS0910/MidTermExam|Here is everything about the mid-term exam]].
-Line 16:
+Line 18:
-[[SearchEnginesWS0910/ExerciseSheet3|Here you can upload your solutions for Exercise Sheet 3]].
+[[SearchEnginesWS0910/ExerciseSheet10|Here you can upload your solutions for Exercise Sheet 10]]. The deadline is Thursday 21Jan10 at 4 pm.
-Line 18:
+Line 20:
-== Questions or comments below this line, most recent on top please ==
+== Questions and comments about Exercise Sheet 10 below this line (most recent on top) ==
-Line 20:
+Line 22:
-And just to clarify what a single-cycle permutation is. Here is an example for an array of size 5 with a permutation that is a single cycle: 5 4 1 3 2. Why single cycle? Well, A[1] = 5, A[5] = 2, A[2] = 4, A[4] = 3, A[3] = 1. (My indices in this example are 1,...,5 and not 0,...,4.) Here is an example of a permutation with three cycles: 2 1 4 3 5. The first cycle is A[1] = 2, A[2] =1. The second cycle is A[3] = 4, A[4] = 3. The third cycle is A[5] = 5. '''Hannah 12Nov09 8:04pm'''
+Hi Manuela + all: I understand your point. I think that when one is familiar with basic linear algebra, then all the exercises (including Exercise 2, given my fairly strong and concrete hints) are something which you just sit down and do, no deep thinking required. But when one is not familiar, then yes, I can see that most of the time will be spend on understanding the meaning of basic things (which, I agree, is very important) like why can one write something like u * v', where u and v are vectors, and obtain a matrix. I guess I am constantly underestimating the mathematical background and exercise you received in you first semesters here in Freiburg. Anyway, I will take this into account when computing the marks from your points for the exercise sheets 9, 10, 11, etc. Note that also for the first 8 exercise sheets you could get a 1.0 without getting all the points, even after taking the worst sheet out of the counting. We will have something similar for the second half, too. So don't worry, it will be fair, and please continue to make an effort with the exercises, and continue to give me feedback when an exercise consumed way too much time, for whatever reason. '''Hannah 21Jan 17:48'''
-Line 22:
+Line 24:
-Hi Daniel + all, I don't quite understand your question and your example (if your array is 1 5 3 4 2, why is A[1] = 3?). In case you refer to the requirement of the exercise that the permutation consists only of a single cycle. That is because your code should go over each element exactly once (it should, of course, stop after n iterations, where n is the size of the array). If your permutation has more than one cycle, it is hard to achieve that. Also note that for both (1) and (2), the sum of the array values should be sum_i=1,...,n i = n * (n+1) / 2. '''Hannah 12Nov09 7:54pm'''
+Maybe it's only a problem for me that I can't sit down and start to prove f.e. exercise 2 or 3 immediately. I'm not familiar with linear algebra and it's difficult to understand the meaning of what we do. So before I can start I have to search for information and have to read what matrix norms and Frobenius norms and so on is. That's why it took much time for me to do exercise 2 and 3. Proving the hints (at the bottom of this page) is also nothing what I can do in five minutes. And for exercise 1 it was my own fault that I need much more time for it. I was confused and made some silly stuff. Of course it would be nice to have the bonus points for the exam, but it will be hard (and time consuming) to solve all tasks of all exercise sheets without gaps. Thanks for the hints and I think that the new bonus point system is much better than the old one. The only thing is that I'm not sure, if the "time calculation" is better than before. Maybe I'm just too slow. '''Manuela'''
-Line 24:
+Line 26:
-Hi, I just looked at the new exercise sheet 4, in exercise 1 we should generate a permutation and sum the resulting array up, am I wrong or doesn't iterating method two iterate throw the whole array in every situation. for ex.: n= 5 permutation: 1 5 3 4 2, then A[1] = 3, A[A[1]]= A[3] = 1, A[1] = 3 ... '''Daniel 12Nov09 19:44pm'''
+To Björn at all: Yes, I see, I think the solution to an exercise like Exercise 1 is much faster to write on paper and then scan it in. Typesetting lots of matrices etc. in Latex is no fun and takes lots of time and shouldn't really be part of an exercise. '''Hannah 21Jan10 14:32'''
-Line 26:
+Line 28:
-Hi Paresh + all. If you do and-ish retrieval, all documents which contain at least one of the query words are relevant. If you do and retrieval, only the documents which contain all of the query words are relevant. Independent of which scoring scheme you use, you can do it either way. Just clearly state which way you are doing it, and things will be fine. '''Hannah 10Nov09 2:46am'''
+Yes, your last hint was very helpful. Thanks a lot. Sorry for the late response but I had to work for other courses first and it took me like 3 hours to put the other solutions into Latex (maybe this is also one reason why this sheet takes lots of time again. Especially Ex1 is okay to solve using applets/programs + copy&paste for all intermediate steps, but writing everything down, still takes ages). Now that I looked at exercise 2 again, your hint really helped. '''Björn 21Jan 13:03'''
-Line 28:
+Line 30:
-Hi, If i use and-ish retrieval how is the relevance of hits is measured? The hits which have one of the keywords are relevant or only those who have all the keywords are relevant. I am confused!! '''Paresh 10 Nov 1:31AM '''
+It's there now. '''Hannah 21Jan10 00:59'''
-Line 30:
+Line 32:
-Hi Dragos + all, pick three documents which got a score of zero. For any reasonable query and a not too small collection there should be such documents. '''Hannah 10Nov09 0:24am'''
+The page to upload the solutions is still missing. '''Florian 21Jan10 00:02'''
-Line 32:
+Line 34:
-Hello :). For exercise 3, should we find three-relevant documents that were not returned at all, or that were low ranked ? I guess that the goal is to discuss why the low-rank, right ? :) '''Dragos 9Nov 11.56 PM'''
 
Hi Dragos, if you want you can use your implementation from Exercise 1 (extended to deal with scores, of course). If your implementation could only deal with two-word queries, you have to find a suitable two-word query (see what I wrote below), but that should be possible. I don't understand what this has to do with the vector-space model though? The vector space model is just a way to get a formula for ranking documents. Processing the queries is still done via inverted lists. Understand that inverted lists are just an efficient way to store the non-zero entries of a sparse matrix. You would never store the term-document matrix as a two-dimensional array, that would be huge ... and a big waste, because most entries are actually zero (that's what "sparse" means). '''Hannah 9Nov09 2:31am'''
+To Florian: it's enough if you output the Eigenvector, but it would also be nice if you output an approximation of the eigenvalue. There is no unqiue procedure for doing this, however, and that's why I didn't include it formally in the exercise. The point is that what you will compute after a certain number of iterations is only ''almost'' an eigenvector, and how do you compute an approximate eigenvalue for something that is only almost an eigenvector. There are many ways to do it. One is to just take the quotient of each component of x_k and A*x_k, and then take the average of these components. Or the min and the max quotient and output both, that is probably more reasonable. '''Hannah 20Jan10 23:14'''
 Line 36:
-For exercises 2 and 3, should the query be multi-word, or the exercises refer to the two-word query implemented in Ex Sheet 1 ? I am asking because it's clearly more simple to use the "easy" method(presented in the lecture) for two-word queries and the "vector space model" for a multi-word query. Thank you in advance ! '''Dragos 9Nov 00.27 '''
+Please give me details on which exercise cost you how much time and why. Sorry, I couldn't respond earlier, I was working from home, and the internet connection to the university has been extremely bad all day long. '''Hannah 20Jan10 23:08'''
 Line 38:
-To Björn + all: I said non-trivial so that you don't take a super-specific query, which has only one or two matching documents, which you can easily retrieve with 100% precision and 100% recall with the right query. An example would be an article with a very specific title like "On the influence of Blancmanges from Skyron on Scottish tennis playing skills". Then, if your collection is not super large, the query "blancmanges skyron scottish" will be perfect. Don't pick a query like that. Also do not just formulate a query, but also write down the search request that you had in mind, so that you have a yardstick to determine what is relevant for that query and what is not. [[attachment:trec_queries.txt|Here are some example of queries from TREC, the big IR benchmark conference]]. Each query there has a so-called "title" (what you would typically enter as query words), a "description" (a short description of what the query is about), and a "narrative" (a long description of what the query is about). '''Hannah 08Nov09 3:28pm'''
+I think that I also can't complete the sheet this time (especially not in 6 hours) and it's a pity that not the worst sheet of all doesn't count. I know that some of us now have to priorize because in the beginning of February (and later) some exams will be written and also there will be seminars. That means that not the new bonus point system would be the reason for stopping after a certain time of solving tasks, but the general situation. '''Manuela 20Jan10 21:06'''
 Line 40:
-Concerning the exercises: What is a non-trivial query? A query that does contain multiple documents or does it have to consist of multiple words, too? Anything else? '''Björn 8Nov09 14:26'''
+I used up my 6 hours for the exercise sheet and I couldn't complete it. What is the proceeding in this case? '''Johannes 2763-01-20T1934'''
 Line 42:
-The recording works for mac os with flip4mac: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx  '''Jonas 8Nov09 14:06'''
+Thanks for the additional hint for exercise 2, I think I finally managed to get a solution with it. Then I have another question to exercise 4: is it enough if the program outputs the eigenvector or should it also give the corresponding maximal eigenvalue? '''Florian 20Jan10 18:25'''
 Line 44:
-Maybe you could consider putting it on electures - it seems to be the standard place for putting recordings and slides online as far as I know and there are already some solutions for putting lecturnity files online in a platform-independent manner. I don't know if it's practical but it's also nice having all lectures in one place i'd say...  http://electures.informatik.uni-freiburg.de/  '''Alex 7Nov09 18:05'''
+Hi Mirko: no, I meant what I wrote. You can easily check that it makes sense from the matrix / vector dimensions: u_i is an m x 1 vector, s_i is an 1 x 1 scalar, v_i is an n x 1 vector and hence v_i' (the transpose of v_i) is a 1 x n vector. Hence wrt to matrix dimensions u_i * s_i * v_i' is m x 1 * 1 x 1 * 1 x n, which matches as it should and gives an m x n matrix. Ok? '''Hannah 19Jan10 23:32'''
 Line 46:
-In Linux I see no suitable plugin. I would like to download the .lpd file too. We can test it with our old Lecturnity versions (i have 2.0) and if it doesn't work we can download 4.x at http://www.lecturnity.de/de/download/lecturnity-player/ '''Waldemar 7Nov09 12:49'''
+About the last post: with A = sum{i=1, .... you mean ||A|| = sum{i=1, ..., so the frobenius-norm of A? and by v_i' you mean the row of i of V'? If so, i don't get ||A|| = sum{i=1,...,r} u_i * s_i * v_i'. Or am i totally wrong? I'm just confused. '''Mirko, 19.01, 22:18'''
 Line 48:
-Thanks, Paresh, yes I can do that. So do all students have access to the latest version (should be at least 4.x otherwise it will not work I think) of Lecturnity? '''Hannah 6Nov09 11:35pm'''
+Hi Björn + all: let the SVD of A be U * S * V' and let u_1,...,u_r be the r columns of U, and let v_1,...,v_r be the r columns of V (or, equivalently, the r rows of V'), where r is the rank of the matrix. Let s_1,...,s_r be the r diagnonal entries of S. Then convince yourself that you can write A = sum_{i=1,...,r} u_i * s_i * v_i'. With that, you easily get the SVD of A_k and also of A - A_k. Tell me if this helped you. '''Hannah 19Jan10 17:32'''
 Line 50:
-Hi, yes the recording is working properly after downloading the plug-in. kindly upload the rest files. Also it will be helpful if you could give links to .lpd files since it is easier to download and play them in lecturnity player than browser and one can play them at any time. '''Paresh 6 Nov09 11:25pm'''
+Hey, is it possible that for exercise 2 it's ||A||-||A_k|| instead of ||A-A_k||? We've just discussed this because we both were stuck and the later seems more obvious to us and provable with the hints provided. Of course, it is possible that we're mistaken somewhere. We managed to proove the lemmas from the hint but still fail to proove the statement from the exercise. Otherwise, we might need a hint to show that ||A-A_k|| = ||S-S_k|| while S_k is S with the values s_ij where i,j>k set to 0. '''Bjoern 19.1. 17:18'''
 Line 52:
-To Mirko + all: whenever we write "prove", we mean a proof in the mathematical sense. For the exercises, the challenge is often two-fold. You first have to turn the statement of the exercise into a formal statement. Then you have to prove that statement. For Exercise 4 you will first have to specify the order in which the inverted lists should be sorted. Then you have to prove that the document with the i-th largest score (formed by max aggregation), where i <= k, is indeed among one of the k first entries wrt to the specified order, in at least one of the inverted lists. '''Hannah 3Nov09 10:29pm'''
+Ok, I added the marks to your individual pages now. If a mark does not correspond to the assignment scheme in my post below, please tell us. '''Hannah 19Jan10 1:56am'''
 Line 54:
-About Exercise4: I actually dont know how to to write down (but i think i know how/why it works) the prove of top-k retrieval with the maximum-score. Is it okay to describe it in words or do we have to formalize it in a certain way?  '''Mirko 5Nov09 22:21pm'''
+Hi Florian + all: yes, you are right, a matrix-vector multiplication is all that is needed to implement the power method. Concerning you mark for the first eight exercise sheets, you can easily compute it from the following point range -> mark assigment: 28 - 30 points = 1.0, 26 - 27 points = 1.3, 24-25 points = 1.7, 22 - 23 points = 2.0, 20 - 21 points = 2.3, 18 - 19 points = 2.7, 16 - 17 points = 3.0, 14 - 15 points = 3.3, 12 - 13 points = 3.7, 10 - 11 points = 4.0. Your total number of points (with the worst exercise sheet taken out of the counting) is already on your page, I will add the corresponding marks now. '''Hannah 19Jan10 1:45am'''
 Line 56:
-Ok, I have played around a bit with lecturnity myself, and published Lecture 3, see the link above. For Marjan it worked, he only needed to install some Windows Media plugin for his Firefox. Please also try, and tell me if there are problems. Also tell me if everything goes fine. (It's enough if one or two people tell me.) If it does I will also publish Lecture 1. Lecture 2, as I said, is lost to the world forever (well, at least the audio), since audio recording did not work that day. '''Hannah 3Nov09 10:06pm'''
+Am I right with the assumption that we do not even need a matrix-matrix multiplication for exercise 4 but just a matrix-vector multiplication? And another question, where can I find the mark for the first 8 exercise sheets? Thanks. '''Florian 18.Jan10 22:38'''
 Line 58:
-Dear Marius + all: Yes, the lectures are recorded, except for Lecture 2, where there were technical problems (no signal from the microphone). I always copy the Lecturnity files to my machine after the lecture, but don't know yet how how to publish them on the web so that they are easily viewable by others. I will meet with our group's technician tomorrow, and ask him about this. Stay tuned! '''Hannah 5Nov09 8:36pm'''
+Oh yes you're right. I forgot to transpose V. Sorry my mistake. '''Jonas 18.Jan10 21:49'''
 Line 60:
-Hi, I noticed that you record your lectures. Is it somehow possible to download these recordings or will they be released later? '''Marius Nov5th, 4:54 p.m.'''
+Hi Jonas, no I think it's correct, V is n x k and then the transpose of V is k x n, and that is what appears in the product of the SVD. '''Hannah 18Jan10 21:39'''
 Line 62:
-Hi Waleed, when you create a conflict, it's your responsibility to remove it and not leave a mess behind. If the instructions given when the conflict occurs do not suffice, try to find more information on the Wiki help pages. '''Hannah 3Nov09 9:00pm'''
+One Question: On slide 11 of lecture 10, shouldn't V be a kxn matrix instead of a nxk? '''Jonas 18.Jan10 20:39'''
 Line 64:
-I uploaded my Files and put a new row on table in the excercies sheet 2 page but when i pressed save button it shows me conflict. my version and other version of list. how can i remove conflict? does my assignment is submitted properly or not? '''Waleed''' 3Nov09
+Hi Jens + all: we applied this rule to your exercise sheets until the christmas break, that is, we simply took your worst sheet out of the counting. Given that there are only 4, at most 5, exercise sheets after the christmas break, the rule does not apply for those sheets. '''Hannah 18Jan10 18:19'''

I have a question about skipping an exercise sheet: At the beginning of the semester you said that we would be allowed to do this once. Does it still hold? '''Jens 18Jan10 18:07'''

Oh yes, sorry, I forgot, it's done now. '''Hannah 17Jan10 13:43'''

'''Matthias''' - Can you please upload the Slides for the Lec 10 as well?

Here is a major hint for Exercise 2. There are many ways to prove this, but one of the most natural is via these three steps, each of which is not too hard to prove. Note that, unlike in the lecture, in the hint below I use X' to denote the transpose of a matrix or vector X. This is a common notation in the numerics community (where as the pure algebra people prefer the T superscript). '''Hannah 14Jan10 23:37'''

{{{
(1) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x 1 vector x, the L2-norms of x and U*x are equal. The L2-norm of a vector x is defined as the square root of the sum of the squares of its components. It helps to observe that the square of the L2-norm of x can also be written as x' * x.

(2) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x n matrix A, the L2-norms of A and of U * A are equal. This is easy to prove using (1).

(3) Prove that if A has the singular value decomposition U * S * V', then the L2-norm of A is the same as the L2-norm of S.
}}}