Diff for "SearchEnginesWS0910"

Differences between revisions 291 and 589 (spanning 298 versions)

Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.

Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 7, Lecture 8, Lecture 9, Lecture 10,Lecture 11.

Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: Recording Lecture 1 (LPD), Recording Lecture 3 (LPD), Recording Lecture 4 (LPD), Recording Lecture 5 (LPD without audio), Recording Lecture 6 (LPD), Recording Lecture 7 (AVI), Recording Lecture 8 (AVI), Recording Lecture 9 (AVI), Recording Lecture 10 (AVI).

Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6, Exercise Sheet 7, Exercise Sheet 8, Exercise Sheet 9, Exercise Sheet 10, Exercise Sheet 11.

Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5, Solutions and Comments 6, Solutions and Comments 7, Solutions and Comments 8, Solutions and Comments 9.

The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the Lecturnity Player, which you can download here. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend VLC.

Here are the rules for the exercises as explained in Lecture 2.

Here is everything about the mid-term exam.

Here you can upload your solutions for Exercise Sheet 10. The deadline is Thursday 21Jan10 at 4 pm.

Questions and comments about Exercise Sheet 10 below this line (most recent on top)

Hi Manuela + all: I understand your point. I think that when one is familiar with basic linear algebra, then all the exercises (including Exercise 2, given my fairly strong and concrete hints) are something which you just sit down and do, no deep thinking required. But when one is not familiar, then yes, I can see that most of the time will be spend on understanding the meaning of basic things (which, I agree, is very important) like why can one write something like u * v', where u and v are vectors, and obtain a matrix. I guess I am constantly underestimating the mathematical background and exercise you received in you first semesters here in Freiburg. Anyway, I will take this into account when computing the marks from your points for the exercise sheets 9, 10, 11, etc. Note that also for the first 8 exercise sheets you could get a 1.0 without getting all the points, even after taking the worst sheet out of the counting. We will have something similar for the second half, too. So don't worry, it will be fair, and please continue to make an effort with the exercises, and continue to give me feedback when an exercise consumed way too much time, for whatever reason. Hannah 21Jan 17:48

Maybe it's only a problem for me that I can't sit down and start to prove f.e. exercise 2 or 3 immediately. I'm not familiar with linear algebra and it's difficult to understand the meaning of what we do. So before I can start I have to search for information and have to read what matrix norms and Frobenius norms and so on is. That's why it took much time for me to do exercise 2 and 3. Proving the hints (at the bottom of this page) is also nothing what I can do in five minutes. And for exercise 1 it was my own fault that I need much more time for it. I was confused and made some silly stuff. Of course it would be nice to have the bonus points for the exam, but it will be hard (and time consuming) to solve all tasks of all exercise sheets without gaps. Thanks for the hints and I think that the new bonus point system is much better than the old one. The only thing is that I'm not sure, if the "time calculation" is better than before. Maybe I'm just too slow. Manuela

To Björn at all: Yes, I see, I think the solution to an exercise like Exercise 1 is much faster to write on paper and then scan it in. Typesetting lots of matrices etc. in Latex is no fun and takes lots of time and shouldn't really be part of an exercise. Hannah 21Jan10 14:32

Yes, your last hint was very helpful. Thanks a lot. Sorry for the late response but I had to work for other courses first and it took me like 3 hours to put the other solutions into Latex (maybe this is also one reason why this sheet takes lots of time again. Especially Ex1 is okay to solve using applets/programs + copy&paste for all intermediate steps, but writing everything down, still takes ages). Now that I looked at exercise 2 again, your hint really helped. Björn 21Jan 13:03

It's there now. Hannah 21Jan10 00:59

The page to upload the solutions is still missing. Florian 21Jan10 00:02

To Florian: it's enough if you output the Eigenvector, but it would also be nice if you output an approximation of the eigenvalue. There is no unqiue procedure for doing this, however, and that's why I didn't include it formally in the exercise. The point is that what you will compute after a certain number of iterations is only almost an eigenvector, and how do you compute an approximate eigenvalue for something that is only almost an eigenvector. There are many ways to do it. One is to just take the quotient of each component of x_k and A*x_k, and then take the average of these components. Or the min and the max quotient and output both, that is probably more reasonable. Hannah 20Jan10 23:14

Please give me details on which exercise cost you how much time and why. Sorry, I couldn't respond earlier, I was working from home, and the internet connection to the university has been extremely bad all day long. Hannah 20Jan10 23:08

I think that I also can't complete the sheet this time (especially not in 6 hours) and it's a pity that not the worst sheet of all doesn't count. I know that some of us now have to priorize because in the beginning of February (and later) some exams will be written and also there will be seminars. That means that not the new bonus point system would be the reason for stopping after a certain time of solving tasks, but the general situation. Manuela 20Jan10 21:06

I used up my 6 hours for the exercise sheet and I couldn't complete it. What is the proceeding in this case? Johannes 2763-01-20T1934

Thanks for the additional hint for exercise 2, I think I finally managed to get a solution with it. Then I have another question to exercise 4: is it enough if the program outputs the eigenvector or should it also give the corresponding maximal eigenvalue? Florian 20Jan10 18:25

Hi Mirko: no, I meant what I wrote. You can easily check that it makes sense from the matrix / vector dimensions: u_i is an m x 1 vector, s_i is an 1 x 1 scalar, v_i is an n x 1 vector and hence v_i' (the transpose of v_i) is a 1 x n vector. Hence wrt to matrix dimensions u_i * s_i * v_i' is m x 1 * 1 x 1 * 1 x n, which matches as it should and gives an m x n matrix. Ok? Hannah 19Jan10 23:32

About the last post: with A = sum{i=1, .... you mean ||A|| = sum{i=1, ..., so the frobenius-norm of A? and by v_i' you mean the row of i of V'? If so, i don't get ||A|| = sum{i=1,...,r} u_i * s_i * v_i'. Or am i totally wrong? I'm just confused. Mirko, 19.01, 22:18

Hi Björn + all: let the SVD of A be U * S * V' and let u_1,...,u_r be the r columns of U, and let v_1,...,v_r be the r columns of V (or, equivalently, the r rows of V'), where r is the rank of the matrix. Let s_1,...,s_r be the r diagnonal entries of S. Then convince yourself that you can write A = sum_{i=1,...,r} u_i * s_i * v_i'. With that, you easily get the SVD of A_k and also of A - A_k. Tell me if this helped you. Hannah 19Jan10 17:32

Hey, is it possible that for exercise 2 it's ||A||-||A_k|| instead of ||A-A_k||? We've just discussed this because we both were stuck and the later seems more obvious to us and provable with the hints provided. Of course, it is possible that we're mistaken somewhere. We managed to proove the lemmas from the hint but still fail to proove the statement from the exercise. Otherwise, we might need a hint to show that ||A-A_k|| = ||S-S_k|| while S_k is S with the values s_ij where i,j>k set to 0. Bjoern 19.1. 17:18

Ok, I added the marks to your individual pages now. If a mark does not correspond to the assignment scheme in my post below, please tell us. Hannah 19Jan10 1:56am

Hi Florian + all: yes, you are right, a matrix-vector multiplication is all that is needed to implement the power method. Concerning you mark for the first eight exercise sheets, you can easily compute it from the following point range -> mark assigment: 28 - 30 points = 1.0, 26 - 27 points = 1.3, 24-25 points = 1.7, 22 - 23 points = 2.0, 20 - 21 points = 2.3, 18 - 19 points = 2.7, 16 - 17 points = 3.0, 14 - 15 points = 3.3, 12 - 13 points = 3.7, 10 - 11 points = 4.0. Your total number of points (with the worst exercise sheet taken out of the counting) is already on your page, I will add the corresponding marks now. Hannah 19Jan10 1:45am

Am I right with the assumption that we do not even need a matrix-matrix multiplication for exercise 4 but just a matrix-vector multiplication? And another question, where can I find the mark for the first 8 exercise sheets? Thanks. Florian 18.Jan10 22:38

Oh yes you're right. I forgot to transpose V. Sorry my mistake. Jonas 18.Jan10 21:49

Hi Jonas, no I think it's correct, V is n x k and then the transpose of V is k x n, and that is what appears in the product of the SVD. Hannah 18Jan10 21:39

One Question: On slide 11 of lecture 10, shouldn't V be a kxn matrix instead of a nxk? Jonas 18.Jan10 20:39

Hi Jens + all: we applied this rule to your exercise sheets until the christmas break, that is, we simply took your worst sheet out of the counting. Given that there are only 4, at most 5, exercise sheets after the christmas break, the rule does not apply for those sheets. Hannah 18Jan10 18:19

I have a question about skipping an exercise sheet: At the beginning of the semester you said that we would be allowed to do this once. Does it still hold? Jens 18Jan10 18:07

Oh yes, sorry, I forgot, it's done now. Hannah 17Jan10 13:43

Matthias - Can you please upload the Slides for the Lec 10 as well?

Here is a major hint for Exercise 2. There are many ways to prove this, but one of the most natural is via these three steps, each of which is not too hard to prove. Note that, unlike in the lecture, in the hint below I use X' to denote the transpose of a matrix or vector X. This is a common notation in the numerics community (where as the pure algebra people prefer the T superscript). Hannah 14Jan10 23:37

(1) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x 1 vector x, the L2-norms of x and U*x are equal. The L2-norm of a vector x is defined as the square root of the sum of the squares of its components. It helps to observe that the square of the L2-norm of x can also be written as x' * x.

(2) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x n matrix A, the L2-norms of A and of U * A are equal. This is easy to prove using (1).

(3) Prove that if A has the singular value decomposition U * S * V', then the L2-norm of A is the same as the L2-norm of S.

-  ⇤ ← Revision 291 as of 2009-11-16 23:44:23 → 
  Size: 11267
  Editor: dslb-188-098-089-088
  Comment:
+   ← Revision 589 as of 2010-01-21 17:51:25 → ⇥
  Size: 14221
  Editor: Hannah Bast
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]].
+Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]], [[attachment:SearchEnginesWS0910/lecture-7.pdf|Lecture 7]], [[attachment:SearchEnginesWS0910/lecture-8.pdf|Lecture 8]], [[attachment:SearchEnginesWS0910/lecture-9.pdf|Lecture 9]], [[attachment:SearchEnginesWS0910/lecture-10.pdf|Lecture 10]],[[attachment:SearchEnginesWS0910/lecture-11.pdf|Lecture 11]].
 Line 5:
-Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Lecture 1]] [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Lecture 3]] [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Lecture 4]].
-Line 7:
+Line 6:
-Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]].
+Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (LPD without audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-7.avi|Recording Lecture 7 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-8.avi|Recording Lecture 8 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-9.avi|Recording Lecture 9 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-10.avi|Recording Lecture 10 (AVI)]].
-Line 9:
+Line 8:
-Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]]
+Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]], [[attachment:SearchEnginesWS0910/exercise-7.pdf|Exercise Sheet 7]], [[attachment:SearchEnginesWS0910/exercise-8.pdf|Exercise Sheet 8]], [[attachment:SearchEnginesWS0910/exercise-9.pdf|Exercise Sheet 9]], [[attachment:SearchEnginesWS0910/exercise-10.pdf|Exercise Sheet 10]], [[attachment:SearchEnginesWS0910/exercise-11.pdf|Exercise Sheet 11]].
-Line 11:
+Line 10:
-= Exercise Sheet 3 =
+Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]], [[SearchEnginesWS0910/ExerciseSheet6|Solutions and Comments 6]], [[SearchEnginesWS0910/ExerciseSheet7|Solutions and Comments 7]], [[SearchEnginesWS0910/ExerciseSheet8|Solutions and Comments 8]], [[SearchEnginesWS0910/ExerciseSheet9|Solutions and Comments 9]].
-Line 13:
+Line 12:
-The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the recordings (it's .lpd files) you need the Lecturnity Player. [[http://www.lecturnity.de/de/download/lecturnity-player|You can download the player for free here]].
+The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the Lecturnity recordings (.lpd files) you need the [[http://www.lecturnity.de/de/download/lecturnity-player|Lecturnity Player, which you can download here]]. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend [[http://www.videolan.org/vlc|VLC]].
-Line 17:
+Line 16:
-[[SearchEnginesWS0910/ExerciseSheet4|Here you can upload your solutions for Exercise Sheet 4]].
+[[SearchEnginesWS0910/MidTermExam|Here is everything about the mid-term exam]].
-Line 19:
+Line 18:
-== Questions or comments below this line, most recent on top please ==
I accidentally loaded the ZIP file instead of the PDF(My eyes are heavy...). But i couldn't overwrite the pdf file on the wiki now. So I loaded it in my_name_ex1.pdf. I hope this won't occur any problems. '''Jonas 16Nov09 11.41pm'''
+[[SearchEnginesWS0910/ExerciseSheet10|Here you can upload your solutions for Exercise Sheet 10]]. The deadline is Thursday 21Jan10 at 4 pm.
-Line 22:
+Line 20:
-To Florian: This was very well explained in the lecture (it is still there on the slides). It means the speed-up achieved by reading and decompressing your compressed list compared to when the list is read in uncompressed format. As your inverted list is randomly generated, you might have different speed-ups for different inverted lists. In order to have any speed-up, of course, your compression scheme should really work and reduce the size of the inverted list + should not be too inefficient. One extreme example for inefficient code would be using strings instead of bit-shifting for your coding. '''Marjan 8:38pm'''
+== Questions and comments about Exercise Sheet 10 below this line (most recent on top) ==
-Line 24:
+Line 22:
-What is meant with the best speedup for Exercise 4 which we should add on the solution page? The best speedup for just reading the data from disk or the best speedup for reading and decompressing the list? '''Florian 16Nov09 8:12pm'''
+Hi Manuela + all: I understand your point. I think that when one is familiar with basic linear algebra, then all the exercises (including Exercise 2, given my fairly strong and concrete hints) are something which you just sit down and do, no deep thinking required. But when one is not familiar, then yes, I can see that most of the time will be spend on understanding the meaning of basic things (which, I agree, is very important) like why can one write something like u * v', where u and v are vectors, and obtain a matrix. I guess I am constantly underestimating the mathematical background and exercise you received in you first semesters here in Freiburg. Anyway, I will take this into account when computing the marks from your points for the exercise sheets 9, 10, 11, etc. Note that also for the first 8 exercise sheets you could get a 1.0 without getting all the points, even after taking the worst sheet out of the counting. We will have something similar for the second half, too. So don't worry, it will be fair, and please continue to make an effort with the exercises, and continue to give me feedback when an exercise consumed way too much time, for whatever reason. '''Hannah 21Jan 17:48'''
-Line 26:
+Line 24:
-To Dragos: Gap encoding + Elias code is not trivial at all and you can use it. Gap encoding + Byte code is also fine. '''Marjan 16Nov09 2:15pm'''
+Maybe it's only a problem for me that I can't sit down and start to prove f.e. exercise 2 or 3 immediately. I'm not familiar with linear algebra and it's difficult to understand the meaning of what we do. So before I can start I have to search for information and have to read what matrix norms and Frobenius norms and so on is. That's why it took much time for me to do exercise 2 and 3. Proving the hints (at the bottom of this page) is also nothing what I can do in five minutes. And for exercise 1 it was my own fault that I need much more time for it. I was confused and made some silly stuff. Of course it would be nice to have the bonus points for the exam, but it will be hard (and time consuming) to solve all tasks of all exercise sheets without gaps. Thanks for the hints and I think that the new bonus point system is much better than the old one. The only thing is that I'm not sure, if the "time calculation" is better than before. Maybe I'm just too slow. '''Manuela'''
-Line 28:
+Line 26:
-Is it ok to use Elias code for the compressing from Exercise 4(2)? Or it's too 'trivial'?:)'''Dragos 16Nov09 14.14'''
+To Björn at all: Yes, I see, I think the solution to an exercise like Exercise 1 is much faster to write on paper and then scan it in. Typesetting lots of matrices etc. in Latex is no fun and takes lots of time and shouldn't really be part of an exercise. '''Hannah 21Jan10 14:32'''
-Line 30:
+Line 28:
-To Florian + all: A single number doesn't really make sense, does it? For the discussion part of the exercise, think in terms of probability distributions, as we did in the lecture (when discussing which probability distribution a certain encoding is optimal for). For the example, give a sequence of numbers. '''Hannah 15Nov09 9:41pm'''
+Yes, your last hint was very helpful. Thanks a lot. Sorry for the late response but I had to work for other courses first and it took me like 3 hours to put the other solutions into Latex (maybe this is also one reason why this sheet takes lots of time again. Especially Ex1 is okay to solve using applets/programs + copy&paste for all intermediate steps, but writing everything down, still takes ages). Now that I looked at exercise 2 again, your hint really helped. '''Björn 21Jan 13:03'''
-Line 32:
+Line 30:
-To Johannes + all: Yes, good idea. I will anyway at some point in the next weeks hand out a sheet where you have the opportunity to give feedback on the lectures and the exercises. But yes, why not give me that feedback on the current exercise sheet already now. Let me refine your proposal a bit. It would be useful for me if you would provide ''two'' grades: one for the hardness (pick one of: too hard for me, challenging but feasible, not very hard) and one for the amount of work (pick one of: too much for me, a lot but feasible, not more than for other lectures). It would also be helpful if you would not just give a grade but put your opinion into words. It's no problem if you are critical but please stay polite. I will take your comments seriously, don't worry. '''Hannah 15Nov09 9:33am'''
+It's there now. '''Hannah 21Jan10 00:59'''
-Line 34:
+Line 32:
-In Exercise we should "give an example of data for which k = x is the best choice". What is meant by "an example of data" here? A single number or a set of numbers or anything else? '''Florian 15Nov09 8:52pm'''
+The page to upload the solutions is still missing. '''Florian 21Jan10 00:02'''
-Line 36:
+Line 34:
-I'd like to suggest that everyone grades the exercise sheet from 1 (for "way to easy") to 10 ("way to hard"). This might provide the professor with the feedback she asks for in the lecture. How about that idea? '''Johannes 2009-11-15T20:40L'''
+To Florian: it's enough if you output the Eigenvector, but it would also be nice if you output an approximation of the eigenvalue. There is no unqiue procedure for doing this, however, and that's why I didn't include it formally in the exercise. The point is that what you will compute after a certain number of iterations is only ''almost'' an eigenvector, and how do you compute an approximate eigenvalue for something that is only almost an eigenvector. There are many ways to do it. One is to just take the quotient of each component of x_k and A*x_k, and then take the average of these components. Or the min and the max quotient and output both, that is probably more reasonable. '''Hannah 20Jan10 23:14'''
-Line 38:
+Line 36:
-To Florian + all: yes, sorry, I forgot to mention this in the lecture. Marjan already explained how to clear the disk cache. Let me add to this an explanation what the disk cache actually is. Whenever you read a (part of a) file from disk, the operating system of your computed will use whatever memory is currently unused to store that (part of the) file there. When you read it again and the (part of the) file hasn't changed and the memory used to store it has not been used otherwise in the meantime, than that data is read right from memory, which is much faster than reading it from disk. Usually that effect is desirable, because it speeds up things, but when you do experiments, it is undesirable, because it leads to unrealistically good running times, especially when carrying out an experiment many times in a row. '''Hannah 15Nov09 8:10pm'''
+Please give me details on which exercise cost you how much time and why. Sorry, I couldn't respond earlier, I was working from home, and the internet connection to the university has been extremely bad all day long. '''Hannah 20Jan10 23:08'''
-Line 40:
+Line 38:
-To Florian: Indeed, we were running out of time and there was no room for this in the lecture. I can suggest to you few ways how to clear the disk cache: before carrying out your final experiment, read a large amount of data (let's say close to the amount of RAM you have) from disk - this will ensure that your data (the inverted list) is cleared from the disk cache and replaced by something else (thus an actual reading from disk get's timed, and not reading from RAM). Another way is to restart your computer before doing the timing. '''Marjan 15Nov09 7:27pm'''
+I think that I also can't complete the sheet this time (especially not in 6 hours) and it's a pity that not the worst sheet of all doesn't count. I know that some of us now have to priorize because in the beginning of February (and later) some exams will be written and also there will be seminars. That means that not the new bonus point system would be the reason for stopping after a certain time of solving tasks, but the general situation. '''Manuela 20Jan10 21:06'''
-Line 42:
+Line 40:
-In exercise 4 it says: "Important note: Whenver you measure running times for reading data from disk, you have to clear the disk cache before, as discussed in the lecture". I think that this was not discussed in the lecture? What do we have to do here? '''Florian 15Nov09 7:15pm'''
+I used up my 6 hours for the exercise sheet and I couldn't complete it. What is the proceeding in this case? '''Johannes 2763-01-20T1934'''
-Line 44:
+Line 42:
-@Bit shifting: The syntax for that is actually the same, irrespectively of whether you use Java, C++, perl, python, or whatever. The >> operator shifts to the right, the << operator shifts to the left, the & operator ands the bits of the two operands and the | operator ors the bits of the two operands. Very simple. You will also find zillions of example programs on the web by typing something like ''java bit shifting'' into Google or whatever your favorite search engine is. '''Hannah 15Nov09 1:16'''
+Thanks for the additional hint for exercise 2, I think I finally managed to get a solution with it. Then I have another question to exercise 4: is it enough if the program outputs the eigenvector or should it also give the corresponding maximal eigenvalue? '''Florian 20Jan10 18:25'''
-Line 46:
+Line 44:
-Hi Marius + all: For Exercise 4, an inverted list of size m with doc ids from the range [1..n] is simply a sorted list of m numbers from the range [1..n]. I leave it to you, whether your lists potentially contain duplicates (as in  3, 5, 5, 8, 12, ...) or whether you generate them in a way that they don't contain duplicates (as in 3, 5, 8, 17, ...). It doesn't really matter for the exercise whether your list has duplicated or not. In any case, consider simple flat lists like in the two examples I gave (and like all the examples I gave in this and past lectures), not lists of lists or anything. '''Hannah 15Nov09 1:12am'''
+Hi Mirko: no, I meant what I wrote. You can easily check that it makes sense from the matrix / vector dimensions: u_i is an m x 1 vector, s_i is an 1 x 1 scalar, v_i is an n x 1 vector and hence v_i' (the transpose of v_i) is a 1 x n vector. Hence wrt to matrix dimensions u_i * s_i * v_i' is m x 1 * 1 x 1 * 1 x n, which matches as it should and gives an m x n matrix. Ok? '''Hannah 19Jan10 23:32'''
-Line 48:
+Line 46:
-@Mirko: Sure, but an inverted list is a list of words where the Doc-IDs are attached to each words in which the words occur. So for Example: If word no. 5 occurs in Doc1, Doc2 and Doc3 and word no. 2 occurs in Doc5, the list would look like: 5 -> Doc1, Doc2, Doc3; 2 -> Doc5. Or am I mistaken? My question then is, how long should these attached lists be in average case? I mean, one could imagine that we got 1mil. documents over 3 words, so these lists could get very large...
+About the last post: with A = sum{i=1, .... you mean ||A|| = sum{i=1, ..., so the frobenius-norm of A? and by v_i' you mean the row of i of V'? If so, i don't get ||A|| = sum{i=1,...,r} u_i * s_i * v_i'. Or am i totally wrong? I'm just confused. '''Mirko, 19.01, 22:18'''
-Line 50:
+Line 48:
-EDIT: Oh ok. Now, I see your point. It's not an index, it's a list. Okay. So, what is an inverted list with Doc-IDs, then?
+Hi Björn + all: let the SVD of A be U * S * V' and let u_1,...,u_r be the r columns of U, and let v_1,...,v_r be the r columns of V (or, equivalently, the r rows of V'), where r is the rank of the matrix. Let s_1,...,s_r be the r diagnonal entries of S. Then convince yourself that you can write A = sum_{i=1,...,r} u_i * s_i * v_i'. With that, you easily get the SVD of A_k and also of A - A_k. Tell me if this helped you. '''Hannah 19Jan10 17:32'''
-Line 52:
+Line 50:
-EDIT EDIT: And to your question, Mirko, take a look at http://snippets.dzone.com/posts/show/93. Especially at Comment no. 2. Maybe this helps... I think, Java supports StreamWriters/Readers that are able to write/read bytes. '''Marius 11/14/2009 08:46pm'''
+Hey, is it possible that for exercise 2 it's ||A||-||A_k|| instead of ||A-A_k||? We've just discussed this because we both were stuck and the later seems more obvious to us and provable with the hints provided. Of course, it is possible that we're mistaken somewhere. We managed to proove the lemmas from the hint but still fail to proove the statement from the exercise. Otherwise, we might need a hint to show that ||A-A_k|| = ||S-S_k|| while S_k is S with the values s_ij where i,j>k set to 0. '''Bjoern 19.1. 17:18'''
-Line 54:
+Line 52:
-EDIT EDIT EDIT: Sorry, me again. Well, I bothered Wikipedia which redirects from http://en.wikipedia.org/wiki/Inverted_list to Inverted Index. So it seems to me, this is being used as a synonym. Actually, I think I'm confused enough, now. I'll better wait for any responses... ;-) '''Marius 11/14/2009 9:08 pm'''
+Ok, I added the marks to your individual pages now. If a mark does not correspond to the assignment scheme in my post below, please tell us. '''Hannah 19Jan10 1:56am'''
-Line 56:
+Line 54:
-@ Marius: i think we are supposed to generate one inverted __list__ of size m, with doc ids from 1..n (therefore n>=m, because no duplicates?).
+Hi Florian + all: yes, you are right, a matrix-vector multiplication is all that is needed to implement the power method. Concerning you mark for the first eight exercise sheets, you can easily compute it from the following point range -> mark assigment: 28 - 30 points = 1.0, 26 - 27 points = 1.3, 24-25 points = 1.7, 22 - 23 points = 2.0, 20 - 21 points = 2.3, 18 - 19 points = 2.7, 16 - 17 points = 3.0, 14 - 15 points = 3.3, 12 - 13 points = 3.7, 10 - 11 points = 4.0. Your total number of points (with the worst exercise sheet taken out of the counting) is already on your page, I will add the corresponding marks now. '''Hannah 19Jan10 1:45am'''
-Line 58:
+Line 56:
-Now a question from my side: ex.4, programming the compression in __java__, is there any __good__ tutorial about how to handle the bit-stuff? (otherwise, i think, it would cost me too much time..) '''Mirko 14Nov09, 19:18'''
+Am I right with the assumption that we do not even need a matrix-matrix multiplication for exercise 4 but just a matrix-vector multiplication? And another question, where can I find the mark for the first 8 exercise sheets? Thanks. '''Florian 18.Jan10 22:38'''
-Line 60:
+Line 58:
-Hi, do you have any suggestions what the best numbers for m and n in exercise 4 should look like? Or are we supposed to mess around a bit with ints and longs? And: How long should the list of documents in the inverted index be? '''Marius 14Nov09 6:40pm'''
+Oh yes you're right. I forgot to transpose V. Sorry my mistake. '''Jonas 18.Jan10 21:49'''
-Line 62:
+Line 60:
-And just to clarify what a single-cycle permutation is. Here is an example for an array of size 5 with a permutation that is a single cycle: 5 4 1 3 2. Why single cycle? Well, A[1] = 5, A[5] = 2, A[2] = 4, A[4] = 3, A[3] = 1. (My indices in this example are 1,...,5 and not 0,...,4.) Here is an example of a permutation with three cycles: 2 1 4 3 5. The first cycle is A[1] = 2, A[2] =1. The second cycle is A[3] = 4, A[4] = 3. The third cycle is A[5] = 5. '''Hannah 12Nov09 8:04pm'''
+Hi Jonas, no I think it's correct, V is n x k and then the transpose of V is k x n, and that is what appears in the product of the SVD. '''Hannah 18Jan10 21:39'''
-Line 64:
+Line 62:
-Hi Daniel + all, I don't quite understand your question and your example (if your array is 1 5 3 4 2, why is A[1] = 3?). In case you refer to the requirement of the exercise that the permutation consists only of a single cycle. That is because your code should go over each element exactly once (it should, of course, stop after n iterations, where n is the size of the array). If your permutation has more than one cycle, it is hard to achieve that. Also note that for both (1) and (2), the sum of the array values should be sum_i=1,...,n i = n * (n+1) / 2. '''Hannah 12Nov09 7:54pm'''
+One Question: On slide 11 of lecture 10, shouldn't V be a kxn matrix instead of a nxk? '''Jonas 18.Jan10 20:39'''
-Line 66:
+Line 64:
-Hi, I just looked at the new exercise sheet 4, in exercise 1 we should generate a permutation and sum the resulting array up, am I wrong or doesn't iterating method two iterate throw the whole array in every situation. for ex.: n= 5 permutation: 1 5 3 4 2, then A[1] = 3, A[A[1]]= A[3] = 1, A[1] = 3 ... '''Daniel 12Nov09 19:44pm'''
+Hi Jens + all: we applied this rule to your exercise sheets until the christmas break, that is, we simply took your worst sheet out of the counting. Given that there are only 4, at most 5, exercise sheets after the christmas break, the rule does not apply for those sheets. '''Hannah 18Jan10 18:19'''

I have a question about skipping an exercise sheet: At the beginning of the semester you said that we would be allowed to do this once. Does it still hold? '''Jens 18Jan10 18:07'''

Oh yes, sorry, I forgot, it's done now. '''Hannah 17Jan10 13:43'''

'''Matthias''' - Can you please upload the Slides for the Lec 10 as well?

Here is a major hint for Exercise 2. There are many ways to prove this, but one of the most natural is via these three steps, each of which is not too hard to prove. Note that, unlike in the lecture, in the hint below I use X' to denote the transpose of a matrix or vector X. This is a common notation in the numerics community (where as the pure algebra people prefer the T superscript). '''Hannah 14Jan10 23:37'''

{{{
(1) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x 1 vector x, the L2-norms of x and U*x are equal. The L2-norm of a vector x is defined as the square root of the sum of the squares of its components. It helps to observe that the square of the L2-norm of x can also be written as x' * x.

(2) Prove that for an m x m matrix U with U' * U = I and for an arbitrary m x n matrix A, the L2-norms of A and of U * A are equal. This is easy to prove using (1).

(3) Prove that if A has the singular value decomposition U * S * V', then the L2-norm of A is the same as the L2-norm of S.
}}}