6232
Comment:
|
7576
|
Deletions are marked like this. | Additions are marked like this. |
Line 3: | Line 3: |
Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]]. | Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]], [[attachment:SearchEnginesWS0910/lecture-7.pdf|Lecture 7]], [[attachment:SearchEnginesWS0910/lecture-8.pdf|Lecture 8]], [[attachment:SearchEnginesWS0910/lecture-9.pdf|Lecture 9]], [[attachment:SearchEnginesWS0910/lecture-10.pdf|Lecture 10]], [[attachment:SearchEnginesWS0910/lecture-11.pdf|Lecture 11]], [[attachment:SearchEnginesWS0910/lecture-12.pdf|Lecture 12]], [[attachment:SearchEnginesWS0910/lecture-13.pdf|Lecture 13]]. |
Line 5: | Line 5: |
Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (no audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (with audio for a change)]]. | Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (LPD without audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-7.avi|Recording Lecture 7 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-8.avi|Recording Lecture 8 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-9.avi|Recording Lecture 9 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-10.avi|Recording Lecture 10 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-11.avi|Recording Lecture 11 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-12.avi|Recording Lecture 12 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-13.avi|Recording Lecture 13 (AVI)]]. To play the Lecturnity recordings (.lpd files) you need the [[http://www.lecturnity.de/de/download/lecturnity-player|Lecturnity Player, which you can download here]]. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend [[http://www.videolan.org/vlc|VLC]]. |
Line 7: | Line 7: |
Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]]. | Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]], [[attachment:SearchEnginesWS0910/exercise-7.pdf|Exercise Sheet 7]], [[attachment:SearchEnginesWS0910/exercise-8.pdf|Exercise Sheet 8]], [[attachment:SearchEnginesWS0910/exercise-9.pdf|Exercise Sheet 9]], [[attachment:SearchEnginesWS0910/exercise-10.pdf|Exercise Sheet 10]], [[attachment:SearchEnginesWS0910/exercise-11.pdf|Exercise Sheet 11]], [[attachment:SearchEnginesWS0910/exercise-12.pdf|Exercise Sheet 12]], [[attachment:SearchEnginesWS0910/exercise-13.pdf|Exercise Sheet 13]]. |
Line 9: | Line 9: |
Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]]. | Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]], [[SearchEnginesWS0910/ExerciseSheet6|Solutions and Comments 6]], [[SearchEnginesWS0910/ExerciseSheet7|Solutions and Comments 7]], [[SearchEnginesWS0910/ExerciseSheet8|Solutions and Comments 8]], [[SearchEnginesWS0910/ExerciseSheet9|Solutions and Comments 9]], [[SearchEnginesWS0910/ExerciseSheet10|Solutions and Comments 10]], [[SearchEnginesWS0910/ExerciseSheet11|Solutions and Comments 11]], [[SearchEnginesWS0910/ExerciseSheet12|Solutions and Comments 12]]. |
Line 11: | Line 11: |
= Exercise Sheet 6 = The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the recordings (it's .lpd files) you need the Lecturnity Player. [[http://www.lecturnity.de/de/download/lecturnity-player|You can download the player for free here]]. |
Here are our master solutions: [[attachment:SearchEnginesWS0910/solution-midterm.pdf|Master solution for Mid-Term Exam]],[[attachment:SearchEnginesWS0910/solution-9.pdf|Master solution for Exercise Sheet 9]], [[attachment:SearchEnginesWS0910/solution-10.pdf|Master solution for Exercise Sheet 10]]. |
Line 16: | Line 15: |
[[SearchEnginesWS0910/ExerciseSheet6|Here you can upload your solutions for Exercise Sheet 6]]. | [[SearchEnginesWS0910/MidTermExam|Here is everything about the mid-term exam]]. The final exam is on Friday March 12, 2010. The written exam begins at 2.00 pm in HS 026. The oral exams are scheduled on the same day. |
Line 18: | Line 17: |
== Questions or comments below this line, most recent on top please == | [[SearchEnginesWS0910/ExerciseSheet13|Here is the table with the links to your uploaded solutions for Exercise Sheet 13]]. The deadline is Thursday 11Feb10 16:00. |
Line 20: | Line 19: |
In Exercise 2, you ask for the costs of scanning the inv. lists of all words in the "collection". Do you mean the collection of words, matching the prefix or the the whole collection with all words in the inv. index? '''Claudius 30Nov09 2:16pm''' | == Questions and comments about Exercise Sheet 13 below this line (most recent on top) == |
Line 22: | Line 21: |
Hi Dragos, just three-letter prefixes are fine. I have no plans yet for future exercises with a "*" in the middle. '''Hannah 29Nov09 11:10pm''' | Hi, in exercise 1, is it really possible to implement the complete-link heuristic with a priority queue so that the finding of the next two clusters which have to be merged only needs logarithmic time? If I did it right, then we have to fill the priority queue with the possible pairs of clusters and use their similarity with the complete-link heuristic to sort the queue. But when two clusters are merged now, then the distances of the combined cluster (i.e. the one with the smaller index) to the other clusters can change, so after each merge the priority queue has to be rebuilt to ensure it's heap property. But a complete rebuild of the queue is quite time consuming, or am I doing anything wrong? '''Florian 7Feb10 20:09''' |
Line 24: | Line 23: |
For exercise 1, should we allow the "*" to be in any place ? Or just three letter prefix is sufficient ? I am asking because it would be good to know if we might need on later Exercise Sheets searches that allow multiple "*" in different positions, so that we do it now. '''Dragos 29 Nov 22:55''' | Hi Mirko, you are free to modify the existing program (linked to in my first comment below), or write the program from scratch (or transcribe my C++ program) in a programming language of your choice. '''Hannah 7Feb10 16:40''' |
Line 26: | Line 25: |
Hi Björn, by ratio I simply mean the quotient, that is, how much bigger the one is then the other. For example, if, for a particular prefix, the total size from (1) is one million, and the size from (2) is ten thousand, then, for that prefix, the ratio between the two is one hundred. '''Hannah 29Nov09 7:48pm''' | Hi, one question about Exercise 1: the text says "Modify the code", does this mean we have to extend the C++-program, or are we also allowed to write the complete program in an other language like Java? '''Mirko 7Feb10 16:33''' |
Line 28: | Line 27: |
Hello, I wonder what's meant with the ratio demanded in exercise 1. If i have n lists with a maximum length of "a" and a total length of "b". Isn't the ratio something like "a:b"? At least that is what I thought. But adding a colon does not seem to be sufficient for a part of the exercise. Sorry for the meaningless question but I don't want to miss points because I'm not sure how to understand the word ratio. '''Björn 11-29 19:39''' | I can't say now with certainty which courses exactly I will offer in a year. It is likely though, that we will offer bachelor / master projects on a regular basis. '''Hannah 5Feb10 21:17''' |
Line 30: | Line 29: |
To all: about the selection of the ten prefixes. The idea was that you pick a meaningful variety by hand, that is, such prefixes which one could imagine that one would really type them. The exact selection doesn't really matter, but do avoid extreme cases like a prefix ''yzq'' with one completion and an inverted list of three doc ids. '''Hannah 29Nov09 6:28pm''' | Yes, Marjan is right. '''Johannes 2010-02-05:1931''' |
Line 32: | Line 31: |
To Marius + all: yes, I am sorry, "cost" was very imprecise here, I actually simply meant the time your code takes. '''Hannah 29Nov09 6:19pm''' | Probably Johannes wants/has to do his project next year. '''Marjan 5Feb10 18:32''' |
Line 34: | Line 33: |
So you say that you mean by "costs" the running time? Or do you understand something else when you say we have to calculate the costs? '''Marius 11/29/09 4:58pm''' | Hi Johannes, I don't quite understand your comment, what do you mean? '''Hannah 5Feb10 17:33''' |
Line 36: | Line 35: |
Notice about Problem 2: You should use precise timers when measuring the running times. If your collection is very small and you round up your times, it's easy to get 0 ms when merging or scanning the inverted lists. I recommend using microsecond scale. '''Marjan 29Nov09 16:47''' | Even if it was in one year? '''Johannes 2010-02-05:1719''' |
Line 38: | Line 37: |
To Florian: Yes, you can do anything you want to find those words (as long as you produce the required outputs). '''Marjan 29Nov09 16:44''' | If you are interested in doing the Bachelor / Master's project (following this course as a block course), please send me an email asap: bast@informatik.uni-freiburg.de . '''Hannah 4Deb10 19:30''' |
Line 40: | Line 39: |
For exercise 1, should we use one of the methods presented in the lecture to find all words in the collection with the prefixes or can we do just anything to get them (though it might not be as efficient)? '''Florian 29Nov09 03:51pm''' When you scan, please make sure that you do something very simple with the elements, like summing up all doc ids, and then outputting that sum. Otherwise a clever compiler might figure out that it can remove the whole loop, because it is not producing a result that is used anywhere. '''Hannah 28Nov09 11:48pm''' To Mirko: Yes, scanning means one pass over the elements. '''Marjan 28Nov09 19:19''' Hi, about exercise2: is by "scanning" meant that one looks at every element exactly once? (=> costs of scanning a list are just the size of the list) '''Mirko 28Nov, 19:12''' |
Here is the [[attachment:HierarchicalClustering.cpp|hierarchical clustering code]] which we have written in the lecture. '''Hannah 2Feb10 18:47''' |
Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.
Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 7, Lecture 8, Lecture 9, Lecture 10, Lecture 11, Lecture 12, Lecture 13.
Here are the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: Recording Lecture 1 (LPD), Recording Lecture 3 (LPD), Recording Lecture 4 (LPD), Recording Lecture 5 (LPD without audio), Recording Lecture 6 (LPD), Recording Lecture 7 (AVI), Recording Lecture 8 (AVI), Recording Lecture 9 (AVI), Recording Lecture 10 (AVI), Recording Lecture 11 (AVI), Recording Lecture 12 (AVI), Recording Lecture 13 (AVI). To play the Lecturnity recordings (.lpd files) you need the Lecturnity Player, which you can download here. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend VLC.
Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6, Exercise Sheet 7, Exercise Sheet 8, Exercise Sheet 9, Exercise Sheet 10, Exercise Sheet 11, Exercise Sheet 12, Exercise Sheet 13.
Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5, Solutions and Comments 6, Solutions and Comments 7, Solutions and Comments 8, Solutions and Comments 9, Solutions and Comments 10, Solutions and Comments 11, Solutions and Comments 12.
Here are our master solutions: Master solution for Mid-Term Exam,Master solution for Exercise Sheet 9, Master solution for Exercise Sheet 10.
Here are the rules for the exercises as explained in Lecture 2.
Here is everything about the mid-term exam. The final exam is on Friday March 12, 2010. The written exam begins at 2.00 pm in HS 026. The oral exams are scheduled on the same day.
Here is the table with the links to your uploaded solutions for Exercise Sheet 13. The deadline is Thursday 11Feb10 16:00.
Questions and comments about Exercise Sheet 13 below this line (most recent on top)
Hi, in exercise 1, is it really possible to implement the complete-link heuristic with a priority queue so that the finding of the next two clusters which have to be merged only needs logarithmic time? If I did it right, then we have to fill the priority queue with the possible pairs of clusters and use their similarity with the complete-link heuristic to sort the queue. But when two clusters are merged now, then the distances of the combined cluster (i.e. the one with the smaller index) to the other clusters can change, so after each merge the priority queue has to be rebuilt to ensure it's heap property. But a complete rebuild of the queue is quite time consuming, or am I doing anything wrong? Florian 7Feb10 20:09
Hi Mirko, you are free to modify the existing program (linked to in my first comment below), or write the program from scratch (or transcribe my C++ program) in a programming language of your choice. Hannah 7Feb10 16:40
Hi, one question about Exercise 1: the text says "Modify the code", does this mean we have to extend the C++-program, or are we also allowed to write the complete program in an other language like Java? Mirko 7Feb10 16:33
I can't say now with certainty which courses exactly I will offer in a year. It is likely though, that we will offer bachelor / master projects on a regular basis. Hannah 5Feb10 21:17
Yes, Marjan is right. Johannes 2010-02-05:1931
Probably Johannes wants/has to do his project next year. Marjan 5Feb10 18:32
Hi Johannes, I don't quite understand your comment, what do you mean? Hannah 5Feb10 17:33
Even if it was in one year? Johannes 2010-02-05:1719
If you are interested in doing the Bachelor / Master's project (following this course as a block course), please send me an email asap: bast@informatik.uni-freiburg.de . Hannah 4Deb10 19:30
Here is the hierarchical clustering code which we have written in the lecture. Hannah 2Feb10 18:47