Line 3: | Line 3: |
Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]]. | Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]]. |
Line 5: | Line 5: |
Here are the recordings of some of the lectures so far (Lecture 1 still missing, in Lecture 2 the microphone signal did not come through): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture1/Search_Engines,_Lecture_3,_5Nov09_1_05_11_2009_16_16_20.html|Lecture 3]] | Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (no audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (with audio for a change)]]. |
Line 7: | Line 7: |
Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]]. | Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]]. |
Line 9: | Line 9: |
Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Exercise Sheet 1]], [[SearchEnginesWS0910/ExerciseSheet2|Exercise Sheet 2]]. | Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]]. = Exercise Sheet 6 = The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the recordings (they are .lpd files) you need the Lecturnity Player. [[http://www.lecturnity.de/de/download/lecturnity-player|You can download the player for free here]]. |
Line 13: | Line 16: |
= Exercise Sheet 3 = [[SearchEnginesWS0910/ExerciseSheet3|Here you can upload your solutions for Exercise Sheet 3]]. |
[[SearchEnginesWS0910/ExerciseSheet6|Here you can upload your solutions for Exercise Sheet 6]]. |
Line 19: | Line 20: |
To Mirko + all: whenever we write "prove", we mean a proof in the mathematical sense. For the exercises, the challenge is often two-fold. You first have to turn the statement of the exercise into a formal statement. Then you have to prove that statement. For Exercise 4 you will first have to specify the order in which the inverted lists should be sorted. Then you have to prove that the document with the i-th largest score (formed by max aggregation), where i <= k, is indeed among the first k entries with respect to the specified order, in at least one of the inverted lists. '''Hannah 3Nov09 10:29pm''' | Hi Björn + all, it doesn't really matter, but I (and probably most humans) find ratios > 1 more intuitive. Just compare 8 and 0.125: which one is more catchy? '''Hannah 30Nov09 11:59pm''' |
Line 21: | Line 22: |
About Exercise 4: I actually don't know how to write down the proof of top-k retrieval with the maximum score (but I think I know how/why it works). Is it okay to describe it in words or do we have to formalize it in a certain way? '''Mirko 5Nov09 22:21pm''' | Does it matter which way round we express the ratios? Depending on how we build the quotient, we get different values (all smaller or all greater than 1). Or is that up to us? It should be possible to compare our results anyway, I assume. '''Björn 30Nov 23:36''' |
Line 23: | Line 24: |
Ok, I have played around a bit with lecturnity myself, and published Lecture 3, see the link above. For Marjan it worked, he only needed to install some Windows Media plugin for his Firefox. Please also try, and tell me if there are problems. Also tell me if everything goes fine. (It's enough if one or two people tell me.) If it does I will also publish Lecture 1. Lecture 2, as I said, is lost to the world forever (well, at least the audio), since audio recording did not work that day. '''Hannah 3Nov09 10:06pm''' | To Björn: You can assume you have gaps of arbitrary size. '''Marjan 30Nov 14:43''' |
Line 25: | Line 26: |
Dear Marius + all: Yes, the lectures are recorded, except for Lecture 2, where there were technical problems (no signal from the microphone). I always copy the Lecturnity files to my machine after the lecture, but don't know yet how to publish them on the web so that they are easily viewable by others. I will meet with our group's technician tomorrow, and ask him about this. Stay tuned! '''Hannah 5Nov09 8:36pm''' | To Claudius: The whole collection with all words. '''Marjan 30Nov 14:43''' |
Line 27: | Line 28: |
Hi, I noticed that you record your lectures. Is it somehow possible to download these recordings or will they be released later? '''Marius Nov5th, 4:54 p.m.''' | Is there a limit on how large gaps may be in Exercise 3? I'm not sure for which case the two entropies actually fulfill the equation: gaps that "make sense" (their sum is not larger than n-1), gaps that are at most n, or arbitrary gaps? '''Björn 30Nov09 14:31''' |
Line 29: | Line 30: |
Hi Waleed, when you create a conflict, it's your responsibility to remove it and not leave a mess behind. If the instructions given when the conflict occurs do not suffice, try to find more information on the Wiki help pages. '''Hannah 3Nov09 9:00pm''' | In Exercise 2, you ask for the costs of scanning the inv. lists of all words in the "collection". Do you mean the collection of words matching the prefix, or the whole collection with all words in the inv. index? '''Claudius 30Nov09 2:16pm''' |
Line 31: | Line 32: |
I uploaded my files and added a new row to the table on the Exercise Sheet 2 page, but when I pressed the save button it showed me a conflict between my version and another version of the list. How can I remove the conflict? Was my assignment submitted properly or not? '''Waleed''' 3Nov09 | Hi Dragos, just three-letter prefixes are fine. I have no plans yet for future exercises with a "*" in the middle. '''Hannah 29Nov09 11:10pm''' For Exercise 1, should we allow the "*" to be in any place, or is a three-letter prefix sufficient? I am asking because it would be good to know whether later exercise sheets might need searches that allow multiple "*" in different positions, so that we can implement that now. '''Dragos 29 Nov 22:55''' Hi Björn, by ratio I simply mean the quotient, that is, how much bigger the one is than the other. For example, if, for a particular prefix, the total size from (1) is one million, and the size from (2) is ten thousand, then, for that prefix, the ratio between the two is one hundred. '''Hannah 29Nov09 7:48pm''' Hello, I wonder what's meant by the ratio demanded in Exercise 1. If I have n lists with a maximum length of "a" and a total length of "b", isn't the ratio something like "a:b"? At least that is what I thought. But adding a colon does not seem to be sufficient for a part of the exercise. Sorry for the meaningless question but I don't want to miss points because I'm not sure how to understand the word ratio. '''Björn 11-29 19:39''' To all: about the selection of the ten prefixes. The idea was that you pick a meaningful variety by hand, that is, prefixes that one could imagine someone really typing. The exact selection doesn't really matter, but do avoid extreme cases like a prefix ''yzq'' with one completion and an inverted list of three doc ids. '''Hannah 29Nov09 6:28pm''' To Marius + all: yes, I am sorry, "cost" was very imprecise here, I actually simply meant the time your code takes. 
'''Hannah 29Nov09 6:19pm''' So you say that you mean by "costs" the running time? Or do you understand something else when you say we have to calculate the costs? '''Marius 11/29/09 4:58pm''' Note about Problem 2: You should use precise timers when measuring the running times. If your collection is very small and you round your times, it's easy to get 0 ms when merging or scanning the inverted lists. I recommend measuring at microsecond resolution. '''Marjan 29Nov09 16:47''' To Florian: Yes, you can do anything you want to find those words (as long as you produce the required outputs). '''Marjan 29Nov09 16:44''' For exercise 1, should we use one of the methods presented in the lecture to find all words in the collection with the prefixes or can we do just anything to get them (though it might not be as efficient)? '''Florian 29Nov09 03:51pm''' When you scan, please make sure that you do something very simple with the elements, like summing up all doc ids, and then outputting that sum. Otherwise a clever compiler might figure out that it can remove the whole loop, because it is not producing a result that is used anywhere. '''Hannah 28Nov09 11:48pm''' To Mirko: Yes, scanning means one pass over the elements. '''Marjan 28Nov09 19:19''' Hi, about Exercise 2: does "scanning" mean that one looks at every element exactly once? (=> the cost of scanning a list is just the size of the list) '''Mirko 28Nov, 19:12''' |
Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.
Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6.
Here are .lpd files of the recordings of the lectures so far (except Lecture 2, where we had problems with the microphone): Recording Lecture 1, Recording Lecture 3, Recording Lecture 4, Recording Lecture 5 (no audio), Recording Lecture 6 (with audio for a change).
Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6.
Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5.
Exercise Sheet 6
The recordings of all lectures are now available, see above. Lecture 2 is missing because we had technical problems there. To play the recordings (they are .lpd files) you need the Lecturnity Player. You can download the player for free here.
Here are the rules for the exercises as explained in Lecture 2.
Here you can upload your solutions for Exercise Sheet 6.
Questions or comments below this line, most recent on top please
Hi Björn + all, it doesn't really matter, but I (and probably most humans) find ratios > 1 more intuitive. Just compare 8 and 0.125: which one is more catchy? Hannah 30Nov09 11:59pm
Does it matter which way round we express the ratios? Depending on how we build the quotient, we get different values (all smaller or all greater than 1). Or is that up to us? It should be possible to compare our results anyway, I assume. Björn 30Nov 23:36
To Björn: You can assume you have gaps of arbitrary size. Marjan 30Nov 14:43
To Claudius: The whole collection with all words. Marjan 30Nov 14:43
Is there a limit on how large gaps may be in Exercise 3? I'm not sure for which case the two entropies actually fulfill the equation: gaps that "make sense" (their sum is not larger than n-1), gaps that are at most n, or arbitrary gaps? Björn 30Nov09 14:31
In Exercise 2, you ask for the costs of scanning the inv. lists of all words in the "collection". Do you mean the collection of words matching the prefix, or the whole collection with all words in the inv. index? Claudius 30Nov09 2:16pm
Hi Dragos, just three-letter prefixes are fine. I have no plans yet for future exercises with a "*" in the middle. Hannah 29Nov09 11:10pm
For Exercise 1, should we allow the "*" to be in any place, or is a three-letter prefix sufficient? I am asking because it would be good to know whether later exercise sheets might need searches that allow multiple "*" in different positions, so that we can implement that now. Dragos 29 Nov 22:55
Hi Björn, by ratio I simply mean the quotient, that is, how much bigger the one is than the other. For example, if, for a particular prefix, the total size from (1) is one million, and the size from (2) is ten thousand, then, for that prefix, the ratio between the two is one hundred. Hannah 29Nov09 7:48pm
Hello, I wonder what's meant by the ratio demanded in Exercise 1. If I have n lists with a maximum length of "a" and a total length of "b", isn't the ratio something like "a:b"? At least that is what I thought. But adding a colon does not seem to be sufficient for a part of the exercise. Sorry for the meaningless question but I don't want to miss points because I'm not sure how to understand the word ratio. Björn 11-29 19:39
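The ratio discussed in this thread can be written out as a short calculation. The following is only a sketch: the function name `size_ratio` and its parameter names are made up for illustration, and the two numbers are the ones from Hannah's example above.

```python
def size_ratio(larger_size: int, smaller_size: int) -> float:
    """Return how many times bigger larger_size is than smaller_size.

    Dividing the larger quantity by the smaller one gives a ratio > 1,
    which is the direction suggested in the thread.
    """
    if smaller_size <= 0:
        raise ValueError("smaller_size must be positive")
    return larger_size / smaller_size


# Numbers from the example: total size one million, other size ten thousand.
print(size_ratio(1_000_000, 10_000))  # 100.0
```

Swapping the two arguments would give 0.01 instead, which carries the same information but is harder to read at a glance.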
To all: about the selection of the ten prefixes. The idea was that you pick a meaningful variety by hand, that is, prefixes that one could imagine someone really typing. The exact selection doesn't really matter, but do avoid extreme cases like a prefix yzq with one completion and an inverted list of three doc ids. Hannah 29Nov09 6:28pm
To Marius + all: yes, I am sorry, "cost" was very imprecise here, I actually simply meant the time your code takes. Hannah 29Nov09 6:19pm
So you say that you mean by "costs" the running time? Or do you understand something else when you say we have to calculate the costs? Marius 11/29/09 4:58pm
Note about Problem 2: You should use precise timers when measuring the running times. If your collection is very small and you round your times, it's easy to get 0 ms when merging or scanning the inverted lists. I recommend measuring at microsecond resolution. Marjan 29Nov09 16:47
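As an illustration of the timer advice, here is a minimal sketch in Python, assuming the standard library's `time.perf_counter_ns` as the high-resolution clock. The helper name `time_microseconds` is hypothetical; the exercise does not prescribe a language or a timing API.

```python
import time


def time_microseconds(func, *args):
    """Run func(*args) once; return (result, elapsed time in microseconds)."""
    start_ns = time.perf_counter_ns()
    result = func(*args)
    elapsed_ns = time.perf_counter_ns() - start_ns
    # Keep the fractional part so very short runs do not collapse to 0.
    return result, elapsed_ns / 1_000.0


# Example: time summing a small list of doc ids.
total, micros = time_microseconds(sum, [3, 17, 42])
print(total, f"{micros:.1f} us")
```

For very short operations a single measurement is noisy, so repeating the call several times and reporting the average is a reasonable refinement.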
To Florian: Yes, you can do anything you want to find those words (as long as you produce the required outputs). Marjan 29Nov09 16:44
For exercise 1, should we use one of the methods presented in the lecture to find all words in the collection with the prefixes or can we do just anything to get them (though it might not be as efficient)? Florian 29Nov09 03:51pm
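Since any method is allowed, one simple option is a binary search on the sorted vocabulary. This is a sketch and not necessarily one of the methods from the lecture; the function name `words_with_prefix` and the toy vocabulary are made up for illustration.

```python
from bisect import bisect_left


def words_with_prefix(sorted_vocabulary, prefix):
    """Return all words in a sorted list of words that start with prefix."""
    # All completions of a prefix form one contiguous block in sorted order.
    start = bisect_left(sorted_vocabulary, prefix)
    end = start
    while end < len(sorted_vocabulary) and sorted_vocabulary[end].startswith(prefix):
        end += 1
    return sorted_vocabulary[start:end]


vocabulary = sorted(["search", "seal", "seat", "engine", "sea", "second"])
print(words_with_prefix(vocabulary, "sea"))  # ['sea', 'seal', 'search', 'seat']
```

A plain linear filter over all words produces the same output and is also fine for the exercise; the binary search only saves time on large vocabularies.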
When you scan, please make sure that you do something very simple with the elements, like summing up all doc ids, and then outputting that sum. Otherwise a clever compiler might figure out that it can remove the whole loop, because it is not producing a result that is used anywhere. Hannah 28Nov09 11:48pm
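A scan that sums up the doc ids could look like the sketch below. It is written in Python for brevity, where the dead-code concern applies less than in a compiled language, but the structure is the same: one pass over every element, and a result that is actually output. The name `scan_checksum` and the toy lists are hypothetical.

```python
def scan_checksum(inverted_lists):
    """Scan every doc id in every list exactly once.

    Returns (number of elements scanned, sum of all doc ids).
    The sum is the simple result that gets output after the scan.
    """
    count = 0
    checksum = 0
    for postings in inverted_lists:
        for doc_id in postings:
            count += 1
            checksum += doc_id
    return count, checksum


lists = [[1, 4, 7], [2, 4], [9]]
count, checksum = scan_checksum(lists)
print(count, checksum)  # 6 27
```

Printing the sum also gives a quick sanity check that two different implementations scanned the same data.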
To Mirko: Yes, scanning means one pass over the elements. Marjan 28Nov09 19:19
Hi, about Exercise 2: does "scanning" mean that one looks at every element exactly once? (=> the cost of scanning a list is just the size of the list) Mirko 28Nov, 19:12