4212
Comment:
|
10505
|
Deletions are marked like this. | Additions are marked like this. |
Line 3: | Line 3: |
Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]]. | Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]]. |
Line 7: | Line 7: |
Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]]. | Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]]. |
Line 14: | Line 14: |
Line 20: | Line 19: |
And just to clarify what a single-cycle permutation is. Here is an example for an array of size 5 with a permutation that is a single cycle: 5 4 1 3 2. Why single cycle? Well, A[1] = 5, A[5] = 2, A[2] = 4, A[4] = 3, A[3] = 1. (My indices in this example are 1,...,5 and not 0,...,4.) Here is an example of a permutation with three cycles: 2 1 4 3 5. The first cycle is A[1] = 2, A[2] =1. The second cycle is A[3] = 4, A[4] = 3. The third cycle is A[5] = 5. '''Hannah 12Nov09 8:04pm''' Hi Daniel + all, I don't quite understand your question and your example (if your array is 1 5 3 4 2, why is A[1] = 3?). In case you refer to the requirement of the exercise that the permutation consists only of a single cycle. That is because your code should go over each element exactly once (it should, of course, stop after n iterations, where n is the size of the array). If your permutation has more than one cycle, it is hard to achieve that. Also note that for both (1) and (2), the sum of the array values should be sum_i=1,...,n i = n * (n+1) / 2. '''Hannah 12Nov09 7:54pm''' Hi, I just looked at the new exercise sheet 4, in exercise 1 we should generate a permutation and sum the resulting array up, am I wrong or doesn't iterating method two iterate throw the whole array in every situation. for ex.: n= 5 permutation: 1 5 3 4 2, then A[1] = 3, A[A[1]]= A[3] = 1, A[1] = 3 ... '''Daniel 12Nov09 19:44pm''' Hi Paresh + all. If you do and-ish retrieval, all documents which contain at least one of the query words are relevant. If you do and retrieval, only the documents which contain all of the query words are relevant. Independent of which scoring scheme you use, you can do it either way. Just clearly state which way you are doing it, and things will be fine. '''Hannah 10Nov09 2:46am''' Hi, If i use and-ish retrieval how is the relevance of hits is measured? The hits which have one of the keywords are relevant or only those who have all the keywords are relevant. I am confused!! '''Paresh 10 Nov 1:31AM ''' Hi Dragos + all, pick three documents which got a score of zero. For any reasonable query and a not too small collection there should be such documents. '''Hannah 10Nov09 0:24am''' Hello :). For exercise 3, should we find three-relevant documents that were not returned at all, or that were low ranked ? I guess that the goal is to discuss why the low-rank, right ? :) '''Dragos 9Nov 11.56 PM''' Hi Dragos, if you want you can use your implementation from Exercise 1 (extended to deal with scores, of course). If your implementation could only deal with two-word queries, you have to find a suitable two-word query (see what I wrote below), but that should be possible. I don't understand what this has to do with the vector-space model though? The vector space model is just a way to get a formula for ranking documents. Processing the queries is still done via inverted lists. Understand that inverted lists are just an efficient way to store the non-zero entries of a sparse matrix. You would never store the term-document matrix as a two-dimensional array, that would be huge ... and a big waste, because most entries are actually zero (that's what "sparse" means). '''Hannah 9Nov09 2:31am''' For exercises 2 and 3, should the query be multi-word, or the exercises refer to the two-word query implemented in Ex Sheet 1 ? I am asking because it's clearly more simple to use the "easy" method(presented in the lecture) for two-word queries and the "vector space model" for a multi-word query. Thank you in advance ! '''Dragos 9Nov 00.27 ''' To Björn + all: I said non-trivial so that you don't take a super-specific query, which has only one or two matching documents, which you can easily retrieve with 100% precision and 100% recall with the right query. An example would be an article with a very specific title like "On the influence of Blancmanges from Skyron on Scottish tennis playing skills". Then, if your collection is not super large, the query "blancmanges skyron scottish" will be perfect. Don't pick a query like that. Also do not just formulate a query, but also write down the search request that you had in mind, so that you have a yardstick to determine what is relevant for that query and what is not. [[attachment:trec_queries.txt|Here are some example of queries from TREC, the big IR benchmark conference]]. Each query there has a so-called "title" (what you would typically enter as query words), a "description" (a short description of what the query is about), and a "narrative" (a long description of what the query is about). '''Hannah 08Nov09 3:28pm''' Concerning the exercises: What is a non-trivial query? A query that does contain multiple documents or does it have to consist of multiple words, too? Anything else? '''Björn 8Nov09 14:26''' The recording works for mac os with flip4mac: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx '''Jonas 8Nov09 14:06''' Maybe you could consider putting it on electures - it seems to be the standard place for putting recordings and slides online as far as I know and there are already some solutions for putting lecturnity files online in a platform-independent manner. I don't know if it's practical but it's also nice having all lectures in one place i'd say... http://electures.informatik.uni-freiburg.de/ '''Alex 7Nov09 18:05''' In Linux I see no suitable plugin. I would like to download the .lpd file too. We can test it with our old Lecturnity versions (i have 2.0) and if it doesn't work we can download 4.x at http://www.lecturnity.de/de/download/lecturnity-player/ '''Waldemar 7Nov09 12:49''' Thanks, Paresh, yes I can do that. So do all students have access to the latest version (should be at least 4.x otherwise it will not work I think) of Lecturnity? '''Hannah 6Nov09 11:35pm''' Hi, yes the recording is working properly after downloading the plug-in. kindly upload the rest files. Also it will be helpful if you could give links to .lpd files since it is easier to download and play them in lecturnity player than browser and one can play them at any time. '''Paresh 6 Nov09 11:25pm''' |
Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.
Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3, Lecture 4.
Here are the recordings of some of the lectures so far (Lecture 1 still missing, in Lecture 2 the microphone signal did not come through): Lecture 3
Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4.
Here are your solutions and comments on the previous exercise sheets: Exercise Sheet 1, Exercise Sheet 2.
Here are the rules for the exercises as explained in Lecture 2.
Exercise Sheet 3
Above, you find a link to a published recording of Lecture 3. Please try if that works for you.
Here you can upload your solutions for Exercise Sheet 3.
Questions or comments below this line, most recent on top please
And just to clarify what a single-cycle permutation is. Here is an example for an array of size 5 with a permutation that is a single cycle: 5 4 1 3 2. Why single cycle? Well, A[1] = 5, A[5] = 2, A[2] = 4, A[4] = 3, A[3] = 1. (My indices in this example are 1,...,5 and not 0,...,4.) Here is an example of a permutation with three cycles: 2 1 4 3 5. The first cycle is A[1] = 2, A[2] =1. The second cycle is A[3] = 4, A[4] = 3. The third cycle is A[5] = 5. Hannah 12Nov09 8:04pm
Hi Daniel + all, I don't quite understand your question and your example (if your array is 1 5 3 4 2, why is A[1] = 3?). In case you refer to the requirement of the exercise that the permutation consists only of a single cycle. That is because your code should go over each element exactly once (it should, of course, stop after n iterations, where n is the size of the array). If your permutation has more than one cycle, it is hard to achieve that. Also note that for both (1) and (2), the sum of the array values should be sum_i=1,...,n i = n * (n+1) / 2. Hannah 12Nov09 7:54pm
Hi, I just looked at the new exercise sheet 4, in exercise 1 we should generate a permutation and sum the resulting array up, am I wrong or doesn't iterating method two iterate throw the whole array in every situation. for ex.: n= 5 permutation: 1 5 3 4 2, then A[1] = 3, A[A[1]]= A[3] = 1, A[1] = 3 ... Daniel 12Nov09 19:44pm
Hi Paresh + all. If you do and-ish retrieval, all documents which contain at least one of the query words are relevant. If you do and retrieval, only the documents which contain all of the query words are relevant. Independent of which scoring scheme you use, you can do it either way. Just clearly state which way you are doing it, and things will be fine. Hannah 10Nov09 2:46am
Hi, If i use and-ish retrieval how is the relevance of hits is measured? The hits which have one of the keywords are relevant or only those who have all the keywords are relevant. I am confused!! Paresh 10 Nov 1:31AM
Hi Dragos + all, pick three documents which got a score of zero. For any reasonable query and a not too small collection there should be such documents. Hannah 10Nov09 0:24am
Hello :). For exercise 3, should we find three-relevant documents that were not returned at all, or that were low ranked ? I guess that the goal is to discuss why the low-rank, right ? Dragos 9Nov 11.56 PM
Hi Dragos, if you want you can use your implementation from Exercise 1 (extended to deal with scores, of course). If your implementation could only deal with two-word queries, you have to find a suitable two-word query (see what I wrote below), but that should be possible. I don't understand what this has to do with the vector-space model though? The vector space model is just a way to get a formula for ranking documents. Processing the queries is still done via inverted lists. Understand that inverted lists are just an efficient way to store the non-zero entries of a sparse matrix. You would never store the term-document matrix as a two-dimensional array, that would be huge ... and a big waste, because most entries are actually zero (that's what "sparse" means). Hannah 9Nov09 2:31am
For exercises 2 and 3, should the query be multi-word, or the exercises refer to the two-word query implemented in Ex Sheet 1 ? I am asking because it's clearly more simple to use the "easy" method(presented in the lecture) for two-word queries and the "vector space model" for a multi-word query. Thank you in advance ! Dragos 9Nov 00.27
To Björn + all: I said non-trivial so that you don't take a super-specific query, which has only one or two matching documents, which you can easily retrieve with 100% precision and 100% recall with the right query. An example would be an article with a very specific title like "On the influence of Blancmanges from Skyron on Scottish tennis playing skills". Then, if your collection is not super large, the query "blancmanges skyron scottish" will be perfect. Don't pick a query like that. Also do not just formulate a query, but also write down the search request that you had in mind, so that you have a yardstick to determine what is relevant for that query and what is not. Here are some example of queries from TREC, the big IR benchmark conference. Each query there has a so-called "title" (what you would typically enter as query words), a "description" (a short description of what the query is about), and a "narrative" (a long description of what the query is about). Hannah 08Nov09 3:28pm
Concerning the exercises: What is a non-trivial query? A query that does contain multiple documents or does it have to consist of multiple words, too? Anything else? Björn 8Nov09 14:26
The recording works for mac os with flip4mac: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx Jonas 8Nov09 14:06
Maybe you could consider putting it on electures - it seems to be the standard place for putting recordings and slides online as far as I know and there are already some solutions for putting lecturnity files online in a platform-independent manner. I don't know if it's practical but it's also nice having all lectures in one place i'd say... http://electures.informatik.uni-freiburg.de/ Alex 7Nov09 18:05
In Linux I see no suitable plugin. I would like to download the .lpd file too. We can test it with our old Lecturnity versions (i have 2.0) and if it doesn't work we can download 4.x at http://www.lecturnity.de/de/download/lecturnity-player/ Waldemar 7Nov09 12:49
Thanks, Paresh, yes I can do that. So do all students have access to the latest version (should be at least 4.x otherwise it will not work I think) of Lecturnity? Hannah 6Nov09 11:35pm
Hi, yes the recording is working properly after downloading the plug-in. kindly upload the rest files. Also it will be helpful if you could give links to .lpd files since it is easier to download and play them in lecturnity player than browser and one can play them at any time. Paresh 6 Nov09 11:25pm
To Mirko + all: whenever we write "prove", we mean a proof in the mathematical sense. For the exercises, the challenge is often two-fold. You first have to turn the statement of the exercise into a formal statement. Then you have to prove that statement. For Exercise 4 you will first have to specify the order in which the inverted lists should be sorted. Then you have to prove that the document with the i-th largest score (formed by max aggregation), where i <= k, is indeed among one of the k first entries wrt to the specified order, in at least one of the inverted lists. Hannah 3Nov09 10:29pm
About Exercise4: I actually dont know how to to write down (but i think i know how/why it works) the prove of top-k retrieval with the maximum-score. Is it okay to describe it in words or do we have to formalize it in a certain way? Mirko 5Nov09 22:21pm
Ok, I have played around a bit with lecturnity myself, and published Lecture 3, see the link above. For Marjan it worked, he only needed to install some Windows Media plugin for his Firefox. Please also try, and tell me if there are problems. Also tell me if everything goes fine. (It's enough if one or two people tell me.) If it does I will also publish Lecture 1. Lecture 2, as I said, is lost to the world forever (well, at least the audio), since audio recording did not work that day. Hannah 3Nov09 10:06pm
Dear Marius + all: Yes, the lectures are recorded, except for Lecture 2, where there were technical problems (no signal from the microphone). I always copy the Lecturnity files to my machine after the lecture, but don't know yet how how to publish them on the web so that they are easily viewable by others. I will meet with our group's technician tomorrow, and ask him about this. Stay tuned! Hannah 5Nov09 8:36pm
Hi, I noticed that you record your lectures. Is it somehow possible to download these recordings or will they be released later? Marius Nov5th, 4:54 p.m.
Hi Waleed, when you create a conflict, it's your responsibility to remove it and not leave a mess behind. If the instructions given when the conflict occurs do not suffice, try to find more information on the Wiki help pages. Hannah 3Nov09 9:00pm
I uploaded my Files and put a new row on table in the excercies sheet 2 page but when i pressed save button it shows me conflict. my version and other version of list. how can i remove conflict? does my assignment is submitted properly or not? Waleed 3Nov09