833
Comment:
|
8574
|
Deletions are marked like this. | Additions are marked like this. |
Line 3: | Line 3: |
Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]]. | |
Line 4: | Line 5: |
= Exercise Sheet 1 = | Here are the recordings of some of the lectures so far (Lecture 1 still missing, in Lecture 2 the microphone signal did not come through): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture1/Search_Engines,_Lecture_3,_5Nov09_1_05_11_2009_16_16_20.html|Lecture 3]] |
Line 6: | Line 7: |
[[SearchEnginesWS0910/StudentIntros|Introduce yourself on this page please (Exercise 1)]] | Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]]. |
Line 8: | Line 9: |
[[SearchEnginesWS0910/ExerciseSheet1|Upload the results to Exercise Sheet 1 + write a summary of your results on this page please]] | Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Exercise Sheet 1]], [[SearchEnginesWS0910/ExerciseSheet2|Exercise Sheet 2]]. [[SearchEnginesWS0910/Rules|Here are the rules for the exercises as explained in Lecture 2]]. = Exercise Sheet 3 = Above, you find a link to a published recording of Lecture 3. Please try if that works for you. [[SearchEnginesWS0910/ExerciseSheet3|Here you can upload your solutions for Exercise Sheet 3]]. |
Line 12: | Line 20: |
When you add a question or comment here, please end it with your name and the date and time, just like I did now. '''Hannah 22Oct09 01:59''' | Hi, If i use and-ish retrieval how is the relevance of hits is measured? The hits which have one of the keywords are relevant or only those who have all the keywords are relevant. I am confused!! '''Paresh 10 Nov 1:31AM ''' Hi Dragos + all, pick three documents which got a score of zero. For any reasonable query and a not too small collection there should be such documents. '''Hannah 10Nov09 0:24am''' Hello :). For exercise 3, should we find three-relevant documents that were not returned at all, or that were low ranked ? I guess that the goal is to discuss why the low-rank, right ? :) '''Dragos 9Nov 11.56 PM''' Hi Dragos, if you want you can use your implementation from Exercise 1 (extended to deal with scores, of course). If your implementation could only deal with two-word queries, you have to find a suitable two-word query (see what I wrote below), but that should be possible. I don't understand what this has to do with the vector-space model though? The vector space model is just a way to get a formula for ranking documents. Processing the queries is still done via inverted lists. Understand that inverted lists are just an efficient way to store the non-zero entries of a sparse matrix. You would never store the term-document matrix as a two-dimensional array, that would be huge ... and a big waste, because most entries are actually zero (that's what "sparse" means). '''Hannah 9Nov09 2:31am''' For exercises 2 and 3, should the query be multi-word, or the exercises refer to the two-word query implemented in Ex Sheet 1 ? I am asking because it's clearly more simple to use the "easy" method(presented in the lecture) for two-word queries and the "vector space model" for a multi-word query. Thank you in advance ! '''Dragos 9Nov 00.27 ''' To Björn + all: I said non-trivial so that you don't take a super-specific query, which has only one or two matching documents, which you can easily retrieve with 100% precision and 100% recall with the right query. An example would be an article with a very specific title like "On the influence of Blancmanges from Skyron on Scottish tennis playing skills". Then, if your collection is not super large, the query "blancmanges skyron scottish" will be perfect. Don't pick a query like that. Also do not just formulate a query, but also write down the search request that you had in mind, so that you have a yardstick to determine what is relevant for that query and what is not. [[attachment:trec_queries.txt|Here are some example of queries from TREC, the big IR benchmark conference]]. Each query there has a so-called "title" (what you would typically enter as query words), a "description" (a short description of what the query is about), and a "narrative" (a long description of what the query is about). '''Hannah 08Nov09 3:28pm''' Concerning the exercises: What is a non-trivial query? A query that does contain multiple documents or does it have to consist of multiple words, too? Anything else? '''Björn 8Nov09 14:26''' The recording works for mac os with flip4mac: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx '''Jonas 8Nov09 14:06''' Maybe you could consider putting it on electures - it seems to be the standard place for putting recordings and slides online as far as I know and there are already some solutions for putting lecturnity files online in a platform-independent manner. I don't know if it's practical but it's also nice having all lectures in one place i'd say... http://electures.informatik.uni-freiburg.de/ '''Alex 7Nov09 18:05''' In Linux I see no suitable plugin. I would like to download the .lpd file too. We can test it with our old Lecturnity versions (i have 2.0) and if it doesn't work we can download 4.x at http://www.lecturnity.de/de/download/lecturnity-player/ '''Waldemar 7Nov09 12:49''' Thanks, Paresh, yes I can do that. So do all students have access to the latest version (should be at least 4.x otherwise it will not work I think) of Lecturnity? '''Hannah 6Nov09 11:35pm''' Hi, yes the recording is working properly after downloading the plug-in. kindly upload the rest files. Also it will be helpful if you could give links to .lpd files since it is easier to download and play them in lecturnity player than browser and one can play them at any time. '''Paresh 6 Nov09 11:25pm''' To Mirko + all: whenever we write "prove", we mean a proof in the mathematical sense. For the exercises, the challenge is often two-fold. You first have to turn the statement of the exercise into a formal statement. Then you have to prove that statement. For Exercise 4 you will first have to specify the order in which the inverted lists should be sorted. Then you have to prove that the document with the i-th largest score (formed by max aggregation), where i <= k, is indeed among one of the k first entries wrt to the specified order, in at least one of the inverted lists. '''Hannah 3Nov09 10:29pm''' About Exercise4: I actually dont know how to to write down (but i think i know how/why it works) the prove of top-k retrieval with the maximum-score. Is it okay to describe it in words or do we have to formalize it in a certain way? '''Mirko 5Nov09 22:21pm''' Ok, I have played around a bit with lecturnity myself, and published Lecture 3, see the link above. For Marjan it worked, he only needed to install some Windows Media plugin for his Firefox. Please also try, and tell me if there are problems. Also tell me if everything goes fine. (It's enough if one or two people tell me.) If it does I will also publish Lecture 1. Lecture 2, as I said, is lost to the world forever (well, at least the audio), since audio recording did not work that day. '''Hannah 3Nov09 10:06pm''' Dear Marius + all: Yes, the lectures are recorded, except for Lecture 2, where there were technical problems (no signal from the microphone). I always copy the Lecturnity files to my machine after the lecture, but don't know yet how how to publish them on the web so that they are easily viewable by others. I will meet with our group's technician tomorrow, and ask him about this. Stay tuned! '''Hannah 5Nov09 8:36pm''' Hi, I noticed that you record your lectures. Is it somehow possible to download these recordings or will they be released later? '''Marius Nov5th, 4:54 p.m.''' Hi Waleed, when you create a conflict, it's your responsibility to remove it and not leave a mess behind. If the instructions given when the conflict occurs do not suffice, try to find more information on the Wiki help pages. '''Hannah 3Nov09 9:00pm''' I uploaded my Files and put a new row on table in the excercies sheet 2 page but when i pressed save button it shows me conflict. my version and other version of list. how can i remove conflict? does my assignment is submitted properly or not? '''Waleed''' 3Nov09 |
Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.
Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3.
Here are the recordings of some of the lectures so far (Lecture 1 still missing, in Lecture 2 the microphone signal did not come through): Lecture 3
Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3.
Here are your solutions and comments on the previous exercise sheets: Exercise Sheet 1, Exercise Sheet 2.
Here are the rules for the exercises as explained in Lecture 2.
Exercise Sheet 3
Above, you find a link to a published recording of Lecture 3. Please try if that works for you.
Here you can upload your solutions for Exercise Sheet 3.
Questions or comments below this line, most recent on top please
Hi, If i use and-ish retrieval how is the relevance of hits is measured? The hits which have one of the keywords are relevant or only those who have all the keywords are relevant. I am confused!! Paresh 10 Nov 1:31AM
Hi Dragos + all, pick three documents which got a score of zero. For any reasonable query and a not too small collection there should be such documents. Hannah 10Nov09 0:24am
Hello :). For exercise 3, should we find three-relevant documents that were not returned at all, or that were low ranked ? I guess that the goal is to discuss why the low-rank, right ? Dragos 9Nov 11.56 PM
Hi Dragos, if you want you can use your implementation from Exercise 1 (extended to deal with scores, of course). If your implementation could only deal with two-word queries, you have to find a suitable two-word query (see what I wrote below), but that should be possible. I don't understand what this has to do with the vector-space model though? The vector space model is just a way to get a formula for ranking documents. Processing the queries is still done via inverted lists. Understand that inverted lists are just an efficient way to store the non-zero entries of a sparse matrix. You would never store the term-document matrix as a two-dimensional array, that would be huge ... and a big waste, because most entries are actually zero (that's what "sparse" means). Hannah 9Nov09 2:31am
For exercises 2 and 3, should the query be multi-word, or the exercises refer to the two-word query implemented in Ex Sheet 1 ? I am asking because it's clearly more simple to use the "easy" method(presented in the lecture) for two-word queries and the "vector space model" for a multi-word query. Thank you in advance ! Dragos 9Nov 00.27
To Björn + all: I said non-trivial so that you don't take a super-specific query, which has only one or two matching documents, which you can easily retrieve with 100% precision and 100% recall with the right query. An example would be an article with a very specific title like "On the influence of Blancmanges from Skyron on Scottish tennis playing skills". Then, if your collection is not super large, the query "blancmanges skyron scottish" will be perfect. Don't pick a query like that. Also do not just formulate a query, but also write down the search request that you had in mind, so that you have a yardstick to determine what is relevant for that query and what is not. Here are some example of queries from TREC, the big IR benchmark conference. Each query there has a so-called "title" (what you would typically enter as query words), a "description" (a short description of what the query is about), and a "narrative" (a long description of what the query is about). Hannah 08Nov09 3:28pm
Concerning the exercises: What is a non-trivial query? A query that does contain multiple documents or does it have to consist of multiple words, too? Anything else? Björn 8Nov09 14:26
The recording works for mac os with flip4mac: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx Jonas 8Nov09 14:06
Maybe you could consider putting it on electures - it seems to be the standard place for putting recordings and slides online as far as I know and there are already some solutions for putting lecturnity files online in a platform-independent manner. I don't know if it's practical but it's also nice having all lectures in one place i'd say... http://electures.informatik.uni-freiburg.de/ Alex 7Nov09 18:05
In Linux I see no suitable plugin. I would like to download the .lpd file too. We can test it with our old Lecturnity versions (i have 2.0) and if it doesn't work we can download 4.x at http://www.lecturnity.de/de/download/lecturnity-player/ Waldemar 7Nov09 12:49
Thanks, Paresh, yes I can do that. So do all students have access to the latest version (should be at least 4.x otherwise it will not work I think) of Lecturnity? Hannah 6Nov09 11:35pm
Hi, yes the recording is working properly after downloading the plug-in. kindly upload the rest files. Also it will be helpful if you could give links to .lpd files since it is easier to download and play them in lecturnity player than browser and one can play them at any time. Paresh 6 Nov09 11:25pm
To Mirko + all: whenever we write "prove", we mean a proof in the mathematical sense. For the exercises, the challenge is often two-fold. You first have to turn the statement of the exercise into a formal statement. Then you have to prove that statement. For Exercise 4 you will first have to specify the order in which the inverted lists should be sorted. Then you have to prove that the document with the i-th largest score (formed by max aggregation), where i <= k, is indeed among one of the k first entries wrt to the specified order, in at least one of the inverted lists. Hannah 3Nov09 10:29pm
About Exercise4: I actually dont know how to to write down (but i think i know how/why it works) the prove of top-k retrieval with the maximum-score. Is it okay to describe it in words or do we have to formalize it in a certain way? Mirko 5Nov09 22:21pm
Ok, I have played around a bit with lecturnity myself, and published Lecture 3, see the link above. For Marjan it worked, he only needed to install some Windows Media plugin for his Firefox. Please also try, and tell me if there are problems. Also tell me if everything goes fine. (It's enough if one or two people tell me.) If it does I will also publish Lecture 1. Lecture 2, as I said, is lost to the world forever (well, at least the audio), since audio recording did not work that day. Hannah 3Nov09 10:06pm
Dear Marius + all: Yes, the lectures are recorded, except for Lecture 2, where there were technical problems (no signal from the microphone). I always copy the Lecturnity files to my machine after the lecture, but don't know yet how how to publish them on the web so that they are easily viewable by others. I will meet with our group's technician tomorrow, and ask him about this. Stay tuned! Hannah 5Nov09 8:36pm
Hi, I noticed that you record your lectures. Is it somehow possible to download these recordings or will they be released later? Marius Nov5th, 4:54 p.m.
Hi Waleed, when you create a conflict, it's your responsibility to remove it and not leave a mess behind. If the instructions given when the conflict occurs do not suffice, try to find more information on the Wiki help pages. Hannah 3Nov09 9:00pm
I uploaded my Files and put a new row on table in the excercies sheet 2 page but when i pressed save button it shows me conflict. my version and other version of list. how can i remove conflict? does my assignment is submitted properly or not? Waleed 3Nov09