10660
Comment:
|
6791
|
Deletions are marked like this. | Additions are marked like this. |
Line 3: | Line 3: |
Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]]. | Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]]. |
Line 5: | Line 5: |
Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]]. | Here are the recordings of some of the lectures so far (Lecture 1 still missing, in Lecture 2 the microphone signal did not come through): [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture1/Search_Engines,_Lecture_3,_5Nov09_1_05_11_2009_16_16_20.html|Lecture 3]] |
Line 7: | Line 7: |
Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Exercise Sheet 1]]. | Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]]. Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Exercise Sheet 1]], [[SearchEnginesWS0910/ExerciseSheet2|Exercise Sheet 2]]. |
Line 11: | Line 13: |
= Exercise Sheet 2 = Here are the details about the three servers (UDP, TCP, HTTP) for Exercise 4: |
= Exercise Sheet 3 = Above, you find a link to a published recording of Lecture 3. Please try if that works for you. |
Line 14: | Line 16: |
All three servers are running on our machine vulcano.informatik.uni-freiburg.de (IP address is 132.230.152.135). The UDP server is running on port 8888 of that machine. You can send it a number and it will then send you back that number of bytes, in packets of 1000 bytes each. (That means you also have to read packets of 1000 bytes each.) The first ten bytes of each packet contain the packet id. That is interesting for checking which packets get lost and in which order packets arrive. The TCP server is running on port 9999 of that machine. You can send it a request of the form GET /<number of bytes> HTTP/1.1, or you can just use a downloading program like wget or curl and time it. The HTTP server is running on port 80, as web servers normally do. Just download the file http://vulcano.informatik.uni-freiburg.de/file_100M and measure the time. For measuring your transfer and error rates, as requested by the exercise, repeat your experiments several times and also at different times, and form the average of these measurements (or report several numbers if you get very different results). You should ask for large amounts of data, like 10 MB or 100 MB. [[SearchEnginesWS0910/ExerciseSheet2|Here you can upload your solutions for Exercise Sheet 2]]. |
[[SearchEnginesWS0910/ExerciseSheet3|Here you can upload your solutions for Exercise Sheet 3]]. |
Line 28: | Line 20: |
Hi Mirko + all. All I said is that you can *test* the TCP server via wget or curl. For the exercise you should implement your own client, but since that is very similar to the UDP client, that is not much additional work. For the HTTP part of the exercise, you can, if you want, indeed just use wget or curl and assume that the number of lost packets is zero. '''Hannah 1Nov09 6:35pm''' | To Björn + all: I said non-trivial so that you don't take a super-specific query, which has only one or two matching documents, which you can easily retrieve with 100% precision and 100% recall with the right query. An example would be an article with a very specific title like "On the influence of Blancmanges from Skyron on Scottish tennis playing skills". Then, if your collection is not super large, the query "blancmanges skyron scottish" will be perfect. Don't pick a query like that. Also do not just formulate a query, but also write down the search request that you had in mind, so that you have a yardstick to determine what is relevant for that query and what is not. [[attachment:trec_queries.txt|Here are some example of queries from TREC, the big IR benchmark conference]]. Each query there has a so-called "title" (what you would typically enter as query words), a "description" (a short description of what the query is about), and a "narrative" (a long description of what the query is about). '''Hannah 08Nov09 3:28pm''' |
Line 30: | Line 22: |
Hi, I am confused, the exercise-sheet says write a client which can communicate over TCP and UDP and for comparison query the HTTP server via wget/curl. Here you are saying we can download the TCP-part via wget/curl. Therefore i wanted to download the files over TCP via wget, but i couldn't find a way to measure the amount of lost packets, does anyone know? '''Mirko 1Nov09 17:23pm''' | Concerning the exercises: What is a non-trivial query? A query that does contain multiple documents or does it have to consist of multiple words, too? Anything else? '''Björn 8Nov09 14:26''' |
Line 32: | Line 24: |
To all: Now both servers should run fine. '''Marjan 1Nov09 17:03pm''' | The recording works for mac os with flip4mac: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx '''Jonas 8Nov09 14:06''' |
Line 34: | Line 26: |
To Matthias: Unlike the TCP server, the UDP server is still working fine, I just checked. '''Marjan 1Nov09 16:17pm''' | Maybe you could consider putting it on electures - it seems to be the standard place for putting recordings and slides online as far as I know and there are already some solutions for putting lecturnity files online in a platform-independent manner. I don't know if it's practical but it's also nice having all lectures in one place i'd say... http://electures.informatik.uni-freiburg.de/ '''Alex 7Nov09 18:05''' |
Line 36: | Line 28: |
Hi, would you please check the UDP as well? It looks like it isn't returning any data now altough the same implementation worked 2 hours ago. ''' Matthias 1Nov09 3:58pm''' | In Linux I see no suitable plugin. I would like to download the .lpd file too. We can test it with our old Lecturnity versions (i have 2.0) and if it doesn't work we can download 4.x at http://www.lecturnity.de/de/download/lecturnity-player/ '''Waldemar 7Nov09 12:49''' |
Line 38: | Line 30: |
Preliminary fix for the TCP server problem: the TCP server is now automatically restarted as soon as it crashed. So you should be able to work with it properly now. '''Hannah 1Nov09 2:54pm''' | Thanks, Paresh, yes I can do that. So do all students have access to the latest version (should be at least 4.x otherwise it will not work I think) of Lecturnity? '''Hannah 6Nov09 11:35pm''' |
Line 40: | Line 32: |
Dear all, the TCP is currently crashing whenever the client aborts, and then it's down before we restart it. Marjan is working on solving this problem, and we will tell you as soon as it's done. The UDP server does not have this problem. '''Hannah 1Nov09 2:33pm''' | Hi, yes the recording is working properly after downloading the plug-in. kindly upload the rest files. Also it will be helpful if you could give links to .lpd files since it is easier to download and play them in lecturnity player than browser and one can play them at any time. '''Paresh 6 Nov09 11:25pm''' |
Line 42: | Line 34: |
Hi Zhongjie + all, as it says above "The first ten bytes of each packet contain the packet id ...". (But it only does that if a packet is larger than 10 bytes.) For example, if you ask for 10000 bytes, the server will send you 10 packets with 1000 bytes each, with ids from 0 to 9. This is interesting information, because your client can use it to print the packet id of each package it receives and see how many packets arrive out of order. You don't have to do this for the exercise, but it's interesting and easy to do. And as you will see then, out of order arrival indeed happens. '''Hannah 1Nov09 2:07pm''' | To Mirko + all: whenever we write "prove", we mean a proof in the mathematical sense. For the exercises, the challenge is often two-fold. You first have to turn the statement of the exercise into a formal statement. Then you have to prove that statement. For Exercise 4 you will first have to specify the order in which the inverted lists should be sorted. Then you have to prove that the document with the i-th largest score (formed by max aggregation), where i <= k, is indeed among one of the k first entries wrt to the specified order, in at least one of the inverted lists. '''Hannah 3Nov09 10:29pm''' |
Line 44: | Line 36: |
Well, problem solved... You need to send a package end with a '\0' char to server, otherwise server will not respond... But here is another problem: when I send a UDP package like "5\0" to server, I will receive reply package like "xxxxx". If I send "10\0", the reply is "xxxxxxxxxx". And if it is "20\0" I send, it is "0000000000\0xxxxxxxxx" I receive. Confused... '''Zhongjie 1Nov09 11:20am''' | About Exercise4: I actually dont know how to to write down (but i think i know how/why it works) the prove of top-k retrieval with the maximum-score. Is it okay to describe it in words or do we have to formalize it in a certain way? '''Mirko 5Nov09 22:21pm''' |
Line 46: | Line 38: |
I had the same problem. But try to send your query with two linefeeds at the end, like this: send_data = '50\n\n' This makes the UDP server a lot more responsive... '''Christian 1Nov09 10:26am''' | Ok, I have played around a bit with lecturnity myself, and published Lecture 3, see the link above. For Marjan it worked, he only needed to install some Windows Media plugin for his Firefox. Please also try, and tell me if there are problems. Also tell me if everything goes fine. (It's enough if one or two people tell me.) If it does I will also publish Lecture 1. Lecture 2, as I said, is lost to the world forever (well, at least the audio), since audio recording did not work that day. '''Hannah 3Nov09 10:06pm''' |
Line 48: | Line 40: |
hello Marjan, Now its work, thanks. what do you mean about skipping one ex.sheet without loosing any points? I wonder whether my exercise uploaded on 26 of Oct is still counted? '''Triatmoko 1Nov09 10:19''' | Dear Marius + all: Yes, the lectures are recorded, except for Lecture 2, where there were technical problems (no signal from the microphone). I always copy the Lecturnity files to my machine after the lecture, but don't know yet how how to publish them on the web so that they are easily viewable by others. I will meet with our group's technician tomorrow, and ask him about this. Stay tuned! '''Hannah 5Nov09 8:36pm''' |
Line 50: | Line 42: |
Hello! I still could not get any response from both the UDP and the TCP server port by now, but HTTP server works fine. If anyone could get some result, please tell me that you can get response from servers, so that I will know it's my own problem... Thank you! '''Zhongjie 1Nov09 08:43''' | Hi, I noticed that you record your lectures. Is it somehow possible to download these recordings or will they be released later? '''Marius Nov5th, 4:54 p.m.''' |
Line 52: | Line 44: |
Hi, I need some clarification on this term "HTTP result header" in Excercise 2 in sheet 2. Will HTTP header contains generic http information or something related to Results? Offcourse our Result will in HTML form. '''Waleed 1Nov09 5:52AM''' | Hi Waleed, when you create a conflict, it's your responsibility to remove it and not leave a mess behind. If the instructions given when the conflict occurs do not suffice, try to find more information on the Wiki help pages. '''Hannah 3Nov09 9:00pm''' |
Line 54: | Line 46: |
I am sorry for the downtime, these maintenance works were announced already several weeks ago, but then I forgot about them because they were scheduled on a Saturday which I thought would not affect me. The downtime also killed our servers, but now they are running again. About the corrections: of course you should get comments on what you did wrong and why you got less points for what. Sorry, if that didn't happen for the first exercise sheet. I will talk with Marjan. '''Hannah 1Nov09 00:38am''' Yes, Eric. Today afternoon and evening was nearly the whole Uni-Net Offline because of intended maintenance. I have a question: Will there be any correction or comments or a sample solution or something like that for every exercise because I now only know how many points I have for every Exercise in Exercise Sheet 1 but I don't know why I have a lack of 1/2 point in one exercise and in another. I think it would be good if anybody will know what he has done wrong or/and what he could have done better. '''Waldemar 31Oct09 21:08''' I cannot connect to vulcano.informatik.uni-freiburg.de:9999 and :8888, neither from outside, nor from inside (logged into pool account). Also the whole informatik.uni... was down a few minutes ago. It's hard to solve exercise 4 without the servers running. '''Eric 31Oct09 19:00''' Hi, I wonder whether those three servers for exercise 4 of sheet 2 online or not? '''Zhongjie 31Oct09 11:49''' To all: Please note that (almost) everybody will get +2 points for Exercise Sheet 1 (previously I did not assign any points to the first and the last problem). '''Marjan 31Oct09 10:03''' To Triatmoko and Ahmed + all: You obviously haven't created your wiki page. Please go to your link and click "Create a new page" and then "Save changes". You should then upload your solutions there and put the link on the wiki like everybody else did. If it is still not clear please ask some of your fellow students. I don't remember if it is mentioned, but you can skip one Exercise Sheet without losing any points. '''Marjan 31Oct09 09:38''' Hi, I have some question about my exercise Sheet 1, I saw in my exercise page that my name, my upload solution and code in gray color and other persons in blue color. I try to click my attachment file in my exercise page, and I have some message that there are no attachment. any body know about this issue?because my exercise uploaded on Monday 26oct '''Triatmoko 30Oct09 22:18''' Hi Björn + all: For Exercise 4 from Exercise Sheet 1 you had to write code that is at least able to process 2-word queries. If your code can indeed only handle 2-word queries and not an arbitrary number of query words, that is also fine for this exercise, you won't get less points because of that. Your second question is also very valid. You should put the various functionalities into modules / classes of their own, so that you can easily combine them for the three different binaries required for Exercises 1 - 3. Each of your three programs will then be quite short, just putting together the right things. I should have added it to my list of evil coding NoNos: never ever duplicate code, but instead put it in a class / module of its own. I hope this answers your questions, if not please ask again. Sorry for the late answer, but I was super busy until now, hardly had time to breathe. '''Hannah 30Oct09 19:08''' I have a question concerning exercise 2. There was no concrete task to produce "query processing code" on ex sheet 1. Are there any requirements that have to be fulfilled? Should it be able to handle two word queries? n-word queries? Additionally there is something else I want to ask: I think it surely isn't bad practice to write a more generic webserver and use it for exercises 1-3. Apart from that it says "change your code" some times in the exercises. How should your submission behave w.r.t. the exercises? Different src files / executables for each exercise? One program that solve each exercises depending on startup parameters? Anything else? '''Björn 30ct09 2:25pm''' I now reorganized the page. Old stuff went to separate pages (links above). The idea is that the front page is always for the current lecture / exercises. The problem with your exercise page should be solved now, Ivo. '''Hannah 30ct09 00:05am''' Having problems to access my exercise page after loging in - IvoChichkovExercises, '''Ivo 29Oct 22:56pm''' Sorry to bother you. I added the Link to exercise sheet 2 with the linked pdf. Needed this to find the sheet as fast as possible. '''Marius 29Oct 10:04 p.m.''' Is there a webpage for exercise sheet 2 somewhere? '''Johannes 29Oct 07:45 pm''' |
I uploaded my Files and put a new row on table in the excercies sheet 2 page but when i pressed save button it shows me conflict. my version and other version of list. how can i remove conflict? does my assignment is submitted properly or not? '''Waleed''' 3Nov09 |
Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.
Here are PDFs of the slides of the lectures so far: Lecture 1, Lecture 2, Lecture 3.
Here are the recordings of some of the lectures so far (Lecture 1 still missing, in Lecture 2 the microphone signal did not come through): Lecture 3
Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3.
Here are your solutions and comments on the previous exercise sheets: Exercise Sheet 1, Exercise Sheet 2.
Here are the rules for the exercises as explained in Lecture 2.
Exercise Sheet 3
Above, you find a link to a published recording of Lecture 3. Please try if that works for you.
Here you can upload your solutions for Exercise Sheet 3.
Questions or comments below this line, most recent on top please
To Björn + all: I said non-trivial so that you don't take a super-specific query, which has only one or two matching documents, which you can easily retrieve with 100% precision and 100% recall with the right query. An example would be an article with a very specific title like "On the influence of Blancmanges from Skyron on Scottish tennis playing skills". Then, if your collection is not super large, the query "blancmanges skyron scottish" will be perfect. Don't pick a query like that. Also do not just formulate a query, but also write down the search request that you had in mind, so that you have a yardstick to determine what is relevant for that query and what is not. Here are some example of queries from TREC, the big IR benchmark conference. Each query there has a so-called "title" (what you would typically enter as query words), a "description" (a short description of what the query is about), and a "narrative" (a long description of what the query is about). Hannah 08Nov09 3:28pm
Concerning the exercises: What is a non-trivial query? A query that does contain multiple documents or does it have to consist of multiple words, too? Anything else? Björn 8Nov09 14:26
The recording works for mac os with flip4mac: http://www.microsoft.com/windows/windowsmedia/player/wmcomponents.mspx Jonas 8Nov09 14:06
Maybe you could consider putting it on electures - it seems to be the standard place for putting recordings and slides online as far as I know and there are already some solutions for putting lecturnity files online in a platform-independent manner. I don't know if it's practical but it's also nice having all lectures in one place i'd say... http://electures.informatik.uni-freiburg.de/ Alex 7Nov09 18:05
In Linux I see no suitable plugin. I would like to download the .lpd file too. We can test it with our old Lecturnity versions (i have 2.0) and if it doesn't work we can download 4.x at http://www.lecturnity.de/de/download/lecturnity-player/ Waldemar 7Nov09 12:49
Thanks, Paresh, yes I can do that. So do all students have access to the latest version (should be at least 4.x otherwise it will not work I think) of Lecturnity? Hannah 6Nov09 11:35pm
Hi, yes the recording is working properly after downloading the plug-in. kindly upload the rest files. Also it will be helpful if you could give links to .lpd files since it is easier to download and play them in lecturnity player than browser and one can play them at any time. Paresh 6 Nov09 11:25pm
To Mirko + all: whenever we write "prove", we mean a proof in the mathematical sense. For the exercises, the challenge is often two-fold. You first have to turn the statement of the exercise into a formal statement. Then you have to prove that statement. For Exercise 4 you will first have to specify the order in which the inverted lists should be sorted. Then you have to prove that the document with the i-th largest score (formed by max aggregation), where i <= k, is indeed among one of the k first entries wrt to the specified order, in at least one of the inverted lists. Hannah 3Nov09 10:29pm
About Exercise4: I actually dont know how to to write down (but i think i know how/why it works) the prove of top-k retrieval with the maximum-score. Is it okay to describe it in words or do we have to formalize it in a certain way? Mirko 5Nov09 22:21pm
Ok, I have played around a bit with lecturnity myself, and published Lecture 3, see the link above. For Marjan it worked, he only needed to install some Windows Media plugin for his Firefox. Please also try, and tell me if there are problems. Also tell me if everything goes fine. (It's enough if one or two people tell me.) If it does I will also publish Lecture 1. Lecture 2, as I said, is lost to the world forever (well, at least the audio), since audio recording did not work that day. Hannah 3Nov09 10:06pm
Dear Marius + all: Yes, the lectures are recorded, except for Lecture 2, where there were technical problems (no signal from the microphone). I always copy the Lecturnity files to my machine after the lecture, but don't know yet how how to publish them on the web so that they are easily viewable by others. I will meet with our group's technician tomorrow, and ask him about this. Stay tuned! Hannah 5Nov09 8:36pm
Hi, I noticed that you record your lectures. Is it somehow possible to download these recordings or will they be released later? Marius Nov5th, 4:54 p.m.
Hi Waleed, when you create a conflict, it's your responsibility to remove it and not leave a mess behind. If the instructions given when the conflict occurs do not suffice, try to find more information on the Wiki help pages. Hannah 3Nov09 9:00pm
I uploaded my Files and put a new row on table in the excercies sheet 2 page but when i pressed save button it shows me conflict. my version and other version of list. how can i remove conflict? does my assignment is submitted properly or not? Waleed 3Nov09