15497
Comment:
|
15683
|
Deletions are marked like this. | Additions are marked like this. |
Line 22: | Line 22: |
'''Inverted indexes and like''': If a inverted index maps a word, w, (perhaps a string) to a subset, W(w), of the set of all documents (perhapse only the IDs as numbers). Is W(w) always sorted? Does it contain duplicats? For some application (and the algorithms for them) this seems to matter. I'm just asking in case of a exam task involving coding. '''Johannes 2010-03-07T13:54''' | '''Inverted indexes and like''': If a inverted index maps a word, w, (perhaps a string) to a subset, W(w), of the set of all documents (perhapse only the IDs as numbers). Is W(w) always sorted? Does it contain duplicats? For some application (and the algorithms for them) this seems to matter. I'm just asking in case of a exam task, involving coding (especially k-way-merge). '''Johannes 2010-03-07T13:54''' '''Elias-Gamma Encoding''' Is there a closed form for the length of the code for an integer x when elias is iterated k times? ''Johannes 2010-03-07T15:14''' |
Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.
Here are PDFs of the slides of the lectures: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 7, Lecture 8, Lecture 9, Lecture 10, Lecture 11, Lecture 12, Lecture 13, Lecture 14, Projects.
Here are the recordings of the lectures (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: Recording Lecture 1 (LPD), Recording Lecture 3 (LPD), Recording Lecture 4 (LPD), Recording Lecture 5 (LPD without audio), Recording Lecture 6 (LPD), Recording Lecture 7 (AVI), Recording Lecture 8 (AVI), Recording Lecture 9 (AVI), Recording Lecture 10 (AVI), Recording Lecture 11 (AVI), Recording Lecture 12 (AVI), Recording Lecture 13 (AVI), Recording Lecture 14 (AVI). To play the Lecturnity recordings (.lpd files) you need the Lecturnity Player, which you can download here. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend VLC.
Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6, Exercise Sheet 7, Exercise Sheet 8, Exercise Sheet 9, Exercise Sheet 10, Exercise Sheet 11, Exercise Sheet 12, Exercise Sheet 13, Exercise Sheet 14.
Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5, Solutions and Comments 6, Solutions and Comments 7, Solutions and Comments 8, Solutions and Comments 9, Solutions and Comments 10, Solutions and Comments 11, Solutions and Comments 12, Solutions and Comments 13.
Here are our master solutions: Master solution for Mid-Term Exam,Master solution for Exercise Sheet 9, Master solution for Exercise Sheet 10, Master solution for Exercise Sheet 11, Master solution for Exercise Sheet 12.
Here are the rules for the exercises as explained in Lecture 2.
Here is everything about the mid-term exam. The final exam is on Friday March 12, 2010. The written exam begins at 2.00 pm in HS 026. The oral exams are scheduled on the same day.
Here is the table with the links to your uploaded solutions for Exercise Sheet 14. The deadline is Thursday 18Feb10 16:00.
More general questions and comments
Inverted indexes and like: If a inverted index maps a word, w, (perhaps a string) to a subset, W(w), of the set of all documents (perhapse only the IDs as numbers). Is W(w) always sorted? Does it contain duplicats? For some application (and the algorithms for them) this seems to matter. I'm just asking in case of a exam task, involving coding (especially k-way-merge). Johannes 2010-03-07T13:54
Elias-Gamma Encoding Is there a closed form for the length of the code for an integer x when elias is iterated k times? Johannes 2010-03-07T15:14
There was an obvious mistake which I now corrected (00 should be mapped to 1, not 0). Each code stands for two bits at a time, so for a sequence of n bits, you have to generate n/2 codes. I replaced Yes, indeed, I replaced The task asked for the Thanks a lot for your comments! Please go on if you have more. Thanks a lot for your answers!
Hi Johannes: why don't you start with the first few questions, and then let's see whether it makes sense to continue this via the Wiki, or via private email, or via a meeting in person. Yes, the final exam is like the mid-term exam in this respect. Alex: http://vulcano.informatik.uni-freiburg.de/wiki/teaching/SearchEnginesWS0910/MidTermExam, so it seems to be allowed. Hi, I was wondering, will the exam next week also be an open book exam like the mid-term? Perhaps I overlooked it, but I don't think this is stated anywhere yet. I have lots of questions and don't know where to put them. I suppose this wiki-page will get chaotic pretty fast if I post 20 questions. I'm sorry for the delay with the master solutions. I am at a conference right now but will try to make progress with this over the weekend. Do we get master solutions for ex. 11, 12, 13 and 14? Now they're there again. ARGH! I'm very sorry. My Down-Them-All Plugin for Firefox seems to have deleted all the lecture PDFs! Sorry for that. Rollback to previous versions does not seem to work. I hope, someone has already downloaded them all and is able to restore them! SORRY! Interesting, I've got the rights to delete something from the main page, though. (Reminder:) Hello, the master solutions are not online, yet. Yes, we are working on it. Please remind us again if they aren't online by the end of this week. Do we get master solutions for ex. 11, 12, 13 and 14? Hi Matthias, yes, Pr(A) = 1 - Pr(not A), for any event A, and so for any random variable X, Pr(X <= x) = 1 - Pr(X > x), because X <= x and X > x are complementary events. For continuous random variables (like variables with a normal distribution), the difference between <= and < and >= and > is immaterial, because Pr(X = x) for each fixed x. But anyway, to compute the probability, you first have to transform it a bit, like I did in the lecture, and then obtain Pr(N(0,1) >= sqrt(n1) * (µ1 - µ) / σ) and Pr(N(0,1) <= sqrt(n2) * (µ - µ2) / σ). To evaluate the latter you can also simply use the symmetry of the normal distribution, due to which one has Pr(N(0,1) <= -x) = Pr(N(0,1) >= x). Hi, how can we compute Pr(N(n2 * µ2, n2 * σ2) <= n2 * µ2 ? Can we use 1- (Pr(N(n2 * µ2, n2 * σ2) >= n2 * µ2) for that ? Hi Florian + all, one of µ1 and µ2 is larger than µ and one is smaller. Let's assume µ1 is larger and µ2 is smaller. Then for µ1 you have to look at Pr(N(n1 * µ, n1 * σ2) >= n1 * µ1). But for µ2 you have to look at Pr(N(n2 * µ2, n2 * σ2) <= n2 * µ2). Note the <= instead of the >= for the second probability. Recall the meaning of these probabilities. Just as an example, let µ be 100 and µ1 be 150 and µ2 be 50. Then the first probability means: what is the probability that I see a mean of Sorry, with probability for µ1 I meant Pr(N(n1 * µ, n1 * σ2) >= n1 * µ1) and accordingly with probability for µ2 I meant Pr(N(n2 * µ, n2 * σ2) >= n2 * µ2) where n1=n2 for the exercise sheet. Hi Florian, what exactly do you mean by Hi, what values are we expected to get for exercise 4? I always get a probability of about 99.9% for μ1 and a value of about 0.07% for μ2, can that be? Hi Florian, yes, the Hi, I guess we should measure the running times to determine the efficiency of the programs for exercise 3? Hi Claudius, you should compute Pr(D|H0), exactly as done in the lecture for Example 2, where we computed this probability as Pr(X > x), where X is a random variable with distribution N(0,1), that is, normal with mean 0 and variance 1, and x depends on the mean and variance of your data. Hi. If I have understood correctly, we have to compute Pr(H|D) in Exercise 4. From statistical hypothesis testing, we get Pr(D|H). Now, Pr(H|D) = Pr(D|H) * (Pr(H) / Pr(D)). We know Pr(D|H) and we can compute Pr(D), but what value do we have to use for Pr(H)? Hi Eric, I don't care whether you use integers or doubles, but I am curious why the one should be any harder than the other? May we use integers for sorting? Or do we have to use doubles? This is important for generating my sorted array If you're asking about the merging you can of course use a priority queue if you want, but you don't really need it when merging 2 lists. Why would you use a priority queue? It's simple sorting, the exercise is not about implementing your own sorting algorithm or something like that. About exercise 3, it should be clear from the exercise itself that the sequences should be sorted (otherwise how can the merging work?) Means that we have nothing to do than use a priority queue or something like that and don't have to implement the sorting? And at Exercise 3 the random set should be an ordered one or not? We prefer randomized sorting using bitonic networks, alternatively combined with LSD radix sort or simple pancake sort. That's of course a joke, it should be clear that you can use the built-in sorting functions (your own implementation will be certainly slower). What does "do a standard sort" in exercise 2 mean? Shall I implement one on my own, or may I use the Java built-in sorting mechanisms? Also, which sorting algorithm do you prefer for this? Questions and comments about the master solution of the mid-term exam
Johannes 2010-03-07T12:40 : 1.3: CLAIM: If an encoding is prefix-free, then there is no code that is a prefix of a different code. Does this claim hold? If so, then 001 mustn't be a code, since 0 is a code and a prefix of 001. Is this right? Questions and comments about Exercise Sheet 14 below this line (most recent on top)