Diff for "SearchEnginesWS0910"

Differences between revisions 172 and 784 (spanning 612 versions)

Welcome to the Wiki page of the course Search Engines, WS 2009 / 2010. Lecturer: Hannah Bast. Tutorials: Marjan Celikik. Course web page: click here.

Here are PDFs of the slides of the lectures: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6, Lecture 7, Lecture 8, Lecture 9, Lecture 10, Lecture 11, Lecture 12, Lecture 13, Lecture 14, Projects.

Here are PDFs of the exercise sheets so far: Exercise Sheet 1, Exercise Sheet 2, Exercise Sheet 3, Exercise Sheet 4, Exercise Sheet 5, Exercise Sheet 6, Exercise Sheet 7, Exercise Sheet 8, Exercise Sheet 9, Exercise Sheet 10, Exercise Sheet 11, Exercise Sheet 12, Exercise Sheet 13, Exercise Sheet 14.

Here are your solutions and comments on the previous exercise sheets: Solutions and Comments 1, Solutions and Comments 2, Solutions and Comments 3, Solutions and Comments 4, Solutions and Comments 5, Solutions and Comments 6, Solutions and Comments 7, Solutions and Comments 8, Solutions and Comments 9, Solutions and Comments 10, Solutions and Comments 11, Solutions and Comments 12, Solutions and Comments 13.

Here are our master solutions: Master solution for Exercise Sheet 6 (only Exercise 4), Master solution for Mid-Term Exam,Master solution for Exercise Sheet 9, Master solution for Exercise Sheet 10, Master solution for Exercise Sheet 11, Master solution for Exercise Sheet 12.

Here are the rules for the exercises as explained in Lecture 2.

Here is everything about the mid-term exam (December 2009, trial exam which did not count for anything and the final exam (March 12, 2010, the real thing which accounted for most of the mark.

Here is the table with the links to your uploaded solutions for Exercise Sheet 14. The deadline is Thursday 18Feb10 16:00.

More general questions and comments

Yes, HS026 in building 101. Hannah 12Mar10 8:45am

Does the HS mean same 101 building? I am still new in Freiburg ;). Paresh 12 Mar 10 07:45

I am sorry that we have not managed to produce the master solutions for exercise sheets 13 and 14 yet. However, Lecture 13 is still relevant for the exam (clustering, part 2), but I can tell you that there will be no task about Lecture 14 (statistical hypothesis testing). I think that's fair, because there was no tutorial for the (last) exercise sheet 14. Exercise sheet 14 will count as a normal excercise sheet, however. Hannah 11Mar10 15:21

Hi, since there are still no master solutions for sheets 13-14, I assume the contents of the lecture concerning these sheets are not relevant for the exam. Marius Mar11th 2:23 p.m.

To Johannes + all: you are not allowed to bring any computing devices whatsoever and you won't need them. If there is a task which requires a calculation that is unreasonable to do by hand (like the log_2(10/7) from the mid-term exam), we will tell you what it is or an approximation to work with (for example that you can take log_2(10/7) as 0.5). Hannah 10Mar10 20:42

Exam and portable calculators: "2. You are not allowed to use any computing devices, mobile phones, etc." I had some problems with pen, paper and sqrt(1080). May we bring calculators? Johannes 2010-03-10T20:27

Thanks, the solution for sheet6 ex4 helped us a lot! björn

@Jonas: thanks for the comment, I have corrected it in the master solution. @Björn: I added a master solution for Exercise Sheet 6 (only Exercise 4), linked above, with what I think is a very short and simple proof. Tell me if you find anything wrong with it. Hannah 10Mar10 16:40

Jonas: Yes, that was already mentioned in the tutorials. Marjan 10Mar10 15:58

Hi. Concerining exercise sheet 10 exercise 1. Shouldn't you take the squareroots of 108 and 10 (in the Matrix EPSILON). Otherwise the equation is not right. Jonas 10.03.10

Hi, we got a question concerning ex sheet 6, exercise 4. In the tutorial Marjan presented a solid, but complicated solution using Taylor Expansion. In the lecture you mentioned that this wasn't necessary for any exercise. Unfortunately we fail at finding a simpler, but still mathematical rigorous solution. Would you please give a brief idea of how to proove such inequalities as this might by useful for similar, yet easier exercises in the exam. Björn Mi 15:12

Hi Johannes + all. Here is a very simple example: let the query word be algorithm and one candidate similar word computed by the permuted lexicon be algXXXthm (the common prefix is thmalg [from the permutations thmalgori and thmalgXXX] which is long enough) and let the edit distance threshold be 2. Obviously this candidate word will be filtered out because the edit distance is 3. Marjan 07Mar10 18:57

Filtering with a Permutern Index: The slide states: "for all matches thus found, compute the actual edit distance". Is there a simple strawman-example for a word that gets removed in the postfiltering-step? (Today is silly question day.) Johannes 2010-03-07T18:26

Hi Johannes + all. Concerning your inverted index question: it really depends on the application, if you have lists of only doc ids and want to intersect them fast, you would sort the lists by doc id, if you want to do top-k you would sort them by score. Duplicates only make sense when you also store positional information, which we didn't do in the lecture. Concerning your Elias-Gamma question: there is an upper bound, which I think we also derived in the lecture, and that is log n + O(log^(k) n) + O(1), but I couldn't tell you what are the constants hidden in the two Big-Ohs. Hannah 7Mar10 18:19

Inverted indexes and like: If a inverted index maps a word, w, (perhaps a string) to a subset, W(w), of the set of all documents (perhaps only the IDs as numbers). Is W(w) always sorted? Does it contain duplicates? For some application (and the algorithms for them) this seems to matter. I'm just asking in case of a exam task, involving coding (especially k-way-merge). Johannes 2010-03-07T13:54

Elias-Gamma Encoding: Is there a closed form for the length of the code for an integer x when elias is iterated k times? Johannes 2010-03-07T15:14

Questions and comments about the master solution of the mid-term exam

Johannes 2010-03-07T12:40 :

1.3: CLAIM: If an encoding is prefix-free, then there is no code that is a prefix of a different code. Does this claim hold? If so, then 001 mustn't be a code, since 0 is a code and a prefix of 001. Is this right?

There was an obvious mistake which I now corrected (00 should be mapped to 1, not 0). Hannah 7Mar10 12:56

1.4: It states: "For a sequence of length n, we need to generate n/2 such codes [...]." Does not each symbol of the n from the sequence get encoded?

Each code stands for two bits at a time, so for a sequence of n bits, you have to generate n/2 codes. I replaced sequence of length n by sequence of n bits to make this clearer. Hannah 7Mar10 12:58

3.4: The function returns the number of common k-grams (as far as I see). Can the return-line be completed with a call to the function from 3.2 to return the Jaccard-distance?

Yes, indeed, I replaced return l by return jaccardDistance(x, y, k, l). Hannah 7Mar10 13:01

5.4: Does the top-k-algorithm return the top k documents? If so, which k had to been used in this task? What exactly is the condition for stopping? What exactly is the update rule for the ranges? My idea is that (for a fixed document) the minimum is always the known minimum from any of the lists and the maximum is always the (already known) minimum plus the lowest score, seen in any list different than the one the minimum is from. In case of only two lists there may be some simplifications.

The task asked for the top-ranked document, so k = 1. We can stop when the upper bound for all documents not yet seen is strictly below the k-th largest lower bound so far, and when the score ranges for the documents already seen are such that it is clear which are the top-k documents and in which order. If there are ties, and we don't care how they are broken, and we don't care to know the order of the top-k documents, we can sometimes stop earlier. Does this answer all your questions? Hannah 7Mar10 13:06

Thanks a lot for your comments! Please go on if you have more. Hannah 7Mar10 13:07

Thanks a lot for your answers! Johannes 2010-03-07T13:44

Questions and comments about Exercise Sheet 14 below this line (most recent on top)

Hi Johannes: why don't you start with the first few questions, and then let's see whether it makes sense to continue this via the Wiki, or via private email, or via a meeting in person. Hannah 6Mar10 17:36

Yes, the final exam is like the mid-term exam in this respect. Hannah 6Mar10 17:36

Alex: http://vulcano.informatik.uni-freiburg.de/wiki/teaching/SearchEnginesWS0910/MidTermExam, so it seems to be allowed. Mirko, 6Mar10 16:10

Hi, I was wondering, will the exam next week also be an open book exam like the mid-term? Perhaps I overlooked it, but I don't think this is stated anywhere yet. Alex 6Mar10 13:49

I have lots of questions and don't know where to put them. I suppose this wiki-page will get chaotic pretty fast if I post 20 questions. Johannes VI Mar MMX 12:00

I'm sorry for the delay with the master solutions. I am at a conference right now but will try to make progress with this over the weekend. Hannah 4Mar10 23:59

Do we get master solutions for ex. 11, 12, 13 and 14? Johannes 04Mar2010 23:32 ZULU

Now they're there again. Marjan 01Mar18:09

ARGH! I'm very sorry. My Down-Them-All Plugin for Firefox seems to have deleted all the lecture PDFs! Sorry for that. Rollback to previous versions does not seem to work. I hope, someone has already downloaded them all and is able to restore them! SORRY! Interesting, I've got the rights to delete something from the main page, though. Marius Mar 1st 2010 2:38 p.m.

(Reminder:) Hello, the master solutions are not online, yet. alex n 1Mar10 11:08

Yes, we are working on it. Please remind us again if they aren't online by the end of this week. Hannah 23Feb10 14:30

Do we get master solutions for ex. 11, 12, 13 and 14? Johannes 23Feb10 14:05

Hi Matthias, yes, Pr(A) = 1 - Pr(not A), for any event A, and so for any random variable X, Pr(X <= x) = 1 - Pr(X > x), because X <= x and X > x are complementary events. For continuous random variables (like variables with a normal distribution), the difference between <= and < and >= and > is immaterial, because Pr(X = x) for each fixed x. But anyway, to compute the probability, you first have to transform it a bit, like I did in the lecture, and then obtain Pr(N(0,1) >= sqrt(n1) * (µ1 - µ) / σ) and Pr(N(0,1) <= sqrt(n2) * (µ - µ2) / σ). To evaluate the latter you can also simply use the symmetry of the normal distribution, due to which one has Pr(N(0,1) <= -x) = Pr(N(0,1) >= x). Hannah 18Feb10 12:58

Hi, how can we compute Pr(N(n2 * µ2, n2 * σ²) <= n2 * µ2 ? Can we use 1- (Pr(N(n2 * µ2, n2 * σ²) >= n2 * µ2) for that ? Matthias 18Feb10 12:01

Hi Florian + all, one of µ1 and µ2 is larger than µ and one is smaller. Let's assume µ1 is larger and µ2 is smaller. Then for µ1 you have to look at Pr(N(n1 * µ, n1 * σ²) >= n1 * µ1). But for µ2 you have to look at Pr(N(n2 * µ2, n2 * σ²) <= n2 * µ2). Note the <= instead of the >= for the second probability. Recall the meaning of these probabilities. Just as an example, let µ be 100 and µ1 be 150 and µ2 be 50. Then the first probability means: what is the probability that I see a mean of 150 or more in my first sample, although the mean of my distribution is 100. The second probability means: what is the probability that I see a mean of 50 or less in my second sample, although the mean of my distribution is 100. If you take both <= or both >= for both probabilities, it is to be expected that you get two completely different probabilities, one very low and one very high (except when they are both close to 50%). Please ask again if this is still unclear. Hannah 17Feb10 21:51

Sorry, with probability for µ1 I meant Pr(N(n1 * µ, n1 * σ²) >= n1 * µ1) and accordingly with probability for µ2 I meant Pr(N(n2 * µ, n2 * σ²) >= n2 * µ2) where n1=n2 for the exercise sheet. Florian 17Feb10 21:18

Hi Florian, what exactly do you mean by probability for µ1 and probability for µ2? Hannah 17Feb10 21:02

Hi, what values are we expected to get for exercise 4? I always get a probability of about 99.9% for μ1 and a value of about 0.07% for μ2, can that be? Florian 17Feb10 18:25

Hi Florian, yes, the averages in Exercise 3 should be average running times. I uploaded a new version of the sheet, where I corrected this. Hannah 14Feb10 17:48

Hi, I guess we should measure the running times to determine the efficiency of the programs for exercise 3? Florian 15Feb10 17:42

Hi Claudius, you should compute Pr(D|H0), exactly as done in the lecture for Example 2, where we computed this probability as Pr(X > x), where X is a random variable with distribution N(0,1), that is, normal with mean 0 and variance 1, and x depends on the mean and variance of your data. Hannah 14Feb10 16:44

Hi. If I have understood correctly, we have to compute Pr(H|D) in Exercise 4. From statistical hypothesis testing, we get Pr(D|H). Now, Pr(H|D) = Pr(D|H) * (Pr(H) / Pr(D)). We know Pr(D|H) and we can compute Pr(D), but what value do we have to use for Pr(H)? Claudius 14Feb10 14:41

Hi Eric, I don't care whether you use integers or doubles, but I am curious why the one should be any harder than the other? Hannah 12Feb10 19:02

May we use integers for sorting? Or do we have to use doubles? This is important for generating my sorted array Eric 12Feb10 18:56

If you're asking about the merging you can of course use a priority queue if you want, but you don't really need it when merging 2 lists. Marjan 18:28

Why would you use a priority queue? It's simple sorting, the exercise is not about implementing your own sorting algorithm or something like that. About exercise 3, it should be clear from the exercise itself that the sequences should be sorted (otherwise how can the merging work?) Marjan 18:23

Means that we have nothing to do than use a priority queue or something like that and don't have to implement the sorting? And at Exercise 3 the random set should be an ordered one or not? Alex 12Feb10 18:19

We prefer randomized sorting using bitonic networks, alternatively combined with LSD radix sort or simple pancake sort. That's of course a joke, it should be clear that you can use the built-in sorting functions (your own implementation will be certainly slower). Marjan 12Feb10 18:12

What does "do a standard sort" in exercise 2 mean? Shall I implement one on my own, or may I use the Java built-in sorting mechanisms? Also, which sorting algorithm do you prefer for this? Eric 12Feb10 18:04

-  ⇤ ← Revision 172 as of 2009-11-02 23:07:09 → 
  Size: 17496
  Editor: Hannah Bast
  Comment:
+   ← Revision 784 as of 2010-03-12 17:17:54 → ⇥
  Size: 19679
  Editor: Hannah Bast
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-Here are PDFs of the slides of the lectures so far: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]].
+Here are PDFs of the slides of the lectures: [[attachment:SearchEnginesWS0910/lecture-1.pdf|Lecture 1]], [[attachment:SearchEnginesWS0910/lecture-2.pdf|Lecture 2]], [[attachment:SearchEnginesWS0910/lecture-3.pdf|Lecture 3]], [[attachment:SearchEnginesWS0910/lecture-4.pdf|Lecture 4]], [[attachment:SearchEnginesWS0910/lecture-5.pdf|Lecture 5]], [[attachment:SearchEnginesWS0910/lecture-6.pdf|Lecture 6]], [[attachment:SearchEnginesWS0910/lecture-7.pdf|Lecture 7]], [[attachment:SearchEnginesWS0910/lecture-8.pdf|Lecture 8]], [[attachment:SearchEnginesWS0910/lecture-9.pdf|Lecture 9]], [[attachment:SearchEnginesWS0910/lecture-10.pdf|Lecture 10]], [[attachment:SearchEnginesWS0910/lecture-11.pdf|Lecture 11]], [[attachment:SearchEnginesWS0910/lecture-12.pdf|Lecture 12]], [[attachment:SearchEnginesWS0910/lecture-13.pdf|Lecture 13]], [[attachment:SearchEnginesWS0910/lecture-14.pdf|Lecture 14]], [[attachment:SearchEnginesWS0910/lecture-projects.pdf|Projects]].
 Line 5:
-Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]].
+Here are the recordings of the lectures (except Lecture 2, where we had problems with the microphone), LPD = Lecturnity recording: [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-1.lpd|Recording Lecture 1 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-3.lpd|Recording Lecture 3 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-4.lpd|Recording Lecture 4 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-5.lpd|Recording Lecture 5 (LPD without audio)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-6.lpd|Recording Lecture 6 (LPD)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-7.avi|Recording Lecture 7 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-8.avi|Recording Lecture 8 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-9.avi|Recording Lecture 9 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-10.avi|Recording Lecture 10 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-11.avi|Recording Lecture 11 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-12.avi|Recording Lecture 12 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-13.avi|Recording Lecture 13 (AVI)]], [[http://vulcano.informatik.uni-freiburg.de/lecturnity/lecture-14.avi|Recording Lecture 14 (AVI)]]. To play the Lecturnity recordings (.lpd files) you need the [[http://www.lecturnity.de/de/download/lecturnity-player|Lecturnity Player, which you can download here]]. I put the Camtasia recordings as .avi files, which you can play with any ordinary video player; I would recommend [[http://www.videolan.org/vlc|VLC]].
 Line 7:
-Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Exercise Sheet 1]].
+Here are PDFs of the exercise sheets so far: [[attachment:SearchEnginesWS0910/exercise-1.pdf|Exercise Sheet 1]], [[attachment:SearchEnginesWS0910/exercise-2.pdf|Exercise Sheet 2]], [[attachment:SearchEnginesWS0910/exercise-3.pdf|Exercise Sheet 3]], [[attachment:SearchEnginesWS0910/exercise-4.pdf|Exercise Sheet 4]], [[attachment:SearchEnginesWS0910/exercise-5.pdf|Exercise Sheet 5]], [[attachment:SearchEnginesWS0910/exercise-6.pdf|Exercise Sheet 6]], [[attachment:SearchEnginesWS0910/exercise-7.pdf|Exercise Sheet 7]], [[attachment:SearchEnginesWS0910/exercise-8.pdf|Exercise Sheet 8]], [[attachment:SearchEnginesWS0910/exercise-9.pdf|Exercise Sheet 9]], [[attachment:SearchEnginesWS0910/exercise-10.pdf|Exercise Sheet 10]], [[attachment:SearchEnginesWS0910/exercise-11.pdf|Exercise Sheet 11]], [[attachment:SearchEnginesWS0910/exercise-12.pdf|Exercise Sheet 12]], [[attachment:SearchEnginesWS0910/exercise-13.pdf|Exercise Sheet 13]], [[attachment:SearchEnginesWS0910/exercise-14.pdf|Exercise Sheet 14]].

Here are your solutions and comments on the previous exercise sheets: [[SearchEnginesWS0910/ExerciseSheet1|Solutions and Comments 1]], [[SearchEnginesWS0910/ExerciseSheet2|Solutions and Comments 2]], [[SearchEnginesWS0910/ExerciseSheet3|Solutions and Comments 3]], [[SearchEnginesWS0910/ExerciseSheet4|Solutions and Comments 4]], [[SearchEnginesWS0910/ExerciseSheet5|Solutions and Comments 5]], [[SearchEnginesWS0910/ExerciseSheet6|Solutions and Comments 6]], [[SearchEnginesWS0910/ExerciseSheet7|Solutions and Comments 7]], [[SearchEnginesWS0910/ExerciseSheet8|Solutions and Comments 8]], [[SearchEnginesWS0910/ExerciseSheet9|Solutions and Comments 9]], [[SearchEnginesWS0910/ExerciseSheet10|Solutions and Comments 10]], [[SearchEnginesWS0910/ExerciseSheet11|Solutions and Comments 11]], [[SearchEnginesWS0910/ExerciseSheet12|Solutions and Comments 12]], [[SearchEnginesWS0910/ExerciseSheet13|Solutions and Comments 13]].

Here are our master solutions: [[attachment:SearchEnginesWS0910/solution-6.pdf|Master solution for Exercise Sheet 6 (only Exercise 4)]], [[attachment:SearchEnginesWS0910/solution-midterm.pdf|Master solution for Mid-Term Exam]],[[attachment:SearchEnginesWS0910/solution-9.pdf|Master solution for Exercise Sheet 9]], [[attachment:SearchEnginesWS0910/solution-10.pdf|Master solution for Exercise Sheet 10]], [[attachment:SearchEnginesWS0910/solution-11.pdf|Master solution for Exercise Sheet 11]], [[attachment:SearchEnginesWS0910/solution-12.pdf|Master solution for Exercise Sheet 12]].
-Line 11:
+Line 15:
-= Exercise Sheet 2 =
Here are the details about the three servers (UDP, TCP, HTTP) for Exercise 4:
+Here is everything about the [[SearchEnginesWS0910/MidTermExam|mid-term exam (December 2009, trial exam which did not count for anything]] and the [[SearchEnginesWS0910/FinalExam|final exam (March 12, 2010, the real thing which accounted for most of the mark]].
-Line 14:
+Line 17:
-All three servers are running on our machine vulcano.informatik.uni-freiburg.de (IP address is 132.230.152.135).

The UDP server is running on port 8888 of that machine. You can send it a number and it will then send you back that number of bytes, in packets of 1000 bytes each. (That means you also have to read packets of 1000 bytes each.) The first ten bytes of each packet contain the packet id. That is interesting for checking which packets get lost and in which order packets arrive.

The TCP server is running on port 9999 of that machine. You can send it a request of the form GET /<number of bytes> HTTP/1.1, or you can just use a downloading program like wget or curl and time it.

The HTTP server is running on port 80, as web servers normally do. Just download the file http://vulcano.informatik.uni-freiburg.de/file_100M and measure the time. You can assume that no data gets lost.

For measuring your transfer and error rates, as requested by the exercise, repeat your experiments several times and also at different times, and form the average of these measurements (or report several numbers if you get very different results). You should ask for large amounts of data, like 10 MB or 100 MB.

[[SearchEnginesWS0910/ExerciseSheet2|Here you can upload your solutions for Exercise Sheet 2]].

== Questions or comments below this line, most recent on top please ==

To Jonas again: I checked from one of my machines in Saarbrücken, that port 8888 and 9999 are reachable from there. I checked that already several days ago. So yeah, it looks like a problem of your network, though very strange that port 8888 works but port 8889 doesn't. '''Hannah 2Nov09 11:06pm'''

To Jonas: This is not a problem, just say a port that will work for you and we will start another TCP server on this port. '''Marjan 02Nov09 10:04'''

Good evening. After hours of trying to debug my TCP client I am now sure that I can't use port 9999. UDP works fine if I establish a vpn channel. If not, neither TCP nor UDP may connect to the server. Inside the university network both versions of the client and the curl test went well. Probably the 9999 port is blocked inside this annoying Wohnheim-network. '''Jonas 2Nov09 9:50pm'''
+[[SearchEnginesWS0910/ExerciseSheet14|Here is the table with the links to your uploaded solutions for Exercise Sheet 14]]. The deadline is Thursday 18Feb10 16:00.
-Line 35:
+Line 20:
-Hi Christian + all: yes, you are right, it would have been nicer to provide a multi-threaded server and we should have announced that our servers are not multi-threaded, sorry for that. About the null-termination: I wasn't aware of that myself. But I actually believe that it is an important part of the exercise to get your feet dirty. Such stupid details are part of the game. I am still doing a lot of implementation work myself and even after 30 years of developing software I still spend a lot of time on such stupid details. But hey, that is how you actually learn stuff. The knowledge of all these details is a big part of what eventually makes the difference between a master and an apprentice. This is not a bad excuse, I really mean it. And, hey, we have the Wiki for talking about this stuff, asking questions, sharing something we have found out with the others, this is also an important part of the learning experience. '''Hannah 2Nov09 8:02pm'''
+== More general questions and comments ==
-Line 37:
+Line 22:
-Hi Björn + all: yes, transfer rate is amount of data received divided by time. '''Hannah 2Nov09 7:58pm'''
+Yes, HS026 in building 101. '''Hannah 12Mar10 8:45am'''
-Line 39:
+Line 24:
-To lecture team: No offense intended, but knowing that the servers are not multi-threaded would have been nice to know in the first place, just like the fact that null-termination is required for the UDP server. I would imagine that a lot of people would have had less frustration with testing and debugging their code. I realize, you had your hands full and writing a server is not a trivial task, and maybe a non-asychronous server may have been intentional, but the next time you might want to check out the asnychat and asyncore modules of Python: 
http://docs.python.org/library/asynchat.html '''Christian 2Nov09 7:49p.m.'''
+Does the HS mean same 101 building? I am still new in Freiburg ;). '''Paresh 12 Mar 10 07:45'''
-Line 42:
+Line 26:
-Hey, how is the "transfer rate" defined in the case of UDP? Should we only count packets that have actually been recieved?'''Björn 2Nov09 7:42pm'''
+I am sorry that we have not managed to produce the master solutions for exercise sheets 13 and 14 yet. However, Lecture 13 is still relevant for the exam (clustering, part 2), but I can tell you that there will be ''no'' task about Lecture 14 (statistical hypothesis testing). I think that's fair, because there was no tutorial for the (last) exercise sheet 14. Exercise sheet 14 will count as a normal excercise sheet, however. '''Hannah 11Mar10 15:21'''
-Line 44:
+Line 28:
-To Markus: Note that the server is not able to handle multiple connections in the same time (it's not multi-threaded), so if somebody else is using the server that means in that very moment the server won't be available for you. '''Marjan 2Nov09 5:44pm'''
+Hi, since there are still no master solutions for sheets 13-14, I assume the contents of the lecture concerning these sheets are not relevant for the exam. '''Marius Mar11th 2:23 p.m.'''
-Line 46:
+Line 30:
-Hi. I have some problems with the TCP server, wget got for some hours 26 Bytes but nothing else, but with the URL from below wget does nothing, it waits. In my own program I get a connection but no data, it waits too. But good news, the UDP works for me. '''Markus 2Nov09 5:42pm'''
+To Johannes + all: you are not allowed to bring any computing devices whatsoever and you won't need them. If there is a task which requires a calculation that is unreasonable to do by hand (like the log_2(10/7) from the mid-term exam), we will tell you what it is or an approximation to work with (for example that you can take log_2(10/7) as 0.5). '''Hannah 10Mar10 20:42'''
-Line 48:
+Line 32:
-No, the UDP server is not down. The way it is now written, it simply restarts itself after it crashes, but it hasn't even crashed once in the last two days. Connection refused does not mean that the server is down, telnet is just an incompatible protocol. '''Hannah 2Nov09 5:16pm'''
+'''Exam and portable calculators''': "2. You are not allowed to use any computing devices, mobile phones, etc." I had some problems with pen, paper and sqrt(1080). May we bring calculators? '''Johannes 2010-03-10T20:27'''
-Line 50:
+Line 34:
-Is the UDP-server actually down? When I try to telnet it, I get a response saying:
$ telnet vulcano.informatik.uni-freiburg.de 8888
Trying 132.230.152.135...
telnet: connect to address 132.230.152.135: Connection refused.
TCP seems to work although I don't get any packet via telnet.
'''Marius Nov 2nd, 5:00 p.m.'''
+Thanks, the solution for sheet6 ex4 helped us a lot! '''björn'''
-Line 57:
+Line 36:
-Hi Björn, just numbers in ASCII. For example, you can ask the TCP server via ''curl http://vulcano.informatik.uni-freiburg.de:9999/150'' (or just type that URL into your browser), which will effectively send the string ''GET /150 HTTP/1.1 ...'' and you will get 150 bytes in return. To the UDP server you just send the number in ASCII, for example ''150'', but make sure that you null-terminate your string, that is, it should have a zero-byte at the end (as C strings naturally have). '''Hannah 2Nov09 4:17pm'''
+@Jonas: thanks for the comment, I have corrected it in the master solution. @Björn: I added a master solution for Exercise Sheet 6 (only Exercise 4), linked above, with what I think is a very short and simple proof. Tell me if you find anything wrong with it. '''Hannah 10Mar10 16:40'''
-Line 59:
+Line 38:
-What kind of "numbers" do the servers from exercise 4 expect? UTF-8 encoded Strings? Byte values? Anything else? Exercises 1-3 were fun to do but I'm completely stuck at ex4 right now. I constantly fail to get any proper response. '''Björn 2Nov09 4:13pm'''
+Jonas: Yes, that was already mentioned in the tutorials. '''Marjan 10Mar10 15:58'''
-Line 61:
+Line 40:
-One more comment: if possible, for Exercise 4, run your clients from a machine ''outside'' of the university. That way you get more interesting results, in particular, you should then see a very marked difference between UDP and TCP, whereas within the Uni or even Informatik network that difference might be very small. '''Hannah 2Nov09 2:37pm'''
+Hi. Concerining exercise sheet 10 exercise 1. Shouldn't you take the squareroots of 108 and 10 (in the Matrix EPSILON). Otherwise the equation is not right. '''Jonas 10.03.10'''
-Line 63:
+Line 42:
-Dear Daniel + all: if you have one binary that can do it all, that is fine. If you have three separate binaries that is also fine. Just make sure to avoid code duplication, that is, if you have three separate binaries (which you, Daniel, have not), make sure that the common code is on commonly used classes and not just copied and pasted. Copying and pasting code is the ultimate evil, believe me. If you have one binary, make sure that the code is well modularized with the various functionalities in appropriately chosen and named classed and methods, and that not all the code is in one big main function or in a single function named solve_exercise_2 or things like that. If this does not fully answer your question, don't hesitate to ask again. '''Hannah 2Nov09 2:33pm'''
+Hi, we got a question concerning ex sheet 6, exercise 4. In the tutorial Marjan presented a solid, but complicated solution using Taylor Expansion. In the lecture you mentioned that this wasn't necessary for any exercise. Unfortunately we fail at finding a simpler, but still mathematical rigorous solution. Would you please give a brief idea of how to proove such inequalities as this might by useful for similar, yet easier exercises in the exam. '''Björn Mi 15:12'''
-Line 65:
+Line 44:
-Hi, sorry to ask again about code writing, but I'm not sure I got it right. Do we have to submit 3 binaries for the exercises? I have written a web server with 3 available URLs (sentence/search/index), which provides an interface for repeating a sentence, searching for keywords with GET parameters and showing a search form to enter keywords respectively. For the last exercise I have standalone code for the clients then. Is this also ok? '''Daniel 2Nov09 2:19pm'''
+Hi Johannes + all. Here is a very simple example: let the query word be ''algorithm'' and one candidate similar word computed by the permuted lexicon be ''algXXXthm'' (the common prefix is ''thmalg'' [from the permutations ''thmalgori'' and ''thmalgXXX''] which is long enough) and let the edit distance threshold be 2. Obviously this candidate word will be filtered out because the edit distance is 3. '''Marjan 07Mar10 18:57'''
-Line 67:
+Line 46:
-Ok, sorry, I just see that I indeed gave contradicting information. Above I wrote, for the TCP server, "... or you can just use a downloading program like wget or curl". So, yes, feel free to just use wget or curl to ask the TCP server, and in that case you can assume that no data gets lost. It would be great though (and not much additional) work, if your client can talk with both the UDP and the TCP server. That way you really make the experience that TCP never drops a packet, while for UDP this is a frequent event. '''Hannah 1Nov09 6:49pm'''
+'''Filtering with a Permutern Index''': The slide states: "for all matches thus found, compute the actual edit distance". Is there a simple strawman-example for a word that gets removed in the postfiltering-step? (Today is silly question day.) '''Johannes 2010-03-07T18:26'''
-Line 69:
+Line 48:
-Hi Mirko + all. All I said is that you can *test* the TCP server via wget or curl. For the exercise you should implement your own client, but since that is very similar to the UDP client, that is not much additional work. For the HTTP part of the exercise, you can, if you want, indeed just use wget or curl and assume that the number of lost packets is zero. '''Hannah 1Nov09 6:35pm'''
+Hi Johannes + all. Concerning your inverted index question: it really depends on the application, if you have lists of only doc ids and want to intersect them fast, you would sort the lists by doc id, if you want to do top-k you would sort them by score. Duplicates only make sense when you also store positional information, which we didn't do in the lecture. Concerning your Elias-Gamma question: there is an upper bound, which I think we also derived in the lecture, and that is log n + O(log^(k) n) + O(1), but I couldn't tell you what are the constants hidden in the two Big-Ohs. '''Hannah 7Mar10 18:19'''
-Line 71:
+Line 50:
-Hi, I am confused, the exercise-sheet says write a client which can communicate over TCP and UDP and for comparison query the HTTP server via wget/curl. Here you are saying we can download the TCP-part via wget/curl. Therefore i wanted to download the files over TCP via wget, but i couldn't find a way to measure the amount of lost packets, does anyone know? '''Mirko 1Nov09 17:23pm'''
+'''Inverted indexes and like''': If a inverted index maps a word, w, (perhaps a string) to a subset, W(w), of the set of all documents (perhaps only the IDs as numbers). Is W(w) always sorted? Does it contain duplicates? For some application (and the algorithms for them) this seems to matter. I'm just asking in case of a exam task, involving coding (especially k-way-merge). '''Johannes 2010-03-07T13:54'''
-Line 73:
+Line 52:
-To all: Now both servers should run fine. '''Marjan 1Nov09 17:03pm'''
+'''Elias-Gamma Encoding''': Is there a closed form for the length of the code for an integer x when elias is iterated k times? '''Johannes 2010-03-07T15:14'''
-Line 75:
+Line 54:
-To Matthias: Unlike the TCP server, the UDP server is still working fine, I just checked. '''Marjan 1Nov09 16:17pm'''
+== Questions and comments about the master solution of the mid-term exam ==
-Line 77:
+Line 56:
-Hi, would you please check the UDP as well? It looks like it isn't returning any data now altough the same implementation worked 2 hours ago. ''' Matthias 1Nov09 3:58pm'''
+'''Johannes 2010-03-07T12:40''' :
-Line 79:
+Line 58:
-Preliminary fix for the TCP server problem: the TCP server is now automatically restarted as soon as it crashed. So you should be able to work with it properly now. '''Hannah 1Nov09 2:54pm'''
+'''1.3''': CLAIM: If an encoding is prefix-free, then there is no code that is a prefix of a different code. Does this claim hold? If so, then 001 mustn't be a code, since 0 is a code and a prefix of 001. Is this right?
-Line 81:
+Line 60:
-Dear all, the TCP is currently crashing whenever the client aborts, and then it's down before we restart it. Marjan is working on solving this problem, and we will tell you as soon as it's done. The UDP server does not have this problem. '''Hannah 1Nov09 2:33pm'''
+There was an obvious mistake which I now corrected (00 should be mapped to 1, not 0). '''Hannah 7Mar10 12:56'''
-Line 83:
+Line 62:
-Hi Zhongjie + all, as it says above "The first ten bytes of each packet contain the packet id ...". (But it only does that if a packet is larger than 10 bytes.) For example, if you ask for 10000 bytes, the server will send you 10 packets with 1000 bytes each, with ids from 0 to 9. This is interesting information, because your client can use it to print the packet id of each package it receives and see how many packets arrive out of order. You don't have to do this for the exercise, but it's interesting and easy to do. And as you will see then, out of order arrival indeed happens. '''Hannah 1Nov09 2:07pm'''
+'''1.4''': It states: "For a sequence of length n, we need to generate n/2 such codes [...]." Does not each symbol of the n from the sequence get encoded?
-Line 85:
+Line 64:
-Well, problem solved... You need to send a package end with a '\0' char to server, otherwise server will not respond... But here is another problem: when I send a UDP package like "5\0" to server, I will receive reply package like "xxxxx". If I send "10\0", the reply is "xxxxxxxxxx". And if it is "20\0" I send, it is "0000000000\0xxxxxxxxx" I receive. Confused... '''Zhongjie 1Nov09 11:20am'''
+Each code stands for two bits at a time, so for a sequence of n bits, you have to generate n/2 codes. I replaced ''sequence of length n'' by ''sequence of n bits'' to make this clearer. '''Hannah 7Mar10 12:58'''
-Line 87:
+Line 66:
-I had the same problem. But try to send your query with two linefeeds at the end, like this:  send_data = '50\n\n' This makes the UDP server a lot more responsive... '''Christian 1Nov09 10:26am'''
+'''3.4''': The function returns the number of common k-grams (as far as I see). Can the return-line be completed with a call to the function from 3.2 to return the Jaccard-distance?
 
Yes, indeed, I replaced ''return l'' by ''return jaccardDistance(x, y, k, l)''. '''Hannah 7Mar10 13:01'''
-Line 89:
+Line 70:
-hello Marjan, Now its work, thanks. what do you mean about skipping one ex.sheet without loosing any points? I wonder whether my exercise uploaded on 26 of Oct is still counted? '''Triatmoko 1Nov09 10:19'''
+'''5.4''': Does the top-k-algorithm return the top k documents? If so, which k had to been used in this task? What exactly is the condition for stopping? What exactly is the update rule for the ranges? My idea is that (for a fixed document) the minimum is always the known minimum from any of the lists and the maximum is always the (already known) minimum plus the lowest score, seen in any list different than the one the minimum is from. In case of only two lists there may be some simplifications.
-Line 91:
+Line 72:
-Hello! I still could not get any response from both the UDP and the TCP server port by now, but HTTP server works fine. If anyone could get some result, please tell me that you can get response from servers, so that I will know it's my own problem... Thank you! '''Zhongjie 1Nov09 08:43'''
+The task asked for the ''top-ranked document'', so k = 1. We can stop when the upper bound for all documents not yet seen is ''strictly'' below the k-th largest lower bound so far, and when the score ranges for the documents already seen are such that it is clear which are the top-k documents and in which order. If there are ties, and we don't care how they are broken, and we don't care to know the order of the top-k documents, we can sometimes stop earlier. Does this answer all your questions? '''Hannah 7Mar10 13:06'''
-Line 93:
+Line 74:
-Hi, I need some clarification on this term "HTTP result header" in Excercise 2 in sheet 2. Will HTTP header contains generic http information or something related to Results? Offcourse our Result will in HTML form. '''Waleed 1Nov09 5:52AM'''
+Thanks a lot for your comments! Please go on if you have more. '''Hannah 7Mar10 13:07'''
-Line 95:
+Line 76:
-I am sorry for the downtime, these maintenance works were announced already several weeks ago, but then I forgot about them because they were scheduled on a Saturday which I thought would not affect me. The downtime also killed our servers, but now they are running again. About the corrections: of course you should get comments on what you did wrong and why you got less points for what. Sorry, if that didn't happen for the first exercise sheet. I will talk with Marjan. '''Hannah 1Nov09 00:38am'''
+Thanks a lot for your answers! '''Johannes 2010-03-07T13:44'''
-Line 97:
+Line 78:
-Yes, Eric. Today afternoon and evening was nearly the whole Uni-Net Offline because of intended maintenance. I have a question: Will there be any correction or comments or a sample solution or something like that for every exercise because I now only know how many points I have for every Exercise in Exercise Sheet 1 but I don't know why I have a lack of 1/2 point in one exercise and in another. I think it would be good if anybody will know what he has done wrong or/and what he could have done better. '''Waldemar 31Oct09 21:08'''
+== Questions and comments about Exercise Sheet 14 below this line (most recent on top) ==
-Line 99:
+Line 80:
-I cannot connect to vulcano.informatik.uni-freiburg.de:9999 and :8888, neither from outside, nor from inside (logged into pool account). Also the whole informatik.uni... was down a few minutes ago. It's hard to solve exercise 4 without the servers running. '''Eric 31Oct09 19:00'''
+Hi Johannes: why don't you start with the first few questions, and then let's see whether it makes sense to continue this via the Wiki, or via private email, or via a meeting in person. '''Hannah 6Mar10 17:36'''
-Line 101:
+Line 82:
-Hi, I wonder whether those three servers for exercise 4 of sheet 2 online or not? '''Zhongjie 31Oct09 11:49'''
+Yes, the final exam is like the mid-term exam in this respect. '''Hannah 6Mar10 17:36'''
-Line 103:
+Line 84:
-To all: Please note that (almost) everybody will get +2 points for Exercise Sheet 1 (previously I did not assign any points to the first and the last problem). '''Marjan 31Oct09 10:03'''
+Alex: http://vulcano.informatik.uni-freiburg.de/wiki/teaching/SearchEnginesWS0910/MidTermExam, so it seems to be allowed. '''Mirko, 6Mar10 16:10'''
-Line 105:
+Line 86:
-To Triatmoko and Ahmed + all: You obviously haven't created your wiki page. Please go to your link and click "Create a new page" and then "Save changes". You should then upload your solutions there and put the link on the wiki like everybody else did. If it is still not clear please ask some of your fellow students. I don't remember if it is mentioned, but you can skip one Exercise Sheet without losing any points. '''Marjan 31Oct09 09:38'''
+Hi, I was wondering, will the exam next week also be an open book exam like the mid-term? Perhaps I overlooked it, but I don't think this is stated anywhere yet. '''Alex 6Mar10 13:49'''
-Line 107:
+Line 88:
-Hi, I have some question about my exercise Sheet 1, I saw in my exercise page that my name, my upload solution and code in gray color and other persons in blue color. I try to click my attachment file in my exercise page, and I have some message that there are no attachment. any body know about this issue?because my exercise uploaded on Monday 26oct '''Triatmoko 30Oct09 22:18'''
+I have lots of questions and don't know where to put them. I suppose this wiki-page will get chaotic pretty fast if I post 20 questions. '''Johannes VI Mar MMX 12:00'''
-Line 109:
+Line 90:
-Hi Björn + all: For Exercise 4 from Exercise Sheet 1 you had to write code that is at least able to process 2-word queries. If your code can indeed only handle 2-word queries and not an arbitrary number of query words, that is also fine for this exercise, you won't get less points because of that. Your second question is also very valid. You should put the various functionalities into modules / classes of their own, so that you can easily combine them for the three different binaries required for Exercises 1 - 3. Each of your three programs will then be quite short, just putting together the right things. I should have added it to my list of evil coding NoNos: never ever duplicate code, but instead put it in a class / module of its own. I hope this answers your questions, if not please ask again. Sorry for the late answer, but I was super busy until now, hardly had time to breathe. '''Hannah 30Oct09 19:08'''
+I'm sorry for the delay with the master solutions. I am at a conference right now but will try to make progress with this over the weekend. '''Hannah 4Mar10 23:59'''
-Line 111:
+Line 92:
-I have a question concerning exercise 2. There was no concrete task to produce "query processing code" on ex sheet 1. Are there any requirements that have to be fulfilled? Should it be able to handle two word queries? n-word queries? Additionally there is something else I want to ask: I think it surely isn't bad practice to write a more generic webserver and use it for exercises 1-3. Apart from that it says "change your code" some times in the exercises. How should your submission behave w.r.t. the exercises? Different src files / executables for each exercise? One program that solve each exercises depending on startup parameters? Anything else? '''Björn 30ct09 2:25pm'''
+Do we get master solutions for ex. 11, 12, 13 and 14? '''Johannes 04Mar2010 23:32 ZULU'''
-Line 113:
+Line 94:
-I now reorganized the page. Old stuff went to separate pages (links above). The idea is that the front page is always for the current lecture / exercises. The problem with your exercise page should be solved now, Ivo. '''Hannah 30ct09 00:05am'''
+Now they're there again. '''Marjan 01Mar18:09'''
-Line 115:
+Line 96:
-Having problems to access my exercise page after loging in - IvoChichkovExercises, '''Ivo 29Oct 22:56pm'''
+ARGH! I'm very sorry. My Down-Them-All Plugin for Firefox seems to have deleted all the lecture PDFs! Sorry for that. Rollback to previous versions does not seem to work. I hope, someone has already downloaded them all and is able to restore them! SORRY! Interesting, I've got the rights to delete something from the main page, though. '''Marius Mar 1st 2010 2:38 p.m.'''
-Line 117:
+Line 98:
-Sorry to bother you. I added the Link to exercise sheet 2 with the linked pdf. Needed this to find the sheet as fast as possible. '''Marius 29Oct 10:04 p.m.'''
+(Reminder:) Hello, the master solutions are not online, yet. '''alex n 1Mar10 11:08'''
-Line 119:
+Line 100:
-Is there a webpage for exercise sheet 2 somewhere? '''Johannes 29Oct 07:45 pm'''
+Yes, we are working on it. Please remind us again if they aren't online by the end of this week. '''Hannah 23Feb10 14:30'''

Do we get master solutions for ex. 11, 12, 13 and 14? '''Johannes 23Feb10 14:05'''

Hi Matthias, yes, Pr(A) = 1 - Pr(not A), for any event A, and so for any random variable X, Pr(X <= x) = 1 - Pr(X > x), because X <= x and X > x are complementary events. For continuous random variables (like variables with a normal distribution), the difference between <= and < and >= and > is immaterial, because Pr(X = x) for each fixed x. But anyway, to compute the probability, you first have to transform it a bit, like I did in the lecture, and then obtain Pr(N(0,1) >= sqrt(n1) * (µ1 - µ) / σ) and Pr(N(0,1) <= sqrt(n2) * (µ - µ2) / σ). To evaluate the latter you can also simply use the symmetry of the normal distribution, due to which one has Pr(N(0,1) <= -x) = Pr(N(0,1) >= x). '''Hannah 18Feb10 12:58'''

Hi, how can we compute Pr(N(n2 * µ2, n2 * σ^2^) <= n2 * µ2 ? Can we use 1- (Pr(N(n2 * µ2, n2 * σ^2^) >= n2 * µ2) for that ? '''Matthias 18Feb10 12:01'''

Hi Florian + all, one of µ1 and µ2 is larger than µ and one is smaller. Let's assume µ1 is larger and µ2 is smaller. Then for µ1 you have to look at Pr(N(n1 * µ, n1 * σ^2^) >= n1 * µ1). But for µ2 you have to look at Pr(N(n2 * µ2, n2 * σ^2^) <= n2 * µ2). Note the <= instead of the >= for the second probability. Recall the meaning of these probabilities. Just as an example, let µ be 100 and µ1 be 150 and µ2 be 50. Then the first probability means: what is the probability that I see a mean of ''150 or more'' in my first sample, although the mean of my distribution is 100. The second probability means: what is the probability that I see a mean of ''50 or less'' in my second sample, although the mean of my distribution is 100. If you take both <= or both >= for both probabilities, it is to be expected that you get two completely different probabilities, one very low and one very high (except when they are both close to 50%). Please ask again if this is still unclear. '''Hannah 17Feb10 21:51'''

Sorry, with probability for µ1 I meant Pr(N(n1 * µ, n1 * σ^2^) >= n1 * µ1) and accordingly with probability for µ2 I meant Pr(N(n2 * µ, n2 * σ^2^) >= n2 * µ2) where n1=n2 for the exercise sheet.  '''Florian 17Feb10 21:18'''

Hi Florian, what exactly do you mean by ''probability for µ1'' and ''probability for µ2''? '''Hannah 17Feb10 21:02'''

Hi, what values are we expected to get for exercise 4? I always get a probability of about 99.9% for μ1 and a value of about 0.07% for μ2, can that be? '''Florian 17Feb10 18:25'''

Hi Florian, yes, the ''averages'' in Exercise 3 should be ''average running times''. I uploaded a new version of the sheet, where I corrected this. '''Hannah 14Feb10 17:48'''

Hi, I guess we should measure the running times to determine the efficiency of the programs for exercise 3? '''Florian 15Feb10 17:42'''

Hi Claudius, you should compute Pr(D|H0), exactly as done in the lecture for Example 2, where we computed this probability as Pr(X > x), where X is a random variable with distribution N(0,1), that is, normal with mean 0 and variance 1, and x depends on the mean and variance of your data. '''Hannah 14Feb10 16:44'''

Hi. If I have understood correctly, we have to compute Pr(H|D) in Exercise 4. From statistical hypothesis testing, we get Pr(D|H). Now, Pr(H|D) = Pr(D|H) * (Pr(H) / Pr(D)). We know Pr(D|H) and we can compute Pr(D), but what value do we have to use for Pr(H)? '''Claudius 14Feb10 14:41'''

Hi Eric, I don't care whether you use integers or doubles, but I am curious why the one should be any harder than the other? '''Hannah 12Feb10 19:02'''

May we use integers for sorting? Or do we have to use doubles? This is important for generating my sorted array '''Eric 12Feb10 18:56'''

If you're asking about the merging you can of course use a priority queue if you want, but you don't really need it when merging 2 lists. '''Marjan 18:28'''

Why would you use a priority queue? It's simple sorting, the exercise is not about implementing your own sorting algorithm or something like that. About exercise 3, it should be clear from the exercise itself that the sequences should be sorted (otherwise how can the merging work?) '''Marjan 18:23'''

Means that we have nothing to do than use a priority queue or something like that and don't have to implement the sorting? And at Exercise 3 the random set should be an ordered one or not? '''Alex 12Feb10 18:19'''

We prefer randomized sorting using bitonic networks, alternatively combined with LSD radix sort or simple pancake sort. That's of course a joke, it should be clear that you can use the built-in sorting functions (your own implementation will be certainly slower). '''Marjan 12Feb10 18:12'''

What does "do a standard sort" in exercise 2 mean? Shall I implement one on my own, or may I use the Java built-in sorting mechanisms? Also, which sorting algorithm do you prefer for this? '''Eric 12Feb10 18:04'''