AD Teaching Wiki:

Exercise Sheet 6

The rules for uploading are the same as always. If you forgot them, you can read them again here.

Your solutions (files can only be read by the uploader and by us)

Please put the ratios such that they are always > 1. That is, for ratios, which according to the exercise sheets are < 1, put the inverse. And another thing, please: whenever you put numbers of any kind, think about a meaningful number of digits to put after the dot. Here, for the first two columns one digit is certainly enough and for the last column an integer is enough.

No.

Name

Solution (PDF)

Code (ZIP or TGZ)

Avg. ratio from Problem 1

Avg. ratio 1 from Problem 2

Avg. ratio 2 from Problem 2

0

Marjan

4.0

84.7

7751

1

Florian Bäurle

PDF

ZIP

8.0

2.2

7692

2

Mirko Brodesser

PDF

ZIP

4.6

16.0

3130

3

Jonas Krisch

PDF

ZIP

2.1

13.0

40

4

Zhongjie Cai

PDF

ZIP

12.8

4.7

1473

5

Eric Lacher

PDF(updated)

ZIP

1.8

524.9

4188

6

Achille Nana

PDF

ZIP

3

147.1

138

7

Matthias Sauer

PDF

ZIP

4.0

25.0

TODO

8

Matthias Frorath

PDF

RAR

4.6

1006

50859

9

Björn Buchhold

PDF

ZIP

10

50

8000

10

Paresh Paradkar

PDF

ZIP

4.7

11.6

14.2

11

Markus Gruetzner

PDF

ZIP

2.5

3.3

1000

12

Marius Greitschus

PDF

TGZ

6.3

1.6

5000

13

Alexander Nutz

PDF

ZIP

7.9

314

15350

14

Johannes Stork

PDF

tar.gz

-

-

-

15

Jens Silva Santisteban

PDF

ZIP

-

-

-

16

Manuela Ortlieb

PDF

ZIP

-

-

-

17

Claudius Korzen

PDF

ZIP

1.9

4.6

341

18

Waleed Butt

PDF

ZIP

2.39

13.31

998

19

Dragos Sorescu

PDF

ZIP

8.34

2.53

1712

20

Alexander Gutjahr

PDF

ZIP

6.8

47.8

4752.9

21

Alexander Schneider

PDF

ZIP

11.18

2304

54160

22

Daniel Schauenberg

PDF

tar.gz

in pdf

in pdf

in pdf

These were the questions and comments on Exercise Sheet 6

Yes, you are right Marjan, I will just go through the ratios and invert those < 1. But one thing is for sure: a ratio of plain zero DOES NOT MAKE SENSE. So those with a plain zero, please either put the inverse, or use scientific notation. Hannah 1Dec09 10:17am

I computed the ratios as it was shown in the example here on the wiki, that was dividing the values in the same order as they are mentioned (first through second). For me that meant values > 1 for exercise 1 and values < 1 for exercise 2, but I wanted to compute them in the same way. Florian 1Dec09 01:22am

But if everybody computes the ratios as they want, the numbers in the table won't make sense! It also does not make sense if somebody computes the ration for Problem 1 in one way and then for Problem 2 to in another (I already don't know how the ratios are computed for the few uploaded solutions!) Marjan 30Nov09 00:25

Hi Björn + all, it doesn't really matter, but I (and probably most humans) find ratios > 1 more intuitive. Just compare 8 and 0.125, which one is easier to grasp? Hannah 30Nov09 11:59pm

Does it matter which way round we express the ratios? Depending on how we build the quotient, we get different values (all smaller or all greater 1). Or is that up to us? Should be possible to compare our results anyway, I assume. Björn 30Nov 23:36

To Björn: You can assume you have gaps of arbitrary size. Marjan 30Nov 14:43

To Claudius: The whole collection with all words. Marjan 30Nov 14:43

Is there a limit on how large gaps may be in exercise 3? I'm not sure for which case the two entropies actually fulfill the equation. Gaps that "make sense" (ther sum is not larger than n-1), gaps that are at most n, or arbitrary gaps? Björn 30Nov09 14:31

In Exercise 2, you ask for the costs of scanning the inv. lists of all words in the "collection". Do you mean the collection of words, matching the prefix or the the whole collection with all words in the inv. index? Claudius 30Nov09 2:16pm

Hi Dragos, just three-letter prefixes are fine. I have no plans yet for future exercises with a "*" in the middle. Hannah 29Nov09 11:10pm

For exercise 1, should we allow the "*" to be in any place ? Or just three letter prefix is sufficient ? I am asking because it would be good to know if we might need on later Exercise Sheets searches that allow multiple "*" in different positions, so that we do it now. Dragos 29 Nov 22:55

Hi Björn, by ratio I simply mean the quotient, that is, how much bigger the one is then the other. For example, if, for a particular prefix, the total size from (1) is one million, and the size from (2) is ten thousand, then, for that prefix, the ratio between the two is one hundred. Hannah 29Nov09 7:48pm

Hello, I wonder what's meant with the ratio demanded in exercise 1. If i have n lists with a maximum length of "a" and a total length of "b". Isn't the ratio something like "a:b"? At least that is what I thought. But adding a colon does not seem to be sufficient for a part of the exercise. Sorry for the meaningless question but I don't want to miss points because I'm not sure how to understand the word ratio. Björn 11-29 19:39

To all: about the selection of the ten prefixes. The idea was that you pick a meaningful variety by hand, that is, such prefixes which one could imagine that one would really type them. The exact selection doesn't really matter, but do avoid extreme cases like a prefix yzq with one completion and an inverted list of three doc ids. Hannah 29Nov09 6:28pm

To Marius + all: yes, I am sorry, "cost" was very imprecise here, I actually simply meant the time your code takes. Hannah 29Nov09 6:19pm

So you say that you mean by "costs" the running time? Or do you understand something else when you say we have to calculate the costs? Marius 11/29/09 4:58pm

Notice about Problem 2: You should use precise timers when measuring the running times. If your collection is very small and you round up your times, it's easy to get 0 ms when merging or scanning the inverted lists. I recommend using microsecond scale. Marjan 29Nov09 16:47

To Florian: Yes, you can do anything you want to find those words (as long as you produce the required outputs). Marjan 29Nov09 16:44

For exercise 1, should we use one of the methods presented in the lecture to find all words in the collection with the prefixes or can we do just anything to get them (though it might not be as efficient)? Florian 29Nov09 03:51pm

When you scan, please make sure that you do something very simple with the elements, like summing up all doc ids, and then outputting that sum. Otherwise a clever compiler might figure out that it can remove the whole loop, because it is not producing a result that is used anywhere. Hannah 28Nov09 11:48pm

To Mirko: Yes, scanning means one pass over the elements. Marjan 28Nov09 19:19

Hi, about exercise2: is by "scanning" meant that one looks at every element exactly once? (=> costs of scanning a list are just the size of the list) Mirko 28Nov, 19:12

AD Teaching Wiki: SearchEnginesWS0910/ExerciseSheet6 (last edited 2009-12-02 00:44:00 by Hannah Bast)