6862
Comment:
|
← Revision 63 as of 2009-12-02 00:44:00 ⇥
12110
|
Deletions are marked like this. | Additions are marked like this. |
Line 23: | Line 23: |
||16 ||[[SearchEnginesWS0910/ManuelaOrtliebExercises|Manuela Ortlieb]] ||[[attachment:SearchEnginesWS0910/ManuelaOrtliebExercises/Manuela_Ortlieb_ex6.pdf|PDF]] ||[[attachment:SearchEnginesWS0910/ManuelaOrtliebExercises/Manuela_Ortlieb_ex6.zip|ZIP]] ||pdf ||pdf ||pdf || | ||16 ||[[SearchEnginesWS0910/ManuelaOrtliebExercises|Manuela Ortlieb]] ||[[attachment:SearchEnginesWS0910/ManuelaOrtliebExercises/Manuela_Ortlieb_ex6.pdf|PDF]] ||[[attachment:SearchEnginesWS0910/ManuelaOrtliebExercises/Manuela_Ortlieb_ex6.zip|ZIP]] ||-||-||-|| |
Line 30: | Line 30: |
== These were the questions and comments on Exercise Sheet 6 == Yes, you are right Marjan, I will just go through the ratios and invert those < 1. But one thing is for sure: a ratio of plain zero DOES NOT MAKE SENSE. So those with a plain zero, please either put the inverse, or use scientific notation. '''Hannah 1Dec09 10:17am''' I computed the ratios as it was shown in the example here on the wiki, that was dividing the values in the same order as they are mentioned (first through second). For me that meant values > 1 for exercise 1 and values < 1 for exercise 2, but I wanted to compute them in the same way. '''Florian 1Dec09 01:22am''' But if everybody computes the ratios as they want, the numbers in the table won't make sense! It also does not make sense if somebody computes the ration for Problem 1 in one way and then for Problem 2 to in another (I already don't know how the ratios are computed for the few uploaded solutions!) '''Marjan 30Nov09 00:25''' Hi Björn + all, it doesn't really matter, but I (and probably most humans) find ratios > 1 more intuitive. Just compare 8 and 0.125, which one is easier to grasp? '''Hannah 30Nov09 11:59pm''' Does it matter which way round we express the ratios? Depending on how we build the quotient, we get different values (all smaller or all greater 1). Or is that up to us? Should be possible to compare our results anyway, I assume. '''Björn 30Nov 23:36''' To Björn: You can assume you have gaps of arbitrary size. '''Marjan 30Nov 14:43''' To Claudius: The whole collection with all words. '''Marjan 30Nov 14:43''' Is there a limit on how large gaps may be in exercise 3? I'm not sure for which case the two entropies actually fulfill the equation. Gaps that "make sense" (ther sum is not larger than n-1), gaps that are at most n, or arbitrary gaps? '''Björn 30Nov09 14:31''' In Exercise 2, you ask for the costs of scanning the inv. lists of all words in the "collection". Do you mean the collection of words, matching the prefix or the the whole collection with all words in the inv. index? '''Claudius 30Nov09 2:16pm''' Hi Dragos, just three-letter prefixes are fine. I have no plans yet for future exercises with a "*" in the middle. '''Hannah 29Nov09 11:10pm''' For exercise 1, should we allow the "*" to be in any place ? Or just three letter prefix is sufficient ? I am asking because it would be good to know if we might need on later Exercise Sheets searches that allow multiple "*" in different positions, so that we do it now. '''Dragos 29 Nov 22:55''' Hi Björn, by ratio I simply mean the quotient, that is, how much bigger the one is then the other. For example, if, for a particular prefix, the total size from (1) is one million, and the size from (2) is ten thousand, then, for that prefix, the ratio between the two is one hundred. '''Hannah 29Nov09 7:48pm''' Hello, I wonder what's meant with the ratio demanded in exercise 1. If i have n lists with a maximum length of "a" and a total length of "b". Isn't the ratio something like "a:b"? At least that is what I thought. But adding a colon does not seem to be sufficient for a part of the exercise. Sorry for the meaningless question but I don't want to miss points because I'm not sure how to understand the word ratio. '''Björn 11-29 19:39''' To all: about the selection of the ten prefixes. The idea was that you pick a meaningful variety by hand, that is, such prefixes which one could imagine that one would really type them. The exact selection doesn't really matter, but do avoid extreme cases like a prefix ''yzq'' with one completion and an inverted list of three doc ids. '''Hannah 29Nov09 6:28pm''' To Marius + all: yes, I am sorry, "cost" was very imprecise here, I actually simply meant the time your code takes. '''Hannah 29Nov09 6:19pm''' So you say that you mean by "costs" the running time? Or do you understand something else when you say we have to calculate the costs? '''Marius 11/29/09 4:58pm''' Notice about Problem 2: You should use precise timers when measuring the running times. If your collection is very small and you round up your times, it's easy to get 0 ms when merging or scanning the inverted lists. I recommend using microsecond scale. '''Marjan 29Nov09 16:47''' To Florian: Yes, you can do anything you want to find those words (as long as you produce the required outputs). '''Marjan 29Nov09 16:44''' For exercise 1, should we use one of the methods presented in the lecture to find all words in the collection with the prefixes or can we do just anything to get them (though it might not be as efficient)? '''Florian 29Nov09 03:51pm''' When you scan, please make sure that you do something very simple with the elements, like summing up all doc ids, and then outputting that sum. Otherwise a clever compiler might figure out that it can remove the whole loop, because it is not producing a result that is used anywhere. '''Hannah 28Nov09 11:48pm''' To Mirko: Yes, scanning means one pass over the elements. '''Marjan 28Nov09 19:19''' Hi, about exercise2: is by "scanning" meant that one looks at every element exactly once? (=> costs of scanning a list are just the size of the list) '''Mirko 28Nov, 19:12''' |
Exercise Sheet 6
The rules for uploading are the same as always. If you forgot them, you can read them again here.
Your solutions (files can only be read by the uploader and by us)
Please put the ratios such that they are always > 1. That is, for ratios, which according to the exercise sheets are < 1, put the inverse. And another thing, please: whenever you put numbers of any kind, think about a meaningful number of digits to put after the dot. Here, for the first two columns one digit is certainly enough and for the last column an integer is enough.
No. |
Name |
Solution (PDF) |
Code (ZIP or TGZ) |
Avg. ratio from Problem 1 |
Avg. ratio 1 from Problem 2 |
Avg. ratio 2 from Problem 2 |
0 |
Marjan |
|
|
4.0 |
84.7 |
7751 |
1 |
8.0 |
2.2 |
7692 |
|||
2 |
4.6 |
16.0 |
3130 |
|||
3 |
2.1 |
13.0 |
40 |
|||
4 |
12.8 |
4.7 |
1473 |
|||
5 |
1.8 |
524.9 |
4188 |
|||
6 |
3 |
147.1 |
138 |
|||
7 |
4.0 |
25.0 |
TODO |
|||
8 |
4.6 |
1006 |
50859 |
|||
9 |
10 |
50 |
8000 |
|||
10 |
4.7 |
11.6 |
14.2 |
|||
11 |
2.5 |
3.3 |
1000 |
|||
12 |
6.3 |
1.6 |
5000 |
|||
13 |
7.9 |
314 |
15350 |
|||
14 |
- |
- |
- |
|||
15 |
- |
- |
- |
|||
16 |
- |
- |
- |
|||
17 |
1.9 |
4.6 |
341 |
|||
18 |
2.39 |
13.31 |
998 |
|||
19 |
8.34 |
2.53 |
1712 |
|||
20 |
6.8 |
47.8 |
4752.9 |
|||
21 |
11.18 |
2304 |
54160 |
|||
22 |
in pdf |
in pdf |
in pdf |
These were the questions and comments on Exercise Sheet 6
Yes, you are right Marjan, I will just go through the ratios and invert those < 1. But one thing is for sure: a ratio of plain zero DOES NOT MAKE SENSE. So those with a plain zero, please either put the inverse, or use scientific notation. Hannah 1Dec09 10:17am
I computed the ratios as it was shown in the example here on the wiki, that was dividing the values in the same order as they are mentioned (first through second). For me that meant values > 1 for exercise 1 and values < 1 for exercise 2, but I wanted to compute them in the same way. Florian 1Dec09 01:22am
But if everybody computes the ratios as they want, the numbers in the table won't make sense! It also does not make sense if somebody computes the ration for Problem 1 in one way and then for Problem 2 to in another (I already don't know how the ratios are computed for the few uploaded solutions!) Marjan 30Nov09 00:25
Hi Björn + all, it doesn't really matter, but I (and probably most humans) find ratios > 1 more intuitive. Just compare 8 and 0.125, which one is easier to grasp? Hannah 30Nov09 11:59pm
Does it matter which way round we express the ratios? Depending on how we build the quotient, we get different values (all smaller or all greater 1). Or is that up to us? Should be possible to compare our results anyway, I assume. Björn 30Nov 23:36
To Björn: You can assume you have gaps of arbitrary size. Marjan 30Nov 14:43
To Claudius: The whole collection with all words. Marjan 30Nov 14:43
Is there a limit on how large gaps may be in exercise 3? I'm not sure for which case the two entropies actually fulfill the equation. Gaps that "make sense" (ther sum is not larger than n-1), gaps that are at most n, or arbitrary gaps? Björn 30Nov09 14:31
In Exercise 2, you ask for the costs of scanning the inv. lists of all words in the "collection". Do you mean the collection of words, matching the prefix or the the whole collection with all words in the inv. index? Claudius 30Nov09 2:16pm
Hi Dragos, just three-letter prefixes are fine. I have no plans yet for future exercises with a "*" in the middle. Hannah 29Nov09 11:10pm
For exercise 1, should we allow the "*" to be in any place ? Or just three letter prefix is sufficient ? I am asking because it would be good to know if we might need on later Exercise Sheets searches that allow multiple "*" in different positions, so that we do it now. Dragos 29 Nov 22:55
Hi Björn, by ratio I simply mean the quotient, that is, how much bigger the one is then the other. For example, if, for a particular prefix, the total size from (1) is one million, and the size from (2) is ten thousand, then, for that prefix, the ratio between the two is one hundred. Hannah 29Nov09 7:48pm
Hello, I wonder what's meant with the ratio demanded in exercise 1. If i have n lists with a maximum length of "a" and a total length of "b". Isn't the ratio something like "a:b"? At least that is what I thought. But adding a colon does not seem to be sufficient for a part of the exercise. Sorry for the meaningless question but I don't want to miss points because I'm not sure how to understand the word ratio. Björn 11-29 19:39
To all: about the selection of the ten prefixes. The idea was that you pick a meaningful variety by hand, that is, such prefixes which one could imagine that one would really type them. The exact selection doesn't really matter, but do avoid extreme cases like a prefix yzq with one completion and an inverted list of three doc ids. Hannah 29Nov09 6:28pm
To Marius + all: yes, I am sorry, "cost" was very imprecise here, I actually simply meant the time your code takes. Hannah 29Nov09 6:19pm
So you say that you mean by "costs" the running time? Or do you understand something else when you say we have to calculate the costs? Marius 11/29/09 4:58pm
Notice about Problem 2: You should use precise timers when measuring the running times. If your collection is very small and you round up your times, it's easy to get 0 ms when merging or scanning the inverted lists. I recommend using microsecond scale. Marjan 29Nov09 16:47
To Florian: Yes, you can do anything you want to find those words (as long as you produce the required outputs). Marjan 29Nov09 16:44
For exercise 1, should we use one of the methods presented in the lecture to find all words in the collection with the prefixes or can we do just anything to get them (though it might not be as efficient)? Florian 29Nov09 03:51pm
When you scan, please make sure that you do something very simple with the elements, like summing up all doc ids, and then outputting that sum. Otherwise a clever compiler might figure out that it can remove the whole loop, because it is not producing a result that is used anywhere. Hannah 28Nov09 11:48pm
To Mirko: Yes, scanning means one pass over the elements. Marjan 28Nov09 19:19
Hi, about exercise2: is by "scanning" meant that one looks at every element exactly once? (=> costs of scanning a list are just the size of the list) Mirko 28Nov, 19:12