Diff for "InformationRetrievalWS1516/ExamSolutions"

Differences between revisions 1 and 6 (spanning 5 versions)

TASK 1 (Ranking and evaluation)

bla   1  1  2  2  3
bli   1  0  1  2  2
blu   0  1  1  0  1

1.1 AP for query "bla bli"

The vector for "bla blu" is simply (1, 0, 1)
Dot product similarities for D1, ..., D5: 1, 2, 3, 2, 4
Documents ordered by these similarities: D5, D3, D2 and D4, D1  (D2 and D4 have the same score)
Bit array of relevances: 1, 0, 0, 0, 1
Precisions: P@1 = 100%, P@5 = 40%  (the others are not needed to compute the AP)
Average precision (AP): 70%

1.2 Function compute_ap

def compute_ap(relevances):
    prec_sum = 0
    num_rel = 0
    for i in range(len(relevances)):
        if relevances[i] == 1:
            num_rel += 1
            prec = num_rel / (i + 1)
            prec_sum += prec
    return prec_sum / num_rel

1.3 Prove that AP = 100% if and only if all 1s at the front

If all 1s at the front, then each prec in the code above is 100%, and so will be the average.

If AP = 100%, then the last prec must be 100%, which can only happen if there is no 0 before the last 1.

1.4 Write A as product of 3 x 2 and 2 x 5 ... what does this say about rank(A)

1  1  2  2  3       1  1
1  0  1  2  2   =   1  0  *  1  0  1  2  2 
0  1  1  0  1       0  1     0  1  1  0  1

This shows that each column of A can be written as a linear combination of the two columns of the 3 x 2 matrix on the right hand side. This proves that rank(A) <= 2.

TASK 2 (Encodings)

c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ...

2.1 Decode 110000100111001 + why prefix-free

Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3

All valid codes end in 1. Any non-empty prefix of any of the codes ends with a zero. Therefore, no prefix of any of the code is a valid code.

2.2 Function decode

def decode(bits):
    x = 0
    result = []
    for bit in bits:
        x += 1
        if bit == 1:
            result.add(x) 
            x = 0
    return result

2.3 Random code, 0 or 1 with probability 1/2 each until 1

The code for i has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2 to the -x.

2.4 Entropy-optimality for this probability distribution

Let L_i be the length of the code for integer i. Then L_i = i.

The probability p_i of generating that code is 2^-i (according to Task 2.3).

For entropy-optimality, we need L_i <= log_2 1/p_i + 1 = i + 1. This is true.

-  ⇤ ← Revision 1 as of 2016-02-23 14:26:59 → 
  Size: 1511
  Editor: Hannah Bast
  Comment:
+   ← Revision 6 as of 2016-02-23 14:40:59 → ⇥
  Size: 2682
  Editor: Hannah Bast
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-#acl Björn Buchhold:read,write Elmar Haussmann:read,write Claudius Korzen:read,write Florian Bäurle:read,write Jonas Sternisko:read,write All:read
+#acl Björn Buchhold:read,write Elmar Haussmann:read,write Claudius Korzen:read,write Sabine Storandt:read,write
 Line 16:
-Documents ordered by these similarities: D5, D3, D2 and D4, D1   (D2 and D4 have the same score)
+Documents ordered by these similarities: D5, D3, D2 and D4, D1  (D2 and D4 have the same score)
 Line 18:
-Precisions: P@1 = 100%, P@5 = 40%
+Precisions: P@1 = 100%, P@5 = 40%  (the others are not needed to compute the AP)
 Line 42:
-=== 1.4
+=== 1.4 Write A as product of 3 x 2 and 2 x 5 ... what does this say about rank(A) ===
 Line 51:
+== TASK 2 (Encodings) ==

c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ...

=== 2.1 Decode 110000100111001 + why prefix-free ===

Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3

All valid codes end in 1. Any non-empty prefix of any of the codes ends with a zero. Therefore, no prefix of any of the code is a valid code.

=== 2.2 Function decode ===

{{{
def decode(bits):
    x = 0
    result = []
    for bit in bits:
        x += 1
        if bit == 1:
            result.add(x) 
            x = 0
    return result
}}}

=== 2.3 Random code, 0 or 1 with probability 1/2 each until 1 ===

The code for i has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2 to the -x.

=== 2.4 Entropy-optimality for this probability distribution ===

Let L_i be the length of the code for integer i. Then L_i = i.

The probability p_i of generating that code is 2^-i (according to Task 2.3).

For entropy-optimality, we need L_i <= log_2 1/p_i + 1 = i + 1. This is true.