AD Teaching Wiki:

TASK 1 (Ranking and evaluation)

bla   1  1  2  2  3
bli   1  0  1  2  2
blu   0  1  1  0  1

1.1 AP for query "bla bli"

The vector for "bla blu" is simply (1, 0, 1)
Dot product similarities for D1, ..., D5: 1, 2, 3, 2, 4
Documents ordered by these similarities: D5, D3, D2 and D4, D1  (D2 and D4 have the same score)
Bit array of relevances: 1, 0, 0, 0, 1
Precisions: P@1 = 100%, P@5 = 40%  (the others are not needed to compute the AP)
Average precision (AP): 70%

1.2 Function compute_ap

def compute_ap(relevances):
    prec_sum = 0
    num_rel = 0
    for i in range(len(relevances)):
        if relevances[i] == 1:
            num_rel += 1
            prec = num_rel / (i + 1)
            prec_sum += prec
    return prec_sum / num_rel

1.3 Prove that AP = 100% if and only if all 1s at the front

If all 1s at the front, then each prec in the code above is 100%, and so will be the average.

If AP = 100%, then the last prec must be 100%, which can only happen if there is no 0 before the last 1.

1.4 Write A as product of 3 x 2 and 2 x 5 ... what does this say about rank(A)

1  1  2  2  3       1  1
1  0  1  2  2   =   1  0  *  1  0  1  2  2 
0  1  1  0  1       0  1     0  1  1  0  1

This shows that each column of A can be written as a linear combination of the two columns of the 3 x 2 matrix on the right hand side. This proves that rank(A) <= 2.

TASK 2 (Encodings)

c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ...

2.1 Decode 110000100111001 + why prefix-free

Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3

All valid codes end in 1. Any non-empty prefix of any of the codes ends with a zero. Therefore, no prefix of any of the code is a valid code.

2.2 Function decode

def decode(bits):
    x = 0
    result = []
    for bit in bits:
        x += 1
        if bit == 1:
            result.append(x) 
            x = 0
    return result if x == 0 else []

2.3 Random code, 0 or 1 with probability 1/2 each until 1

The code for x has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2 to the -x.

2.4 Entropy-optimality for this probability distribution

Let L_i be the length of the code for integer i. Then L_i = i.

The probability p_i of generating that code is 2^-i (according to Task 2.3).

For entropy-optimality, we need L_i <= log_2 (1/p_i) + 1 = i + 1. This is true.

TASK 3 ()

$(document).ready(function() {
  $("#query").keyup(function() {
    var query = $("#query").val();
    $.get("http://www.gugel.de:8888/?q="+ query, function(result) {
      $("#result").html(result);
    })
  })
})

3.1 Simple HTML that makes sense for this JavaScript (without head part)

<html>
  <body>
    <input id="query"/>
    <div id="result"></div>
  </body>
</html>

3.2 First fifteen characters of GET requests when typing "hey" + to which machine

GET /?q=h HTTP/
GET /?q=he HTTP
GET /?q=hey HTT

The requests are sent to the machine with the address www.gugel.de

3.3 Number of valid UTF-8 sequences of length 2

Valid UTF-8 sequences of length 2 look like this

110xxxxx 10yyyyyy

where the 11-bit codepoint xxxxxyyyyyy encodes an integer that must be > 127 (otherwise a one-byte UTF-8 must be used).

The number of such codepoints is 2048 - 128 = 1920.

3.4 UTF-8 code and URL-escaped code of ΓΌ

ISO-8859-1 code in binary: 11111100
With padding, so that 11 bits: 00011111100
As UTF-8 code: 11000011 10111100 (Byte 1: 110 + first five bits of codepoint, Byte 2: 10 + last five bits)
In decimal: 195 188
In hexadecimal: C3 BC
URL-escaped: %C3%BC

TASK 4 (Perceptrons)

Training set: {1, 2, 3, 4} (negative), {5, 6, 7, 8} (positive)

4.1 First round of Perceptron algorithm

x = 1: w * x - b =  0  -->  update: w = 0 - 1 = -1, b = 0 + 1 = 1
x = 2: w * x - b = -3  -->  no update
x = 3: w * x - b = -4  -->  no update
x = 4: w * x - b = -5  -->  no update
x = 5: w * x - b = -6  -->  update: w = -1 + 5 = 4, b = 1 - 1 = 0
x = 6: w * x - b = 24  -->  no update
x = 7: w * x - b = 28  -->  no update
x = 8: w * x - b = 32  -->  no update

4.2 Function perceptron

def perceptron():
    (w, b) = (0, 0)
    for round in range(50):
        for x in [1, 2, 3, 4, 5, 6, 7, 8]:
            if x <= 4 and w * x - b >= 0:
                (w, b) = (w - x, b + 1)
            if x >= 5 and w * x - b <= 0:
                (w, b) = (w + x, b - 1)
    return (w, b)

4.3 Prove ww* = - w* and bb* = -b when training labels swapped

Let wi, bi and wwi, bbi as in the hint. Then we can prove by induction that wwi = -wi and bbi = -bi:

In the beginning, all values are zero, so correct.

Assume that wwi = -wi and bbi = -bi after iteration i and look at what happens in iteration i+1. There are four cases:

1. x negative in original, no update --> x positive in variant, no update

2. x positive in original, no update --> x negative in variant, no update

3. x negative in original, update (wi+1, bi+1) = (wi - x, bi + 1) --> x positive in variant, update (wwi+1, bbi+1) = (wwi + x, bbi - 1)

4. x positive in original, update (wi+1, bi+1) = (wi + x, bi - 1) --> x negative in variant, update (wwi+1, bbi+1) = (wwi - x, bbi + 1)

TASK 5 (Linear algebra)

D1: a D2: aabcc D3: bb D4: c

5.1 idf scores for a, b, c and term-document matrix A

Each word occurs in exactly 2 of the four documents, so the idf of each is log2 (4/2) = 1. The tf.idf term-document matrix A is hence:

1  2  0  0
0  1  2  0
0  2  0  1

5.2 Similarities with C,,1,,: a b c and C,,2,,: b b

The vector for C1 is (1, 1, 1) and the vector for C2 is (0, 2, 0). We can compute the similarity matrix as:

            1  2  0  0     
1  1  1  *  0  1  2  0  =  1  5  2  1   
0  2  0     0  2  0  1     0  2  4  0

Hence D1, D2, D4 are more similar to C1 and D3 is more similar to C2.

5.3 Average for documents closer to C,,1,, and C,,2,,, respectively

1  2  0  0     1/3  0 
0  1  2  0  *  1/3  0 
0  2  0  1      0   1
               1/3  0

5.4 Function matrix_mult

Note: if the two matrices A, B are compatible for multiplication, then num_cols(A) = num_rows(B).

def matrix_mult(A, B):
    C = zeroes(num_rows(A), num_cols(B))
    for i in range(num_rows(A)):
        for j in range(num_cols(B)):
            for k in range(num_cols(A)):
                C[i][j] += A[i][k]*B[k][j]
    return C

TASK 6 (Miscellaneous)

The padding ensures that each position is relevant for the same number of q-grams, which permits a stronger lower bound. Without padding, the positions at the beginning or end of the string would be relevant for fewer q-grams and we would have to work with weaker lower bounds.

For example, "abc" and "xbc" have edit distance 1 but would have no 3-grams in common without padding. That is, the lower bound on the number of common 3-grams for edit distance 1 would be 0 and we would have to check all strings. With padding, they have "bc$" and "c$$" in common, and, indeed, the lower bound according to our formula is 2.

More padding would be meaningless, because it would add $-only q-grams. (This observation was not necessary to get all points.)

6.2 Sum of number from two independent dice rolls

Let Xi be the number of the i-th roll. Then Pr(S = even) = Pr(X1 = odd and X2 = odd) + Pr(X1 = even and X2 = even) = 1/2 * 1/2 + 1/2 * 1/2 = 1/2.

Pr(S = 12 | E) = Pr (S = 12 and E) / Pr(E) = Pr(S = 12) / Pr(E) = (1/36) / (1/2) = 1/18

6.3 When is the entropy for two states maximized

Let p = Pr(X = x1) and hence 1 - p = Pr(X = x2). Then H(X) = - p * log2 p - (1 - p) * log2 (1 - p). We want to maximize H(X), so we can also look at - p * ln p - (1 - p) * ln (1 - p) ... the difference is only a constant factor (see hint) and the derivation is slightly easier to compute.

The derivative is 1 + ln p - (1 - ln (1-p)) = ln p - ln (1 - p) ... see hint, or use the product formula. This is zero if and only if p = 1/2. By checking any other value for p, we see that we have a maximum at p = 1/2.

6.4 SPARQL query for "actors and actresses from films with a european director

SELECT ?p WHERE {
  ?p film ?f .
  ?f director ?d .
  ?d nationality ?c .
  Europe contains ?c
}

AD Teaching Wiki: InformationRetrievalWS1516/ExamSolutions (last edited 2016-02-23 16:37:02 by Hannah Bast)