AD Teaching Wiki: InformationRetrievalWS1516/ExamSolutions

TASK 1 (Ranking and evaluation)

      D1 D2 D3 D4 D5
bla   1  1  2  2  3
bli   1  0  1  2  2
blu   0  1  1  0  1

1.1 AP for query "bla blu"

The vector for "bla blu" is simply (1, 0, 1)
Dot product similarities for D1, ..., D5: 1, 2, 3, 2, 4
Documents ordered by these similarities: D5, D3, D2 and D4, D1  (D2 and D4 have the same score)
Bit array of relevances: 1, 0, 0, 0, 1
Precisions at the two relevant positions: P@1 = 100%, P@5 = 40%  (the other precisions are not needed to compute the AP)
Average precision (AP): (100% + 40%) / 2 = 70%

1.2 Function compute_ap

def compute_ap(relevances):
    # Compute the average precision for a list of 0/1 relevances,
    # that is, the average of the precisions at the relevant positions.
    prec_sum = 0
    num_rel = 0
    for i, rel in enumerate(relevances):
        if rel == 1:
            num_rel += 1
            prec_sum += num_rel / (i + 1)
    # If there are no relevant documents, define the AP as 0.
    return prec_sum / num_rel if num_rel > 0 else 0
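
A quick sanity check (not part of the exam solution), using the relevance list from Task 1.1 and an all-1s-at-the-front list as in Task 1.3:

# Relevant documents at ranks 1 and 5, as in Task 1.1.
assert compute_ap([1, 0, 0, 0, 1]) == (1/1 + 2/5) / 2  # = 0.7
# All 1s at the front (Task 1.3): AP is 100%.
assert compute_ap([1, 1, 1, 0, 0]) == 1.0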

1.3 Prove that AP = 100% if and only if all 1s at the front

If all the 1s are at the front, then each prec in the code above is 100%, and so is their average, the AP.

Conversely, if AP = 100%, then each prec must be 100%: each prec is at most 100%, so their average can only be 100% if all of them are. In particular, the prec at the last 1 is 100%, which can only happen if there is no 0 before the last 1.

1.4 Write A as product of 3 x 2 and 2 x 5 ... what does this say about rank(A)

1  1  2  2  3       1  1
1  0  1  2  2   =   1  0  *  1  0  1  2  2 
0  1  1  0  1       0  1     0  1  1  0  1

This shows that each column of A can be written as a linear combination of the two columns of the 3 x 2 matrix on the right hand side. This proves that rank(A) <= 2.
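
As a quick check of the factorization (a sketch using numpy, not part of the exam solution):

import numpy as np

A = np.array([[1, 1, 2, 2, 3], [1, 0, 1, 2, 2], [0, 1, 1, 0, 1]])
B = np.array([[1, 1], [1, 0], [0, 1]])            # the 3 x 2 factor
C = np.array([[1, 0, 1, 2, 2], [0, 1, 1, 0, 1]])  # the 2 x 5 factor
assert (B @ C == A).all()
print(np.linalg.matrix_rank(A))  # 2, so the bound rank(A) <= 2 is tight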

TASK 2 (Encodings)

c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ...

2.1 Decode 110000100111001 + why prefix-free

Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3

All valid codes end in 1, whereas every proper non-empty prefix of a code consists of zeros only. Therefore, no proper prefix of a code is itself a valid code.

2.2 Function decode

def decode(bits):
    # Decode a sequence of 0/1 integers: count the bits up to and
    # including the next 1; that count is the next integer.
    x = 0
    result = []
    for bit in bits:
        x += 1
        if bit == 1:
            result.append(x)
            x = 0
    # Leftover zeros at the end mean the input is not a valid code.
    return result if x == 0 else []
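
For example, decoding the bit string from Task 2.1 (passing the bits as a list of integers):

bits = [int(b) for b in "110000100111001"]
print(decode(bits))  # [1, 1, 5, 3, 1, 1, 3]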

2.3 Random code, 0 or 1 with probability 1/2 each until 1

The code for x has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2^-x.
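
A small simulation sketch (not part of the exam solution): draw bits with probability 1/2 each until the first 1 and count how often each integer is generated; the relative frequencies should be close to 2^-x.

import random
from collections import Counter

def random_integer():
    # Draw bits uniformly at random until the first 1; the number of
    # drawn bits is the generated integer x, which has Pr = 2^-x.
    x = 1
    while random.randint(0, 1) == 0:
        x += 1
    return x

counts = Counter(random_integer() for _ in range(100000))
print(counts[1] / 100000, counts[2] / 100000)  # approx. 0.5 and 0.25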

2.4 Entropy-optimality for this probability distribution

Let L_i be the length of the code for integer i. Then L_i = i.

The probability p_i of generating that code is 2^-i (according to Task 2.3).

For entropy-optimality, we need L_i <= log_2 (1/p_i) + 1. The left hand side is i, the right hand side is i + 1, so this is true.
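
A direct check of this inequality for the first few integers (a sketch, not part of the exam solution):

from math import log2

for i in range(1, 20):
    p_i = 2 ** -i
    assert i <= log2(1 / p_i) + 1  # L_i = i, right hand side = i + 1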

TASK 3 (Web applications and UTF-8)

// When a key is released in the input field with id "query", send the
// current content of the field to the server via a GET request and
// display the response in the element with id "result".
$(document).ready(function() {
  $("#query").keyup(function() {
    var query = $("#query").val();
    $.get("http://www.gugel.de:8888/?q=" + query, function(result) {
      $("#result").html(result);
    });
  });
});

3.1 Simple HTML that makes sense for this JavaScript (without head part)

<html>
  <body>
    <input id="query"/>
    <div id="result"></div>
  </body>
</html>

3.2 First fifteen characters of GET requests when typing "hey" + to which machine

GET /?q=h HTTP/
GET /?q=he HTTP
GET /?q=hey HTT

The requests are sent to the machine with the address www.gugel.de (on port 8888, as specified in the URL).

3.3 Number of valid UTF-8 sequences of length 2

Valid UTF-8 sequences of length 2 look like this

110xxxxx 10yyyyyy

where the 11-bit codepoint xxxxxyyyyyy encodes an integer that must be > 127 (otherwise a one-byte UTF-8 code must be used).

The number of such codepoints is 2^11 - 128 = 2048 - 128 = 1920.
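
This count can be double-checked by brute force with Python's UTF-8 decoder (a sketch, not part of the exam solution; it relies on the decoder rejecting overlong two-byte encodings):

def is_two_byte_utf8(b1, b2):
    # Valid iff the two bytes decode to exactly one character.
    try:
        return len(bytes([b1, b2]).decode("utf-8")) == 1
    except UnicodeDecodeError:
        return False

print(sum(is_two_byte_utf8(b1, b2)
          for b1 in range(256) for b2 in range(256)))  # 1920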

3.4 UTF-8 code and URL-escaped code of ü

ISO-8859-1 code in binary: 11111100
With padding, so that 11 bits: 00011111100
As UTF-8 code: 11000011 10111100 (Byte 1: 110 + first five bits of codepoint, Byte 2: 10 + last six bits)
In decimal: 195 188
In hexadecimal: C3 BC
URL-escaped: %C3%BC (UTF-8) or %FC (ISO-8859-1) ... both count as correct, since the task formulation was ambiguous in this respect.
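
This can be verified in Python (a sketch, not part of the exam solution):

print("ü".encode("utf-8"))       # b'\xc3\xbc'
print("ü".encode("iso-8859-1"))  # b'\xfc'

from urllib.parse import quote
print(quote("ü"))                         # %C3%BC (UTF-8 is the default)
print(quote("ü", encoding="iso-8859-1"))  # %FC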

TASK 4 (Perceptrons)

Training set: {1, 2, 3, 4} (negative), {5, 6, 7, 8} (positive)

4.1 First round of Perceptron algorithm

x = 1: w * x - b =  0  -->  update: w = 0 - 1 = -1, b = 0 + 1 = 1
x = 2: w * x - b = -3  -->  no update
x = 3: w * x - b = -4  -->  no update
x = 4: w * x - b = -5  -->  no update
x = 5: w * x - b = -6  -->  update: w = -1 + 5 = 4, b = 1 - 1 = 0
x = 6: w * x - b = 24  -->  no update
x = 7: w * x - b = 28  -->  no update
x = 8: w * x - b = 32  -->  no update

4.2 Function perceptron

def perceptron():
    # Points 1..4 are the negative class, 5..8 the positive class.
    # A point is misclassified if it is on the wrong side of the
    # hyperplane w * x - b = 0 (ties count as misclassified).
    w, b = 0, 0
    for _ in range(50):
        for x in [1, 2, 3, 4, 5, 6, 7, 8]:
            if x <= 4 and w * x - b >= 0:
                w, b = w - x, b + 1
            if x >= 5 and w * x - b <= 0:
                w, b = w + x, b - 1
    return w, b
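
A quick check (not part of the exam solution), assuming that 50 rounds suffice for convergence on this training set: the returned (w, b) should separate the two classes.

w, b = perceptron()
assert all(w * x - b < 0 for x in [1, 2, 3, 4])  # negative class
assert all(w * x - b > 0 for x in [5, 6, 7, 8])  # positive class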

4.3 Prove that ww = -w and bb = -b when the training labels are swapped

Without loss of generality, assume that b is part of the w vector (as we did in the lecture). The argument below can also easily be extended to an explicit b; it is just more wordy then.

Let wi and wwi be defined as in the hint. Then we can prove by induction that wwi = -wi:

In the beginning, w0 = ww0 = 0, so the claim holds.

Assume that wwi = -wi after iteration i (for i = 0: in the beginning) and look at what happens in iteration i+1. Then wwi * x = - wi * x. In particular, if one is > 0 the other is < 0 and vice versa, and if one is zero the other is zero too. There are four cases:

1. x negative in original, no update --> wi * x < 0 --> wwi * x > 0 --> no update in variant (where x is positive).

2. x positive in original, no update: analogous.

3. x negative in original, update --> wi * x >= 0 and wi+1 = wi - x --> in variant wwi * x <= 0 and update wwi+1 = wwi + x = -wi + x = - (wi - x) = - wi+1.

4. x positive in original, update wi+1 = wi + x: analogous.
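
The claim can also be checked empirically (a sketch, not part of the exam solution): run the code from Task 4.2 once as given and once with the roles of the two classes swapped (a hypothetical helper perceptron_swapped), and compare the results.

def perceptron_swapped():
    # Same algorithm as in Task 4.2, but now 1..4 are the positive
    # class and 5..8 the negative class.
    w, b = 0, 0
    for _ in range(50):
        for x in [1, 2, 3, 4, 5, 6, 7, 8]:
            if x <= 4 and w * x - b <= 0:
                w, b = w + x, b - 1
            if x >= 5 and w * x - b >= 0:
                w, b = w - x, b + 1
    return w, b

w, b = perceptron()
ww, bb = perceptron_swapped()
assert (ww, bb) == (-w, -b)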

TASK 5 (Linear algebra)

D1: a, D2: aabcc, D3: bb, D4: c

5.1 idf scores for a, b, c and term-document matrix A

Each word occurs in exactly two of the four documents, so the idf of each is log2 (4/2) = 1. The tf.idf term-document matrix A is hence:

     D1 D2 D3 D4
a    1  2  0  0
b    0  1  2  0
c    0  2  0  1

5.2 Similarities with C1: a b c and C2: b b

The vector for C1 is (1, 1, 1) and the vector for C2 is (0, 2, 0). We can compute the similarity matrix as:

            1  2  0  0     
1  1  1  *  0  1  2  0  =  1  5  2  1   
0  2  0     0  2  0  1     0  2  4  0

Hence D1, D2, D4 are more similar to C1 and D3 is more similar to C2.
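
Again, this is easy to double-check with numpy (a sketch, not part of the exam solution):

import numpy as np

A = np.array([[1, 2, 0, 0], [0, 1, 2, 0], [0, 2, 0, 1]])
Q = np.array([[1, 1, 1], [0, 2, 0]])  # rows: C1, C2
print(Q @ A)  # [[1 5 2 1]
              #  [0 2 4 0]]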

5.3 Average for documents closer to C1 and C2, respectively

The documents closer to C1 are D1, D2, D4, and the document closer to C2 is D3 (see 5.2). Both averages can be computed with a single matrix product, whose columns are the two average vectors:

1  2  0  0     1/3  0        1    0
0  1  2  0  *  1/3  0    =   1/3  2
0  2  0  1      0   1        1    0
               1/3  0

5.4 Function matrix_mult

Note: if the two matrices A, B are compatible for multiplication, then num_cols(A) = num_rows(B).

def matrix_mult(A, B):
    # Multiply two matrices, given as lists of rows. Assumes that
    # num_cols(A) = num_rows(B), see the note above.
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(A[0])):
                C[i][j] += A[i][k] * B[k][j]
    return C
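
Usage sketch (not part of the exam solution), reproducing the product from Task 5.3; Fraction avoids floating-point noise for the 1/3 entries.

from fractions import Fraction

third = Fraction(1, 3)
A = [[1, 2, 0, 0], [0, 1, 2, 0], [0, 2, 0, 1]]
B = [[third, 0], [third, 0], [0, 1], [third, 0]]
print(matrix_mult(A, B))
# [[Fraction(1, 1), 0], [Fraction(1, 3), 2], [Fraction(1, 1), 0]]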

TASK 6 (Miscellaneous)

6.1 Why the padding in the q-gram index

The padding ensures that each position is relevant for the same number of q-grams, which permits a stronger lower bound. Without padding, the positions at the beginning or end of the string would be relevant for fewer q-grams and we would have to work with weaker lower bounds.

For example, "abc" and "xbc" have edit distance 1 but would have no 3-grams in common without padding. That is, the lower bound on the number of common 3-grams for edit distance 1 would be 0 and we would have to check all strings. With padding, they have "bc$" and "c$$" in common, and, indeed, the lower bound according to our formula is 2.
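
A small sketch of this example (not part of the exam solution), padding only at the end as in the example above:

def qgrams(s, q=3, pad="$"):
    # Pad with q - 1 padding characters at the end and list all q-grams.
    s = s + pad * (q - 1)
    return [s[i:i+q] for i in range(len(s) - q + 1)]

print(qgrams("abc"))  # ['abc', 'bc$', 'c$$']
print(qgrams("xbc"))  # ['xbc', 'bc$', 'c$$'] -> two 3-grams in common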

More padding would be meaningless, because it would add $-only q-grams. (This observation was not necessary to get all points.)

6.2 Sum of the numbers from two independent dice rolls

Let Xi be the number of the i-th roll. Then, using the independence of X1 and X2, Pr(S = even) = Pr(X1 = odd and X2 = odd) + Pr(X1 = even and X2 = even) = 1/2 * 1/2 + 1/2 * 1/2 = 1/2.

Pr(S = 12 | E) = Pr(S = 12 and E) / Pr(E) = Pr(S = 12) / Pr(E) = (1/36) / (1/2) = 1/18, where E is the event that S is even. (S = 12 implies E, hence Pr(S = 12 and E) = Pr(S = 12).)
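
Both probabilities can be verified by enumerating all 36 equally likely outcomes (a sketch, not part of the exam solution):

from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
even = [o for o in outcomes if sum(o) % 2 == 0]
print(len(even) / len(outcomes))                    # 0.5
print(sum(sum(o) == 12 for o in even) / len(even))  # 1/18 = 0.0555...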

6.3 When is the entropy for two states maximized

Let p = Pr(X = x1) and hence 1 - p = Pr(X = x2). Then H(X) = - p * log2 p - (1 - p) * log2 (1 - p). We want to maximize H(X), so we can just as well maximize - p * ln p - (1 - p) * ln (1 - p) ... the difference is only a constant factor (see hint) and the derivative is slightly easier to compute.

The derivative is - (1 + ln p) + (1 + ln (1 - p)) = - ln p + ln (1 - p) ... see hint, or use the product rule. This is zero if and only if p = 1/2; it is positive for p < 1/2 and negative for p > 1/2, so we have a maximum at p = 1/2.
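
A quick numeric check (not part of the exam solution):

from math import log2

def H(p):
    # Entropy of a two-state distribution with probabilities p, 1 - p.
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(H(0.5), H(0.3), H(0.1))  # 1.0 is the largest of these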

6.4 SPARQL query for "actors and actresses from films with a European director"

SELECT ?p WHERE {
  ?p film ?f .
  ?d director ?f .
  ?d nationality ?c .
  Europe contains ?c
}
