#acl Björn Buchhold:read,write Elmar Haussmann:read,write Claudius Korzen:read,write Sabine Storandt:read,write All:read

== TASK 1 (Ranking and evaluation) ==

{{{
bla  1 1 2 2 3
bli  1 0 1 2 2
blu  0 1 1 0 1
}}}

=== 1.1 AP for query "bla blu" ===

{{{
The vector for "bla blu" is simply (1, 0, 1)
Dot product similarities for D1, ..., D5: 1, 2, 3, 2, 4
Documents ordered by these similarities: D5, D3, D2 and D4, D1 (D2 and D4 have the same score)
Bit array of relevances: 1, 0, 0, 0, 1
Precisions: P@1 = 100%, P@5 = 40% (the others are not needed to compute the AP)
Average precision (AP): (100% + 40%) / 2 = 70%
}}}

=== 1.2 Function compute_ap ===

{{{
def compute_ap(relevances):
    # Average of the precisions at the relevant positions.
    # Assumes that at least one entry of relevances is 1.
    prec_sum = 0
    num_rel = 0
    for i in range(len(relevances)):
        if relevances[i] == 1:
            num_rel += 1
            prec = num_rel / (i + 1)
            prec_sum += prec
    return prec_sum / num_rel
}}}

=== 1.3 Prove that AP = 100% if and only if all 1s at the front ===

If all 1s are at the front, then each prec in the code above is 100%, and so is their average, the AP.

Conversely, if AP = 100%, then each prec must be 100%, since each prec is at most 100% and an average of 100% is only possible if all values are 100%. In particular, the prec at the position of the last 1 is 100%, which can only happen if there is no 0 before the last 1.

=== 1.4 Write A as a product of a 3 x 2 and a 2 x 5 matrix ... what does this say about rank(A) ===

{{{
1 1 2 2 3     1 1
1 0 1 2 2  =  1 0  *  1 0 1 2 2
0 1 1 0 1     0 1     0 1 1 0 1
}}}

This shows that each column of A can be written as a linear combination of the two columns of the 3 x 2 matrix on the right-hand side. This proves that rank(A) <= 2.

== TASK 2 (Encodings) ==

c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ...

=== 2.1 Decode 110000100111001 + why prefix-free ===

Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3

All valid codes end in 1, whereas any proper non-empty prefix of a code consists of 0s only and hence ends in 0. Therefore, no proper prefix of a code is itself a valid code, that is, the code is prefix-free.

=== 2.2 Function decode ===

{{{
def decode(bits):
    x = 0
    result = []
    for bit in bits:
        x += 1
        if bit == 1:
            result.append(x)
            x = 0
    # If x > 0 at this point, the sequence ended in the middle of a code.
    return result if x == 0 else []
}}}
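A quick sanity check for the two functions above (compute_ap from Task 1.2 and decode from Task 2.2), reproducing the hand-computed results from Tasks 1.1 and 2.1; this assumes Python 3, where / is float division:

{{{
print(compute_ap([1, 0, 0, 0, 1]))                  # 0.7, the AP from Task 1.1
print(decode([int(b) for b in "110000100111001"]))  # [1, 1, 5, 3, 1, 1, 3], as in Task 2.1
}}}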
=== 2.3 Random code, 0 or 1 with probability 1/2 each until 1 ===

The code for x has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2^-x^.

=== 2.4 Entropy-optimality for this probability distribution ===

Let L,,i,, be the length of the code for integer i. Then L,,i,, = i. The probability p,,i,, of generating that code is 2^-i^ (according to Task 2.3). For entropy-optimality, we need L,,i,, <= log,,2,, (1/p,,i,,) + 1. The left-hand side is i, the right-hand side is i + 1, so this indeed holds.

== TASK 3 (Web applications and UTF-8) ==

{{{
$(document).ready(function() {
  $("#query").keyup(function() {
    var query = $("#query").val();
    $.get("http://www.gugel.de:8888/?q=" + query, function(result) {
      $("#result").html(result);
    })
  })
})
}}}

=== 3.1 Simple HTML that makes sense for this JavaScript (without head part) ===

{{{
<html>
<body>
<input type="text" id="query">
<div id="result"></div>
</body>
</html>
}}}

=== 3.2 First fifteen characters of GET requests when typing "hey" + to which machine ===

{{{
GET /?q=h HTTP/
GET /?q=he HTTP
GET /?q=hey HTT
}}}

The requests are sent to the machine with the address www.gugel.de (on port 8888).

=== 3.3 Number of valid UTF-8 sequences of length 2 ===

Valid UTF-8 sequences of length 2 look like this:

{{{
110xxxxx 10yyyyyy
}}}

where the 11-bit codepoint xxxxxyyyyyy encodes an integer that must be > 127 (otherwise the one-byte UTF-8 encoding must be used). The number of such codepoints is 2048 - 128 = 1920.

=== 3.4 UTF-8 code and URL-escaped code of ü ===

{{{
ISO-8859-1 code in binary: 11111100
With padding, so that it has 11 bits: 00011111100
As UTF-8 code: 11000011 10111100
(Byte 1: 110 + first five bits of the codepoint, Byte 2: 10 + last six bits)
In decimal: 195 188
In hexadecimal: C3 BC
URL-escaped: %C3%BC (UTF-8) or %FC (ISO-8859-1)
... both count as correct, since the task formulation was ambiguous in this respect.
}}}

== TASK 4 (Perceptrons) ==

Training set: {1, 2, 3, 4} (negative), {5, 6, 7, 8} (positive)

=== 4.1 First round of Perceptron algorithm ===

{{{
x = 1: w * x - b = 0   --> update: w = 0 - 1 = -1, b = 0 + 1 = 1
x = 2: w * x - b = -3  --> no update
x = 3: w * x - b = -4  --> no update
x = 4: w * x - b = -5  --> no update
x = 5: w * x - b = -6  --> update: w = -1 + 5 = 4, b = 1 - 1 = 0
x = 6: w * x - b = 24  --> no update
x = 7: w * x - b = 28  --> no update
x = 8: w * x - b = 32  --> no update
}}}

=== 4.2 Function perceptron ===

{{{
def perceptron():
    (w, b) = (0, 0)
    for round in range(50):
        for x in [1, 2, 3, 4, 5, 6, 7, 8]:
            # Negative example classified as positive --> update.
            if x <= 4 and w * x - b >= 0:
                (w, b) = (w - x, b + 1)
            # Positive example classified as negative --> update.
            if x >= 5 and w * x - b <= 0:
                (w, b) = (w + x, b - 1)
    return (w, b)
}}}

=== 4.3 Prove that ww* = -w* and bb* = -b* when training labels swapped ===

Without loss of generality, assume that b is part of the w vector (like we did in the lecture). The argument below can also easily be extended to an explicit b; it just gets more wordy.

Let w,,i,, and ww,,i,, be defined as in the hint. Then we can prove by induction that ww,,i,, = -w,,i,,.

In the beginning, w,,0,, = ww,,0,, = 0, so the claim holds. Assume that ww,,i,, = -w,,i,, after iteration i (for i = 0: in the beginning) and look at what happens in iteration i+1. Then ww,,i,, * x = -w,,i,, * x. In particular, if one is > 0 the other is < 0 and vice versa, and if one is zero, the other is zero too. There are four cases:

 1. x negative in original, no update --> w,,i,, * x < 0 --> ww,,i,, * x > 0 --> no update in the variant (where x is positive).
 2. x positive in original, no update: analogous.
 3. x negative in original, update --> w,,i,, * x >= 0 and w,,i+1,, = w,,i,, - x --> in the variant ww,,i,, * x <= 0 and update ww,,i+1,, = ww,,i,, + x = -w,,i,, + x = -(w,,i,, - x) = -w,,i+1,,.
 4. x positive in original, update w,,i+1,, = w,,i,, + x: analogous.

== TASK 5 (Linear algebra) ==

D,,1,,: a, D,,2,,: aabcc, D,,3,,: bb, D,,4,,: c

=== 5.1 idf scores for a, b, c and term-document matrix A ===

Each word occurs in exactly two of the four documents, so the idf of each is log,,2,, (4/2) = 1. The tf.idf term-document matrix A is hence:

{{{
a  1 2 0 0
b  0 1 2 0
c  0 2 0 1
}}}

=== 5.2 Similarities with C1: a b c and C2: b b ===

The vector for C,,1,, is (1, 1, 1) and the vector for C,,2,, is (0, 2, 0). We can compute the similarity matrix as:

{{{
          1 2 0 0
1 1 1  *  0 1 2 0  =  1 5 2 1
0 2 0     0 2 0 1     0 2 4 0
}}}

Hence D,,1,,, D,,2,,, D,,4,, are more similar to C,,1,,, and D,,3,, is more similar to C,,2,,.

=== 5.3 Average for documents closer to C1 and C2, respectively ===

The first column of the right factor averages the documents closer to C,,1,, (that is, D,,1,,, D,,2,,, D,,4,,), and the second column picks out D,,3,,:

{{{
1 2 0 0     1/3  0        1   0
0 1 2 0  *  1/3  0   =   1/3  2
0 2 0 1      0   1        1   0
            1/3  0
}}}

=== 5.4 Function matrix_mult ===

Note: if the two matrices A, B are compatible for multiplication, then num_cols(A) = num_rows(B); for the lists-of-rows representation below, that means len(A[0]) == len(B).

{{{
def matrix_mult(A, B):
    # A and B are lists of rows; requires len(A[0]) == len(B).
    C = [[0] * len(B[0]) for i in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                C[i][j] += A[i][k] * B[k][j]
    return C
}}}
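A quick sanity check for matrix_mult, recomputing the similarity matrix from Task 5.2 (assuming Python 3 and the lists-of-rows version of matrix_mult above):

{{{
Q = [[1, 1, 1], [0, 2, 0]]                      # rows: the vectors for C1 and C2
A = [[1, 2, 0, 0], [0, 1, 2, 0], [0, 2, 0, 1]]  # the tf.idf matrix from Task 5.1
print(matrix_mult(Q, A))                        # [[1, 5, 2, 1], [0, 2, 4, 0]]
}}}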
== TASK 6 (Miscellaneous) ==

=== 6.1 Purpose of the padding in fuzzy search ===

The padding ensures that each position of a string is relevant for the same number of q-grams, which permits a stronger lower bound. Without padding, the positions at the beginning or end of the string would be relevant for fewer q-grams, and we would have to work with weaker lower bounds.

For example, "abc" and "xbc" have edit distance 1 but would have no 3-grams in common without padding. That is, the lower bound on the number of common 3-grams for edit distance 1 would be 0, and we would have to check all strings. With padding, they have "bc$" and "c$$" in common, and, indeed, the lower bound according to our formula is 2.

More padding would be meaningless, because it would only add $-only q-grams. (This observation was not necessary to get all points.)

=== 6.2 Sum of the numbers from two independent dice rolls ===

Let X,,i,, be the number of the i-th roll and S = X,,1,, + X,,2,,. Then

Pr(S even) = Pr(X,,1,, odd and X,,2,, odd) + Pr(X,,1,, even and X,,2,, even) = 1/2 * 1/2 + 1/2 * 1/2 = 1/2.

Let E be the event that S is even. Since S = 12 implies E, we have

Pr(S = 12 | E) = Pr(S = 12 and E) / Pr(E) = Pr(S = 12) / Pr(E) = (1/36) / (1/2) = 1/18.

=== 6.3 When is the entropy for two states maximized ===

Let p = Pr(X = x,,1,,) and hence 1 - p = Pr(X = x,,2,,). Then H(X) = -p * log,,2,, p - (1 - p) * log,,2,, (1 - p). We want to maximize H(X), so we can just as well look at -p * ln p - (1 - p) * ln (1 - p) ... the difference is only a constant factor (see hint) and the derivative is slightly easier to compute.

The derivative is -(1 + ln p) + (1 + ln (1 - p)) = -ln p + ln (1 - p) ... see hint, or use the product rule. This is zero if and only if p = 1/2. By checking any other value of p, we see that we indeed have a maximum at p = 1/2; see also the numerical check after Task 6.4.

=== 6.4 SPARQL query for "actors and actresses from films with a European director" ===

{{{
SELECT ?p WHERE {
  ?p film ?f .
  ?d director ?f .
  ?d nationality ?c .
  Europe contains ?c
}
}}}
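As announced in Task 6.3, a small numerical check (a sketch in Python 3, using only the standard math module) that the entropy of a two-state distribution is largest at p = 1/2:

{{{
from math import log2

def entropy(p):
    # Entropy of the distribution (p, 1 - p), in bits; requires 0 < p < 1.
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in [0.1, 0.25, 0.5, 0.75, 0.9]:
    print(p, entropy(p))  # the maximal value, 1.0, is attained at p = 0.5
}}}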