#acl Björn Buchhold:read,write Elmar Haussmann:read,write Claudius Korzen:read,write Sabine Storandt:read,write

== TASK 1 (Ranking and evaluation) ==

{{{
bla 1 1 2 2 3
bli 1 0 1 2 2
blu 0 1 1 0 1
}}}

=== 1.1 AP for query "bla blu" ===

{{{
The vector for "bla blu" is simply (1, 0, 1)
Dot product similarities for D1, ..., D5: 1, 2, 3, 2, 4
Documents ordered by these similarities: D5, D3, D2 and D4, D1 (D2 and D4 have the same score)
Bit array of relevances: 1, 0, 0, 0, 1
Precisions: P@1 = 100%, P@5 = 40% (the others are not needed to compute the AP)
Average precision (AP): (100% + 40%) / 2 = 70%
}}}

=== 1.2 Function compute_ap ===

{{{
def compute_ap(relevances):
    # The AP is the average of the precisions at the relevant positions.
    # Assumes that relevances contains at least one 1 (Python 3 division).
    prec_sum = 0
    num_rel = 0
    for i in range(len(relevances)):
        if relevances[i] == 1:
            num_rel += 1
            prec_sum += num_rel / (i + 1)
    return prec_sum / num_rel
}}}

=== 1.3 Prove that AP = 100% if and only if all 1s at the front ===

If all 1s are at the front, then each prec in the code above is 100%, and so is their average. Conversely, since each prec is at most 100%, an average of 100% forces every prec to be 100%. In particular, the prec at the position of the last 1 must be 100%, which can only happen if there is no 0 before the last 1.

=== 1.4 Write A as product of 3 x 2 and 2 x 5 ... what does this say about rank(A) ===

{{{
1 1 2 2 3   1 1
1 0 1 2 2 = 1 0 * 1 0 1 2 2
0 1 1 0 1   0 1   0 1 1 0 1
}}}

This shows that each column of A can be written as a linear combination of the two columns of the 3 x 2 matrix on the right-hand side (with the coefficients taken from the corresponding column of the 2 x 5 matrix). This proves that rank(A) <= 2.

== TASK 2 (Encodings) ==

c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ...

=== 2.1 Decode 110000100111001 + why prefix-free ===

Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3

All valid codes end in 1, and every non-empty proper prefix of a code ends with a zero. Therefore, no valid code is a proper prefix of another valid code, that is, the code is prefix-free.

=== 2.2 Function decode ===

{{{
def decode(bits):
    # Each 1 ends a code; a code of x bits encodes the integer x.
    x = 0
    result = []
    for bit in bits:
        x += 1
        if bit == 1:
            result.append(x)
            x = 0
    # If the sequence does not end with a 1, it is not a valid encoding.
    return result if x == 0 else []
}}}

=== 2.3 Random code, 0 or 1 with probability 1/2 each until 1 ===

The code for x has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2^-x.

=== 2.4 Entropy-optimality for this probability distribution ===

Let L,,i,, be the length of the code for integer i. Then L,,i,, = i. The probability p,,i,, of generating that code is 2^-i (according to Task 2.3). For entropy-optimality, we need L,,i,, <= log,,2,, (1/p,,i,,) + 1. The left-hand side is i, the right-hand side is i + 1, so this is true.

== TASK 3 (Web applications and UTF-8) ==

{{{
$(document).ready(function() {
  $("#query").keyup(function() {
    var query = $("#query").val();
    $.get("http://www.gugel.de:8888/?q=" + query, function(result) {
      $("#result").html(result);
    })
  })
})
}}}

=== 3.1 Simple HTML that makes sense for this JavaScript (without head part) ===

{{{
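<!-- A minimal page that makes sense for the JavaScript above: it needs an
     element with id "query" to type into, an element with id "result" to
     display the result in, and the two scripts. This is a sketch; the
     script file names are placeholders, not given in the task. -->
<html>
<body>
  <script src="jquery.min.js"></script>
  <script src="search.js"></script>
  <input type="text" id="query">
  <div id="result"></div>
</body>
</html>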
}}}

=== 3.2 First fifteen characters of GET requests when typing "hey" + to which machine ===

{{{
GET /?q=h HTTP/
GET /?q=he HTTP
GET /?q=hey HTT
}}}

The requests are sent to the machine with the address www.gugel.de (port 8888).

=== 3.3 Number of valid UTF-8 sequences of length 2 ===

Valid UTF-8 sequences of length 2 look like this

{{{
110xxxxx 10yyyyyy
}}}

where the 11-bit codepoint xxxxxyyyyyy encodes an integer that must be > 127 (otherwise a one-byte UTF-8 code must be used). The number of such codepoints is 2048 - 128 = 1920.

=== 3.4 UTF-8 code and URL-escaped code of ü ===

{{{
ISO-8859-1 code in binary: 11111100
With padding, so that 11 bits: 00011111100
As UTF-8 code: 11000011 10111100 (Byte 1: 110 + first five bits of codepoint, Byte 2: 10 + last six bits)
In decimal: 195 188
In hexadecimal: C3 BC
URL-escaped: %C3%BC (UTF-8) or %FC (ISO-8859-1) ... both count as correct, since the task formulation was ambiguous in this respect.
}}}

== TASK 4 (Perceptrons) ==

Training set: {1, 2, 3, 4} (negative), {5, 6, 7, 8} (positive)

=== 4.1 First round of Perceptron algorithm ===

{{{
x = 1: w * x - b =  0 --> update: w = 0 - 1 = -1, b = 0 + 1 = 1
x = 2: w * x - b = -3 --> no update
x = 3: w * x - b = -4 --> no update
x = 4: w * x - b = -5 --> no update
x = 5: w * x - b = -6 --> update: w = -1 + 5 = 4, b = 1 - 1 = 0
x = 6: w * x - b = 24 --> no update
x = 7: w * x - b = 28 --> no update
x = 8: w * x - b = 32 --> no update
}}}

=== 4.2 Function perceptron ===

{{{
def perceptron():
    (w, b) = (0, 0)
    for round in range(50):
        for x in [1, 2, 3, 4, 5, 6, 7, 8]:
            # Negative point not classified as negative --> push its score down.
            if x <= 4 and w * x - b >= 0:
                (w, b) = (w - x, b + 1)
            # Positive point not classified as positive --> push its score up.
            if x >= 5 and w * x - b <= 0:
                (w, b) = (w + x, b - 1)
    return (w, b)
}}}

=== 4.3 Prove ww* = -w* and bb* = -b* when training labels swapped ===

Let w,,i,,, b,,i,, and ww,,i,,, bb,,i,, be as in the hint. Then we can prove by induction that ww,,i,, = -w,,i,, and bb,,i,, = -b,,i,,. In the beginning, all values are zero, so the claim holds. Assume that ww,,i,, = -w,,i,, and bb,,i,, = -b,,i,, after iteration i and look at what happens in iteration i+1. Then ww,,i,, * x - bb,,i,, = - (w,,i,, * x - b,,i,,). In particular, if one is > 0 the other is < 0 and vice versa, and if one is zero the other is zero too. There are four cases:

1. x negative in original, no update --> w,,i,, * x - b,,i,, < 0 --> ww,,i,, * x - bb,,i,, > 0 --> no update in variant (where x is positive).

2. x positive in original, no update: analogous.

3. x negative in original, update (w,,i+1,,, b,,i+1,,) = (w,,i,, - x, b,,i,, + 1) --> w,,i,, * x - b,,i,, >= 0 --> ww,,i,, * x - bb,,i,, <= 0 --> update in variant (ww,,i+1,,, bb,,i+1,,) = (ww,,i,, + x, bb,,i,, - 1), which is just minus (w,,i+1,,, b,,i+1,,) (since x is positive in variant).

4. x positive in original, update (w,,i+1,,, b,,i+1,,) = (w,,i,, + x, b,,i,, - 1): analogous.

== TASK 5 (Linear algebra) ==

D,,1,,: a
D,,2,,: aabcc
D,,3,,: bb
D,,4,,: c

=== 5.1 idf scores for a, b, c and term-document matrix A ===

Each word occurs in exactly two of the four documents, so the idf of each is log,,2,, (4/2) = 1. The tf.idf term-document matrix A is hence:

{{{
1 2 0 0
0 1 2 0
0 2 0 1
}}}

=== 5.2 Similarities with C1: a b c and C2: b b ===

The vector for C,,1,, is (1, 1, 1) and the vector for C,,2,, is (0, 2, 0). We can compute the similarity matrix as:

{{{
          1 2 0 0
1 1 1  *  0 1 2 0  =  1 5 2 1
0 2 0     0 2 0 1     0 2 4 0
}}}

Hence D,,1,,, D,,2,,, D,,4,, are more similar to C,,1,, and D,,3,, is more similar to C,,2,,.
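As a quick sanity check of 5.1 and 5.2 (a sketch, assuming numpy is available), the similarity matrix can be recomputed with a few lines of Python:

{{{
import numpy as np

# tf.idf term-document matrix A from 5.1 (rows: a, b, c; columns: D1, ..., D4).
A = np.array([[1, 2, 0, 0],
              [0, 1, 2, 0],
              [0, 2, 0, 1]])

# Query vectors for C1 = "a b c" and C2 = "b b" as rows.
C = np.array([[1, 1, 1],
              [0, 2, 0]])

# Dot-product similarities of each query with each document.
print(C @ A)  # [[1 5 2 1]
              #  [0 2 4 0]]
}}}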
=== 5.3 Average for documents closer to C1 and C2, respectively ===

{{{
1 2 0 0   1/3  0     1    0
0 1 2 0 * 1/3  0  =  1/3  2
0 2 0 1    0   1     1    0
          1/3  0
}}}

The first column of the result is the average of the vectors for D,,1,,, D,,2,,, D,,4,, (closer to C,,1,,), the second column is the vector for D,,3,, (closer to C,,2,,).

=== 5.4 Function matrix_mult ===

Note: if the two matrices A, B are compatible for multiplication, then the number of columns of A equals the number of rows of B (m in the code below). A usage example follows at the end of the page.

{{{
def matrix_mult(A, B):
    # A and B are lists of rows; A is n x m, B is m x l, the result C is n x l.
    n, m, l = len(A), len(B), len(B[0])
    C = [[0] * l for _ in range(n)]
    for i in range(n):
        for j in range(l):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C
}}}

== TASK 6 (Miscellaneous) ==

=== 6.1 Purpose of the padding in fuzzy search ===

The padding ensures that each position is relevant for the same number of q-grams, which permits a stronger lower bound. Without padding, the positions at the beginning or end of the string would be relevant for fewer q-grams and we would have to work with weaker lower bounds.

For example, "abc" and "xbc" have edit distance 1 but would have no 3-grams in common without padding. That is, the lower bound on the number of common 3-grams for edit distance 1 would be 0 and we would have to check all strings. With padding, they have "bc$" and "c$$" in common, and, indeed, the lower bound according to our formula is 2.

More padding would be meaningless, because it would only add $-only q-grams. (This observation was not necessary to get all points.)

=== 6.2 Sum of the numbers from two independent dice rolls ===

Let X,,i,, be the number of the i-th roll and S = X,,1,, + X,,2,,. Then

Pr(S = even) = Pr(X,,1,, = odd and X,,2,, = odd) + Pr(X,,1,, = even and X,,2,, = even) = 1/2 * 1/2 + 1/2 * 1/2 = 1/2.

Pr(S = 12 | E) = Pr(S = 12 and E) / Pr(E) = Pr(S = 12) / Pr(E) = (1/36) / (1/2) = 1/18.

(Both probabilities can also be checked by enumerating all 36 outcomes; see the sketch at the end of the page.)

=== 6.3 When is the entropy for two states maximized ===

Let p = Pr(X = x,,1,,) and hence 1 - p = Pr(X = x,,2,,). Then H(X) = - p * log,,2,, p - (1 - p) * log,,2,, (1 - p). We want to maximize H(X), so we can also look at - p * ln p - (1 - p) * ln (1 - p) ... the difference is only a constant factor (see hint) and the derivative is slightly easier to compute. The derivative is - (1 + ln p) + (1 + ln (1 - p)) = - ln p + ln (1 - p) ... see hint, or use the product rule. This is zero if and only if p = 1/2. Since this is the only critical point and the entropy is 0 at the boundary values p = 0 and p = 1, we have a maximum at p = 1/2. (A numerical check is sketched at the end of the page.)

=== 6.4 SPARQL query for "actors and actresses from films with a European director" ===

{{{
SELECT ?p WHERE {
  ?p film ?f .
  ?d director ?f .
  ?d nationality ?c .
  Europe contains ?c
}
}}}
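To try out matrix_mult from Task 5.4, here is the product from Task 5.3 (a sketch; the printed floats are only approximately the fractions from 5.3):

{{{
A = [[1, 2, 0, 0],
     [0, 1, 2, 0],
     [0, 2, 0, 1]]

# Averaging matrix from 5.3: D1, D2, D4 are averaged for C1, D3 is taken for C2.
M = [[1/3, 0],
     [1/3, 0],
     [0,   1],
     [1/3, 0]]

# Expected result (see 5.3): columns (1, 1/3, 1) and (0, 2, 0).
print(matrix_mult(A, M))
}}}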
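For Task 6.2, the two probabilities can be checked by enumerating all 36 equally likely outcomes (a quick check, using only the standard library):

{{{
from fractions import Fraction

# All 36 equally likely outcomes of two dice rolls and their sums.
sums = [x1 + x2 for x1 in range(1, 7) for x2 in range(1, 7)]

even = [s for s in sums if s % 2 == 0]
print(Fraction(len(even), len(sums)))        # Pr(S even) = 1/2
print(Fraction(even.count(12), len(even)))   # Pr(S = 12 | S even) = 1/18
}}}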
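And for Task 6.3, a numerical check that the entropy is maximal at p = 1/2:

{{{
import math

# Binary entropy H(p) = - p * log2(p) - (1 - p) * log2(1 - p), with H(0) = H(1) = 0.
def entropy(p):
    if p == 0 or p == 1:
        return 0
    return - p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Evaluate on a fine grid; the maximum is at p = 0.5, where H = 1 bit.
grid = [i / 1000 for i in range(1001)]
print(max(grid, key=entropy))  # 0.5
print(entropy(0.5))            # 1.0
}}}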