== TASK 1 (Ranking and evaluation, Claudius) ==

=== 1.1 P@2, P@R, AP ===
{{{
P@2 = 1/2
R = 3, hence P@R = P@3 = 1/3
P@R1 = 1/2, P@R2 = 1/2, P@R3 = 0
AP = (1/2 + 1/2 + 0) / 3 = 1/3
}}}
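For reference, a minimal Python sketch of these definitions, assuming a 0/1 relevance list in ranked order and the total number R of relevant documents; the example list is one ranking consistent with the numbers above:
{{{
def average_precision(ranked_rels, R):
    # P@i is accumulated at every rank i that holds a relevant result;
    # relevant documents that are never retrieved contribute 0.
    ap, hits = 0.0, 0
    for i, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            ap += hits / i
    return ap / R

print(average_precision([0, 1, 0, 1, 0], R=3))  # (1/2 + 2/4 + 0) / 3 = 1/3
}}}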
=== 1.2 DCG@2, iDCG@2, nDCG@2 ===
{{{
DCG@2 = 0 + 3 / log2(2) = 3
iDCG@2 = 3 + 2 / log2(2) = 5
nDCG@2 = DCG@2 / iDCG@2 = 3/5 = 60%
}}}
=== 1.3 Function dcg that computes DCG@k ===
{{{
from math import log2

def dcg(rels, k):
    # rels[i] is the relevance of the result at position i + 1, so the
    # discount for rels[i] is log2(i + 1).
    result = rels[0]
    for i in range(1, k):
        result += rels[i] / log2(i + 1)
    return result
}}}
=== 1.4 Function ndcg that computes nDCG@k ===
{{{
def ndcg(rels, k):
    dcg_val = dcg(rels, k)
    sorted_rels = sorted(rels, reverse=True)
    idcg_val = dcg(sorted_rels, k)
    return dcg_val / idcg_val
}}}
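A quick sanity check against 1.2, assuming the relevance list from that subtask starts with [0, 3, 2]:
{{{
print(dcg([0, 3, 2], 2))   # 0 + 3 / log2(2) = 3.0
print(ndcg([0, 3, 2], 2))  # 3 / 5 = 0.6
}}}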
== TASK 2 (Encodings, Niklas) ==

=== 2.1 Function unary_encode that encodes a given sequence of integers ===
{{{
def unary_encode(numbers):
    # Encode each integer x as x - 1 zeros followed by a single one.
    result = []
    for x in numbers:
        for i in range(0, x - 1):
            result.append(0)
        result.append(1)
    return result
}}}
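Example usage (each integer x becomes x - 1 zeros followed by a one):
{{{
print(unary_encode([3, 1, 2]))  # [0, 0, 1, 1, 0, 1]
}}}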
=== 2.2 Expected number of bits in a sequence of n integers drawn i.i.d. from {1, ..., n} ===
{{{
1. The code length for integer i is exactly i.
2. For a random integer from {1, ..., n}, the expected code length is
   sum_i (i * 1/n) = 1/n * sum_i i = 1/n * 1/2 * n * (n+1) = (n+1) / 2.
3. The expected number of bits in the whole sequence is hence n * (n+1) / 2.
}}}
=== 2.3 Is this encoding entropy-optimal for the distribution from task 2.2? ===
{{{
1. The entropy H of the distribution from 2.2 is
   -sum_i 1/n * log_2 (1/n) = 1/n * log_2 n * sum_i 1 = log_2 n.
2. The expected code length is (n+1)/2.
3. For entropy-optimality, we would need expected code length = entropy + O(1),
   but (n+1)/2 grows linearly in n, while log_2 n grows only logarithmically.
4. So the answer is NO.
}}}
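A small sketch comparing the two quantities for a few values of n, directly from the formulas above:
{{{
from math import log2

for n in [4, 16, 256]:
    expected_len = sum(i / n for i in range(1, n + 1))  # = (n + 1) / 2
    entropy = log2(n)
    print(n, expected_len, entropy)  # the gap grows with n
}}}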
=== 2.4 Name a prefix-free entropy-optimal encoding for the distribution from task 2.2 ===
{{{
1. Just encode each integer with a fixed number of ceil(log_2 n) bits.
2. This is prefix-free: of two different codes of the same length,
   neither can be a prefix of the other.
3. Since ceil(log_2 n) <= log_2 n + 1, this encoding is entropy-optimal.
}}}
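One concrete such encoder, sketched under the assumption that x - 1 (rather than x) is written in binary, so that all of {1, ..., n} fits into ceil(log_2 n) bits:
{{{
from math import ceil, log2

def fixed_length_encode(x, n):
    # Encode x from {1, ..., n} as x - 1 in ceil(log2(n)) bits.
    k = ceil(log2(n))
    return [(x - 1) >> i & 1 for i in reversed(range(k))]

print(fixed_length_encode(10, 16))  # [1, 0, 0, 1]
}}}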
== TASK 3 (Fuzzy search and UTF-8, Axel) ==

=== 3.1 (5 points) ED(x, y), ED(y, x), PED(x, y), PED(y, x) for x = cute and y = computer ===
{{{
ED(cute, computer) = 4 ... insert o, m, p and r
ED(computer, cute) = 4 ... ED is symmetric
PED(cute, computer) = ED(cute, compute) = 3
PED(computer, cute) = ED(computer, cute) = 4
}}}
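These values can be checked with the standard dynamic-programming table for the edit distance; PED(x, y) is then the minimum over the last row, i.e. over all prefixes of y (a minimal sketch, not the lecture's implementation):
{{{
def ed_table(x, y):
    # D[i][j] = ED(x[:i], y[:j]).
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        D[i][0] = i
    for j in range(len(y) + 1):
        D[0][j] = j
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                          D[i - 1][j - 1] + (x[i - 1] != y[j - 1]))
    return D

def ed(x, y):
    return ed_table(x, y)[len(x)][len(y)]

def ped(x, y):
    return min(ed_table(x, y)[len(x)])

print(ed("cute", "computer"), ped("cute", "computer"))  # 4 3
print(ed("computer", "cute"), ped("computer", "cute"))  # 4 4
}}}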
=== 3.2 (5 points) Find x and y with ED(x, y) = 2 and comm(x, y) = max{|x|, |y|} - 4 ===
{{{
1. max{|x|, |y|} - 4 is the lower bound from the lecture for q = 3 and ED(x, y) = 2.
2. The lower bound is achieved when each edit distance operation destroys the
   maximal number of q-grams, in this case 3.
3. For example, x = AXXA and y = BXXB.
}}}
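A quick way to check the example, assuming the padding convention from the lecture (q - 1 = 2 padding characters on each side, so that a word v has |v| + 2 3-grams):
{{{
def qgrams(word, q=3, pad="$"):
    padded = pad * (q - 1) + word + pad * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]

def comm(x, y, q=3):
    # Number of common q-grams, counted with multiplicity.
    rest = qgrams(y, q)
    count = 0
    for g in qgrams(x, q):
        if g in rest:
            rest.remove(g)
            count += 1
    return count

print(comm("AXXA", "BXXB"))  # 0 = max{4, 4} - 4
}}}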
=== 3.3 (10 points) Function valid_utf8_char that checks if a given array of three bytes is valid UTF-8 for a single character ===
{{{
def valid_utf8_char(b):
    # Check that the first byte starts with 1110 and the other two with 10.
    # 10000000b = 128, 11000000b = 192, 11100000b = 224, 11110000b = 240
    if (b[0] & 240) != 224 or (b[1] & 192) != 128 or (b[2] & 192) != 128:
        return False
    # Check that the code point is >= 2048, i.e. that the encoding is not
    # overlong. 11100000b = 224, 10100000b = 160
    if b[0] > 224 or b[1] >= 160:
        return True
    return False
}}}
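Example usage, with the Euro sign (code point 8364, encoded as 0xE2 0x82 0xAC) and an overlong 3-byte encoding of a code point below 2048:
{{{
print(valid_utf8_char([0xE2, 0x82, 0xAC]))  # True  (Euro sign)
print(valid_utf8_char([0xE0, 0x80, 0xAC]))  # False (code point < 2048)
}}}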
== TASK 4 (Naive Bayes, Patrick) ==

=== 4.1 (10 points) Function naive_bayes_predict that predicts the class for a given document ===
{{{
def naive_bayes_predict(pc, pwc, words):
    # pc[c] is the prior probability of class c, pwc[w][c] the probability
    # of word w given class c.
    best_c, best_p = -1, -1
    for c in range(0, len(pc)):
        p = pc[c]
        for w in words:
            p *= pwc[w][c]
        if p > best_p:
            best_p, best_c = p, c
    return best_c
}}}
=== 4.2 (5 points) Same prediction for d_1 and d_2 = d_1.d_1 ===
{{{
1. Write p(c | d1) = P_c * p_c, where p_c is the prior of class c and P_c is
   the product of the word probabilities. Then p(c | d2) = P_c² * p_c.
2. If the p_c are all equal, then
   p(c1 | d1) < p(c2 | d1) <=> P_c1 < P_c2 <=> P_c1² < P_c2² <=> p(c1 | d2) < p(c2 | d2).
}}}
=== 4.3 (5 points) Also true if the p_c are not all equal? ===
{{{
1. Assume two classes c1 and c2 with p(c1) = 1/4 and p(c2) = 3/4.
2. Let d1 consist of one word w with p(w | c1) = 2/3 and p(w | c2) = 1/3.
3. Then p(c1 | d1) = 2/3 * 1/4 = 1/6 and p(c2 | d1) = 1/3 * 3/4 = 1/4
   => class c2 is more likely.
4. But p(c1 | d2) = 2/3 * 2/3 * 1/4 = 1/9 and p(c2 | d2) = 1/3 * 1/3 * 3/4 = 1/12
   => class c1 is more likely.
5. So the answer is NO.
}}}
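The counterexample can be checked with naive_bayes_predict from 4.1 (with c1 = class 0, c2 = class 1, and a single word with index 0):
{{{
pc = [1/4, 3/4]
pwc = [[2/3, 1/3]]  # pwc[w][c] = p(w | c)
print(naive_bayes_predict(pc, pwc, [0]))     # 1, i.e. c2 wins for d1
print(naive_bayes_predict(pc, pwc, [0, 0]))  # 0, i.e. c1 wins for d2
}}}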
== TASK 5 (Latent Semantic Indexing, Markus) ==

=== 5.1 (10 points) Determine the matrices U and S of the SVD of A ===
{{{
1. The product A * At is [[15, 9], [9, 15]].
2. One eigenvector is easily seen to be (1, 1), with eigenvalue 15 + 9 = 24.
3. The other eigenvector is then (1, -1), with eigenvalue 15 - 9 = 6.
4. The matrix U is hence 1/2 * sqrt(2) * [[1, 1], [1, -1]].
5. The matrix S is diag(sqrt(24), sqrt(6)) = sqrt(6) * diag(2, 1).
}}}
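The eigenvalue computation can be verified numerically (the matrix [[15, 9], [9, 15]] is given above, so nothing new is assumed):
{{{
import numpy as np

eigenvalues, eigenvectors = np.linalg.eigh(np.array([[15.0, 9.0], [9.0, 15.0]]))
print(eigenvalues)           # [ 6. 24.]
print(np.sqrt(eigenvalues))  # singular values sqrt(6) and sqrt(24)
}}}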
=== 5.2 Function compute_V that computes V from A, U, S ===
{{{
def compute_V(A, U, S, V, m, n):
    # A has shape m x n, U has shape m x r, and S has r entries.
    # We can assume that V has shape r x n and is filled with zeroes.
    # According to ES10, V = S^-1 * UT * A.
    r = len(S)
    for i in range(0, r):
        for j in range(0, n):
            for k in range(0, m):
                V[i, j] += U[k, i] * A[k, j] / S[i]
}}}
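With numpy arrays, the same formula V = S^-1 * UT * A is a one-liner (a sketch assuming S is a 1D array of the singular values):
{{{
import numpy as np

def compute_V_np(A, U, S):
    return np.diag(1.0 / S) @ U.T @ A
}}}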
== TASK 6 (Hidden Markov Model, Hannah) ==

=== 6.1 (5 points) Probability that the second state is plus if the sequence is good, good, good ===
{{{
1. Pr(h1 = plus) * Pr(o1 = good | h1 = plus) * Pr(h2 = plus | h1 = plus) * Pr(o2 = good | h2 = plus)
   = 1/2 * 1/3 * 1/3 * 1/3 = 1/54
2. Pr(h1 = minus) * Pr(o1 = good | h1 = minus) * Pr(h2 = plus | h1 = minus) * Pr(o2 = good | h2 = plus)
   = 1/2 * 2/3 * 2/3 * 1/3 = 4/54
3. Pr(h2 = plus) = 1/54 + 4/54 = 5/54
}}}
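A check with exact arithmetic, using the model parameters as they can be read off the computation above (states plus = 0, minus = 1):
{{{
from fractions import Fraction as F

init = [F(1, 2), F(1, 2)]       # Pr(h1 = plus), Pr(h1 = minus)
to_plus = [F(1, 3), F(2, 3)]    # Pr(h2 = plus | h1 = plus), Pr(h2 = plus | h1 = minus)
emit_good = [F(1, 3), F(2, 3)]  # Pr(good | plus), Pr(good | minus)

print(sum(init[h1] * emit_good[h1] * to_plus[h1] * emit_good[0]
          for h1 in (0, 1)))    # 5/54
}}}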
=== 6.2 (5 points) Most likely sequence of hidden states for great, bad, great ===
{{{
1. Pr(great | minus) = 0 and Pr(bad | plus) = 0.
2. Hence the only sequence of hidden states with non-zero probability is plus, minus, plus.
3. This is hence the most likely sequence of hidden states.
}}}
=== 6.3 (5 points) Function compute_likelihood that computes the likelihood of a given sequence of observations and hidden states ===
{{{
def compute_likelihood(o, h):
    # iprob[s] is the initial probability of state s, tprob[s, t] the
    # probability of a transition to state s from state t, and eprob[s, o]
    # the probability of emitting observation o in state s.
    result = 1
    for i in range(0, len(o)):
        result *= iprob[h[i]] if i == 0 else tprob[h[i], h[i - 1]]
        result *= eprob[h[i], o[i]]
    return result
}}}
=== 6.4 (5 points) Function brute_force_hmm that computes the most likely sequence of hidden states for a given sequence of three observations ===
{{{
def brute_force_hmm(o1, o2, o3):
    k = len(iprob)  # Number of hidden states.
    best_l, best_h = 0, []
    for h1 in range(0, k):
        for h2 in range(0, k):
            for h3 in range(0, k):
                l = compute_likelihood([o1, o2, o3], [h1, h2, h3])
                if l > best_l:
                    best_l, best_h = l, [h1, h2, h3]
    return best_h
}}}
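Example usage with the model from 6.1 and 6.2: the zero entries are given there, and the remaining entries are filled in so that the distributions sum to 1 (an assumption consistent with those numbers). States: 0 = plus, 1 = minus; observations: 0 = good, 1 = great, 2 = bad.
{{{
import numpy as np

iprob = np.array([1/2, 1/2])
# tprob[s, t] = probability of a transition to state s from state t.
tprob = np.array([[1/3, 2/3],
                  [2/3, 1/3]])
# eprob[s, o] = probability of emitting observation o in state s.
eprob = np.array([[1/3, 2/3, 0.0],
                  [2/3, 0.0, 1/3]])

print(brute_force_hmm(1, 2, 1))  # [0, 1, 0], i.e. plus, minus, plus
}}}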