Information Retrieval WS 17/18: Exam Solutions

TASK 1 (Ranking and evaluation)

1.1 P@2, P@R, AP

P@2 = 1/2
P@R = 1/3
P@R1 = 1/2, P@R2 = 1/2, P@R3 = 0, AP = (1/2 + 1/2 + 0) / 3 = 1/3
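
For reference, AP can be computed as follows. This is a sketch; the relevance list below is a hypothetical ranking consistent with the numbers above (R = 3 relevant documents, of which only two are retrieved, at positions 2 and 4).

def average_precision(ranking, num_relevant):
    # ranking[i] = 1 if the document at position i + 1 is relevant, else 0
    hits, ap_sum = 0, 0.0
    for i, rel in enumerate(ranking):
        if rel:
            hits += 1
            ap_sum += hits / (i + 1)
    # relevant documents that are never retrieved contribute 0
    return ap_sum / num_relevant

print(average_precision([0, 1, 0, 1, 0, 0], 3))  # (1/2 + 1/2 + 0) / 3 = 1/3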

1.2 DCG@2, iDCG@2, nDCG@2

DCG@2 = 0 + 3 / log_2 2 = 3
iDCG@2 = 3 + 2 / log_2 2 = 5
nDCG@2 = 3/5 = 60%

1.3 Function dcg that computes DCG@k

from math import log2

def dcg(rels, k):
    # DCG@k = rels[0] + sum over ranks 2..k of rels[i] / log2(i + 1),
    # where the 0-based index i corresponds to rank i + 1
    res = rels[0]
    for i in range(1, k):
        res += rels[i] / log2(i + 1)
    return res

1.4 Function ndcg that computes nDCG@k

def ndcg(rels, k):
    # normalize DCG@k by the DCG@k of the ideal (descending) ordering
    dcg_val = dcg(rels, k)
    ideal_rels = sorted(rels, reverse=True)
    idcg_val = dcg(ideal_rels, k)
    return dcg_val / idcg_val
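
A quick check against the numbers from 1.2, assuming the result list starts with relevances 0, 3 and the two largest relevances overall are 3 and 2:

print(dcg([0, 3, 2], 2))   # 3.0, as in 1.2
print(ndcg([0, 3, 2], 2))  # 0.6, i.e. 60%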

TASK 2 (Encodings)

2.1 Function unary_encode that encodes given sequence of integers

def unary_encode(seq):
    # encode each integer x >= 1 as x - 1 zeros followed by a single one,
    # so the code for x is exactly x bits long
    result = []
    for x in seq:
        for i in range(0, x - 1):
            result.append(0)
        result.append(1)
    return result
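
For example, unary_encode([3, 1, 2]) yields [0, 0, 1, 1, 0, 1]. A matching decoder (a hypothetical helper, not part of the task) is equally simple:

def unary_decode(bits):
    # count the zeros before each one; the integer is that count plus one
    result, run = [], 0
    for b in bits:
        if b == 0:
            run += 1
        else:
            result.append(run + 1)
            run = 0
    return result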

2.2 Expected number of bits in a sequence of n integers drawn i.i.d. from {1, ..., n}

1. The code length for integer i is exactly i
2. For a random integer from {1, ..., n}, the expected code length is sum_i (i * 1/n) = 1/n * sum_i i = 1/n * 1/2 * n * (n+1) = (n+1) / 2
3. The expected number of bits in the whole sequence is hence n * (n+1) / 2 
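
A quick empirical sanity check (a sketch, assuming unary_encode from 2.1 is in scope):

import random

def avg_total_bits(n, trials=10000):
    # average total code length of n i.i.d. uniform draws from {1, ..., n}
    total = 0
    for _ in range(trials):
        seq = [random.randint(1, n) for _ in range(n)]
        total += len(unary_encode(seq))
    return total / trials

print(avg_total_bits(4))  # close to 4 * 5 / 2 = 10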

2.3 Is this encoding entropy-optimal for the distribution from task 2.2

1. The entropy H of the distribution from 2.2 is -sum_i 1/n * log_2(1/n) = 1/n * log_2(n) * sum_i 1 = log_2 n
2. The expected code length is (n+1)/2
3. For entropy-optimality, we would need expected code length = entropy + O(1)
4. So the answer is NO; for example, for n = 1024 the entropy is 10 bits, but the expected code length is 512.5 bits

2.4 Name a prefix-free entropy-optimal encoding for the distribution from task 2.2

1. Just encode each integer with a fixed number of ceil(log_2 n) bits
2. This is prefix-free, because of two different codes of the same length, neither can be a prefix of the other
3. Since ceil(log_2 n) <= log_2 n + 1, this encoding is entropy-optimal
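
A minimal sketch of such a fixed-length code (the function name is illustrative; each integer x from {1, ..., n} is written as x - 1 using ceil(log_2 n) bits, most significant bit first):

from math import ceil, log2

def fixed_length_encode(seq, n):
    k = ceil(log2(n))  # bits per integer
    bits = []
    for x in seq:
        for j in reversed(range(k)):
            bits.append((x - 1) >> j & 1)
    return bits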

TASK 3 (Fuzzy search and UTF-8)

3.1 (5 points) ED(x, y), ED(y, x), PED(x, y), PED(y, x) for x = cute and y = computer

ED(cute, computer) = 4 ... insert o, m, p after the c and append r
ED(computer, cute) = 4 ... ED is symmetric
PED(cute, computer) = ED(cute, compute) = 3 ... compute is the best-matching prefix of computer
PED(computer, cute) = ED(computer, cute) = 4 ... no proper prefix of cute matches computer any better
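
For reference, a standard dynamic-programming sketch that reproduces these values; ped takes the minimum over the last row of the table, i.e. over all prefixes of y:

def edit_distance_table(x, y):
    # d[i][j] = ED(x[:i], y[:j])
    d = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        d[i][0] = i
    for j in range(len(y) + 1):
        d[0][j] = j
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1,
                          d[i-1][j-1] + (x[i-1] != y[j-1]))
    return d

def ed(x, y):
    return edit_distance_table(x, y)[-1][-1]

def ped(x, y):
    return min(edit_distance_table(x, y)[-1])

print(ed("cute", "computer"), ped("cute", "computer"))  # 4 3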

3.2 (5 points) Find x and y with ED(x, y) = 2 and comm($$x$$, $$y$$) = max{|x|, |y|} - 4

1. Here $$x$$ denotes x padded with q - 1 = 2 special characters on each side; the padded strings then have max{|x|, |y|} + 2 3-grams, and max{|x|, |y|} - 4 is the lower bound from the lecture for q = 3 and ED(x, y) = 2
2. The lower bound is achieved when each of the two edit distance operations destroys the maximal number of q-grams, in this case 3 each, so 6 in total
3. For example, x = AXXA and y = BXXB: two substitutions give ED = 2, and the padded strings $$AXXA$$ and $$BXXB$$ have no 3-gram in common, so comm($$x$$, $$y$$) = 0 = max{4, 4} - 4
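
A small sketch for counting common padded 3-grams (assuming the both-sided padding described in point 1):

from collections import Counter

def padded_qgrams(x, q=3):
    # pad with q - 1 '$' characters on each side, as in the $$x$$ notation
    s = "$" * (q - 1) + x + "$" * (q - 1)
    return [s[i:i+q] for i in range(len(s) - q + 1)]

def comm(x, y, q=3):
    cx, cy = Counter(padded_qgrams(x, q)), Counter(padded_qgrams(y, q))
    return sum((cx & cy).values())

print(comm("AXXA", "BXXB"))  # 0 = max{4, 4} - 4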

3.3 (10 points) Function valid_utf8_char that checks if a given array of three bytes is valid UTF-8 for a single character

def valid_utf8_char(b):
    # check that the first byte starts with 1110 and the other two with 10
    # 10000000b = 128, 11000000b = 192, 11100000b = 224, 11110000b = 240
    if b[0] & 240 != 224 or b[1] & 192 != 128 or b[2] & 192 != 128:
        return False
    # check that the code point is >= 2048, i.e. the encoding is not overlong
    # 11100000b = 224, 10100000b = 160
    if b[0] > 224 or b[1] >= 160:
        return True
    return False
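
For example, the euro sign U+20AC is E2 82 AC in UTF-8, while E0 9F BF would be an overlong encoding of the code point 2047:

print(valid_utf8_char([0xE2, 0x82, 0xAC]))  # True
print(valid_utf8_char([0xE0, 0x9F, 0xBF]))  # False (overlong)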

TASK 4 (Naive Bayes)

4.1 (10 points) Function naive_bayes_predict that predicts the class for a given document

def naive_bayes_predict(pc, pwc, words):
    # pc[c] = prior of class c, pwc[w][c] = probability of word w given class c
    best_c, best_p = -1, -1
    for c in range(0, len(pc)):
        p = pc[c]
        for w in words:
            p *= pwc[w][c]
        if p > best_p:
            best_p, best_c = p, c
    return best_c
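
In practice one would add log probabilities instead of multiplying, to avoid numerical underflow for long documents. A toy usage with the numbers from 4.3 below (one word with index 0, classes c1 = 0 and c2 = 1):

pc = [1/4, 3/4]
pwc = [[2/3, 1/3]]  # pwc[0][c] = probability of word 0 given class c
print(naive_bayes_predict(pc, pwc, [0]))     # 1, i.e. c2 wins for d1
print(naive_bayes_predict(pc, pwc, [0, 0]))  # 0, i.e. c1 wins for d2 = d1.d1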

4.2 (5 points) Same prediction for d_1 and d_2 = d_1.d_1

1. Write p(c | d1) = P_c * p_c, where p_c is the prior of class c and P_c is the product of the p(w | c) over all words w of d1; since d2 contains every word of d1 twice, p(c | d2) = P_c² * p_c.
2. If the p_c are all equal, then p(c1 | d1) < p(c2 | d1) <=> P_c1 < P_c2 <=> P_c1² < P_c2² <=> p(c1 | d2) < p(c2 | d2)

4.3 (5 points) Is the statement from 4.2 also true if the p_c are not all equal?

1. Assume two classes c1 and c2 with priors p(c1) = 1/4 and p(c2) = 3/4
2. Let d1 consist of one word w with p(w | c1) = 2/3 and p(w | c2) = 1/3
3. Then p(c1 | d1) = 2/3 * 1/4 = 1/6 and p(c2 | d1) = 1/3 * 3/4 = 1/4  => class c2 is more likely
4. But p(c1 | d2) = 2/3 * 2/3 * 1/4 = 1/9 and p(c2 | d2) = 1/3 * 1/3 * 3/4 = 1/12  => class c1 is more likely
5. So the statement from 4.2 is not true in general if the p_c are not all equal

TASK 5 (Latent Semantic Indexing)

5.1 (10 points) Determine the matrices U and S of the SVD of A

1. The product A * AT is [[15, 9], [9, 15]]
2. One eigenvector is easily seen to be (1, 1) with eigenvalue 15 + 9 = 24
3. The other eigenvector is then (1, -1) with eigenvalue 15 - 9 = 6
4. The matrix U hence has the normalized eigenvectors as columns: U = 1/2 * sqrt(2) * [[1, 1], [1, -1]]
5. The matrix S is diag(sqrt(24), sqrt(6)) = sqrt(6) * diag(2, 1)
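
To double-check with numpy (a sketch; eigh returns the eigenvalues in ascending order):

import numpy as np

B = np.array([[15.0, 9.0], [9.0, 15.0]])  # this is A * AT from above
eigenvalues, eigenvectors = np.linalg.eigh(B)
print(eigenvalues)           # [ 6. 24.]
print(np.sqrt(eigenvalues))  # the singular values sqrt(6) and sqrt(24) = 2 * sqrt(6)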

5.2 Function compute_V that computes V from A, U, S

def compute_V(A, U, S, V, m, n):
    # A has shape m x n, U has shape m x r, S has the r singular values
    # We can assume that V has shape r x n and is filled with zeroes
    # (all matrices are numpy-style arrays, indexed as M[row, col])
    # According to ES10, V = S^-1 * UT * A
    r = len(S)
    for i in range(0, r):
        for j in range(0, n):
            for k in range(0, m):
                V[i, j] += U[k, i] * A[k, j] / S[i]
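
With numpy, the same computation can be written in one vectorized line (given the shapes described in the comments, with S a numpy array of the r singular values):

V[:] = (U.T @ A) / S[:, None]  # row i of U.T * A divided by S[i]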

TASK 6 (Hidden Markov Model)

6.1 (5 points) Probability that second state is plus if sequence is good, good, good

1. Pr(h1 = plus) * Pr(o1 = good | h1 = plus) * Pr(h2 = plus | h1 = plus) * Pr(o2 = good | h2 = plus) = 1/2 * 1/3 * 1/3 * 1/3 = 1/54
2. Pr(h1 = minus) * Pr(o1 = good | h1 = minus) * Pr(h2 = plus | h1 = minus) * Pr(o2 = good | h2 = plus) = 1/2 * 2/3 * 2/3 * 1/3 = 4/54
3. Pr(h2 = plus) = 1/54 + 4/54 = 5/54
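
A quick check of the arithmetic with exact fractions:

from fractions import Fraction as F

p_plus  = F(1, 2) * F(1, 3) * F(1, 3) * F(1, 3)  # path via h1 = plus
p_minus = F(1, 2) * F(2, 3) * F(2, 3) * F(1, 3)  # path via h1 = minus
print(p_plus + p_minus)  # 5/54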

6.2 (5 points) Most likely sequence of hidden states for great, bad, great

1. Pr(great | minus) = 0 and Pr(bad | plus) = 0
2. Hence the only sequence of hidden states with non-zero probability is plus, minus, plus
3. This is hence the most likely sequence of hidden states

6.3 (5 points) Function compute_likelihood that computes likelihood of a given sequence of observations and hidden states

def compute_likelihood(o, h):
    # o = observations, h = hidden states; iprob, tprob, eprob are the
    # initial, transition and emission probabilities of the HMM, with
    # tprob[s, t] = Pr(next state t | current state s)
    result = 1
    for i in range(0, len(o)):
        result *= iprob[h[i]] if i == 0 else tprob[h[i-1], h[i]]
        result *= eprob[h[i], o[i]]
    return result
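
For concreteness, the HMM from 6.1 and 6.2 can be encoded as follows. This is a sketch with an assumed encoding (state 0 = plus, state 1 = minus; observation 0 = good, 1 = great, 2 = bad); the emission probabilities not stated explicitly above follow from normalization, assuming good, great and bad are the only observations:

import numpy as np

iprob = np.array([1/2, 1/2])
tprob = np.array([[1/3, 2/3],
                  [2/3, 1/3]])     # tprob[s, t] = Pr(next state t | state s)
eprob = np.array([[1/3, 2/3, 0],
                  [2/3, 0, 1/3]])  # eprob[s, o] = Pr(observation o | state s)

print(compute_likelihood([0, 0], [0, 0]))  # 1/2 * 1/3 * 1/3 * 1/3 = 1/54, cf. 6.1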

6.4 (5 points) Function brute_force_hmm that computes the most likely sequence of hidden states for given sequence of three observations

def brute_force_hmm(o1, o2, o3):
    # try all k^3 sequences of hidden states and keep the most likely one
    k = len(iprob)  # number of hidden states
    best_l, best_h = 0, []
    for h1 in range(0, k):
        for h2 in range(0, k):
            for h3 in range(0, k):
                l = compute_likelihood([o1, o2, o3], [h1, h2, h3])
                if l > best_l:
                    best_l, best_h = l, [h1, h2, h3]
    return best_h
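
With the encoding from the sketch above, the observations great, bad, great are 1, 2, 1, and the brute-force search returns the sequence from 6.2:

print(brute_force_hmm(1, 2, 1))  # [0, 1, 0], i.e. plus, minus, plus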
