AD Teaching Wiki
  • Comments
  • Immutable Page
  • Menu
    • Navigation
    • RecentChanges
    • FindPage
    • Local Site Map
    • Help
    • HelpContents
    • HelpOnMoinWikiSyntax
    • Display
    • Attachments
    • Info
    • Raw Text
    • Print View
    • Edit
    • Load
    • Save
  • Login

FrontPage

Links

  • Daphne

  • Forum

  • CodingStandards

  • Exam

  • Evaluation

Previous Courses

  • WS 16/17

  • WS 15/16

  • WS 13/14

  • WS 12/13

Upload page content

You can upload content for the page named below. If you change the page name, you can also upload content for another page. If the page name is empty, we derive the page name from the file name.

File to load page content from
Page name
Comment

Revision 2 as of 2018-02-19 13:01:28
AD Teaching Wiki:
  • InformationRetrievalWS1718
  • ExamSolution

TASK 1 (Ranking and evaluation)

1.1 P@2, P@R, AP

P@2 = 1/2
P@R = 1/3
P@R1 = 1/2, P@R2 = 1/2, P@R3 = 0, AP = (1/2 + 1/2 + 0) / 3 = 1/3

1.2 DCG@2, iDCG@2, nDCG@2

DCG@2 = 0 + 3 / 1 = 3
iDCG@2 = 3 + 2 / 1 = 5
nDCG@2 = 3/5 = 60%

1.3 Function dcg that computes DCG@k

1. def dcg(rels, k):
2.     sum = rels[0]
3.     for in in range(1, len(rels)): 
4.         sum += rels[i] / log2(i)
5.     return sum

1.3 Function ndcg that computes nDCG@k

1. def ndcg(rels, k):
2.    dcg = dcg(rels, k)
3.    sorted_rels = sort(rels, DESC)
4.    idcg = dcg(rels, k)
5.    return dcg / idcg

TASK 2 (Encodings)

2.1 Encoding for x = 1, ...,10

  1 = 100.
  2 = 101.0
  3 = 101.1
  4 = 110.00
  5 = 110.01
  6 = 110.10
  7 = 110.11
  8 = 111.000
  9 = 111.001
 10 = 111.010

2.2 Function for code for x < 16

1. def code(x):
2.     result = [1]
3.     length = floor(log2(x))
4.     result.extend([length // 2, length % 2])
5.     if x >= 2 and x < 4:  result.append(x % 2)
7.     if x >= 4 and x < 8:  result.extend([(x & 2) >> 1, x % 2])
8.     if x >= 8 and x < 16: result.extend([(x & 4) >> 2, (x & 2) >> 1, x % 2])
9.     return result

2.3 Formula for code length for arbitrary x

Length of Golomb part: floor(floor(log_2 x) / 4) + 3
Length of binary part: floor(log_2 x)
Sum of the two: floor(floor(log_2 x) / 4) + 3 + floor(log_2 x)

TASK 3 (Web applications and UTF-8)

3.1 Write HTML

<html>
<head><script src="convert.js"></script></head>
<body>Centimetres: <input id="cm"/>Inches: <input id="in"/></body>
</html>

3.2 Write JavaScript

$(document).ready(function(){
  $("#cm").keyup(function(){ $("#in").val($("#cm).val() / 2.54)); })
  $("#in").keyup(function(){ $("#cm").val($("#in).val() * 2.54)); })
})

3.3 UTF-8 code for Euro sign

32 in binary is 0010.0000, 172 in binary is 1010.1100 (128 + 32 + 8 + 4)
The code point in binary is hence: 0010.0000.1010.1100
We hence need a 3-byte code of the form 1110 xxxx 10 yyyyyy 10 zzzzzz (with a 16-bit code point xxxxyyyyyyzzzzzz)
The 3-byte UTF-8 code is hence: 1110 0010  10 000010  10 101100

3.4 Function for counting #characters in UTF-8 sequence

1. def count_utf8_chars(bytes):
2.    count = 0
3.    for byte in bytes:
4.        # Count all except the "follow bytes" of the form "10......". 
5.        if (byte & (128 + 64) != 128):
6.            count += 1
7.    return count

TASK 4 (Naive Bayes and k-means)

4.1 Steps of k-means

Points: P1 = (3, 1), P2 = (1, 2), P3 = (1, 4)
Initial centroids: C1 = (3, 0), C2 = (0, 3)
Step 1a: C1 <- P1, C2 <- P2, P3
Step 1b: C1 = (3, 1), C2 = (1, 3)
Step 2a: C1 <- P1, C2 <- P2, P3 
Step 2b: C1 = (3, 1), C2 = (1, 3)
Centroid identical to those from previous round -> STOP

4.2 Compute centroids from A and P

1. def centroids(A, P):
2.    return A.normalizecolumnsL1().transpose() * P

4.3 Determine w and b of Naive Bayes

Learned probabilities: p_1X = 3/4, p_2X = 1/4, p_1Y = 1/4, p_2Y = 3/4, p_X = 1/3, p_Y = 2/3
Linear classifier: w = (log_2 (p_1X/p_1Y), log_2 (p_2X/p_2Y)) = (log_2 3, -log_2 3)
Linear classifier: b = - log_2 (p_X/p_Y)

4.4 Example such that Naive Bayes decides 2x > y

Solution for x > y:

Choose P1 = (2, 1) --> X and P2 = (1, 2) --> Y
Then p_1X = 2/3, p_2X = 1/3, p_1Y = 1/3, p_2Y = 2/3
Then w = (1, -1) and b = 0
Then (w1, w2) * (x, y) - b > 0  <=>  x > y.

TASK 5 (Latent Semantic Indexing)

5.1 Show that V is row-orthonormal

Product without factor 1/4 * sqrt(2) is: [[8, 0], [0, 8]]
1/4 * sqrt(2) * 1/4 * sqrt(2) = 1/16 * 2 = 1/8
This gives indeed the identity matrix

5.2 Compute missing S

A * At = [[40, 24], [24, 64]]
A * At * [1, 1]t = [64, 64]t -> Eigenvalue 64
A * At * [1, -11]t = [16, -16]t -> Eigenvalue 16
The singular values are hence 8 and 4
S is hence diag(8, 4)

5.3 Rank of A and making it rank 3

A has two independent columns -> rank >= 2
A has two rows -> rank <= 2
Since the number of rows is an upper bound, adding another column cannot make the rank 3

5.4 Function for L2-normalization of a vector

1. def normalize(vector):
2.     sum = 0
3.     for x in vector:
4.         sum += x * x
5.     for x in vector:
6.         x = x / sqrt(sum)

TASK 6 (Miscellaneous)

6.1 Total number of items in a 3-gram index

Each 3-gram of each word contributed one item. A word v has |v| + 2 3-grams.
Hence the total number of items is sum_i |v_i| + 2 * n.

6.2 SQL query for persons who founded company in city they were born in

SELECT founded_by.person
FROM founded_by, based_in, born_in
WHERE founded_by.person = born_in.person
AND   founded_by.company = based_in.company
AND   based_in.city = born_in.city

6.3 Maximum entropy of distribution given by p1, ..., pn

Entropy is sum_i -p_i * log2 p_i
Optimize sum_i -p_i * ln p_i (the constant factor log / ln does not change the position of the optimum)
First derivative: -1 - ln p_i + lambda = 0  =>  all p_i equal and hence 1/n
Second derivative:  -1/p_i < 0  =>  MAX
Resulting entropy: sum_i -1/n * log2 (1/n) = log2 n
  • MoinMoin Powered
  • Python Powered
  • GPL licensed
  • Valid HTML 4.01