1511
Comment:
|
4015
|
Deletions are marked like this. | Additions are marked like this. |
Line 1: | Line 1: |
#acl Björn Buchhold:read,write Elmar Haussmann:read,write Claudius Korzen:read,write Florian Bäurle:read,write Jonas Sternisko:read,write All:read | #acl Björn Buchhold:read,write Elmar Haussmann:read,write Claudius Korzen:read,write Sabine Storandt:read,write |
Line 16: | Line 16: |
Documents ordered by these similarities: D5, D3, D2 and D4, D1 (D2 and D4 have the same score) | Documents ordered by these similarities: D5, D3, D2 and D4, D1 (D2 and D4 have the same score) |
Line 18: | Line 18: |
Precisions: P@1 = 100%, P@5 = 40% | Precisions: P@1 = 100%, P@5 = 40% (the others are not needed to compute the AP) |
Line 42: | Line 42: |
=== 1.4 | === 1.4 Write A as product of 3 x 2 and 2 x 5 ... what does this say about rank(A) === |
Line 51: | Line 51: |
== TASK 2 (Encodings) == c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ... === 2.1 Decode 110000100111001 + why prefix-free === Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3 All valid codes end in 1. Any non-empty prefix of any of the codes ends with a zero. Therefore, no prefix of any of the code is a valid code. === 2.2 Function decode === {{{ def decode(bits): x = 0 result = [] for bit in bits: x += 1 if bit == 1: result.add(x) x = 0 return result if x == 0 else [] }}} === 2.3 Random code, 0 or 1 with probability 1/2 each until 1 === The code for x has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2 to the -x. === 2.4 Entropy-optimality for this probability distribution === Let L_i be the length of the code for integer i. Then L_i = i. The probability p_i of generating that code is 2^-i (according to Task 2.3). For entropy-optimality, we need L_i <= log_2 (1/p_i) + 1 = i + 1. This is true. == TASK 3 () == {{{ $(document).ready(function() { $("#query").keyup(function() { var query = $("#query").val(); $.get("http://www.gugel.de:8888/?q="+ query, function(result) { $("#result").html(result); }) }) }) }}} === 3.1 Simple HTML that makes sense for this JavaScript (without head part) === {{{ <html> <body> <input id="#query"/> <div id="#result"></div> </body> </html> }}} === 3.2 First fifteen characters of GET requests when typing "hey" + to which machine === {{{ GET /?q=h HTTP/ GET /?q=he HTTP GET /?q=hey HTT }}} The requests are sent to the machine with the address www.gugel.de === 3.3 Number of valid UTF-8 sequences of length 2 === Valid UTF-8 sequences of length 2 look like this {{{ 110xxxxx 10yyyyyy }}} where the 11-bit codepoint xxxxxyyyyyy encodes an integer that must be > 127 (otherwise a one-byte UTF-8 must be used). The number of such codepoints is 2048 - 128 = 1920. === 3.4 UTF-8 code and URL-escaped code of ü === {{{ ISO-8859-1 code in binary: 11111100 With padding, so that 11 bits: 00011111100 As UTF-8 code: 11000011 10111100 (Byte 1: 110 + first five bits of codepoint, Byte 2: 10 + last five bits) In decimal: 195 188 In hexadecimal: C3 BC URL-escaped: %C3%BC }}} |
TASK 1 (Ranking and evaluation)
bla 1 1 2 2 3 bli 1 0 1 2 2 blu 0 1 1 0 1
1.1 AP for query "bla bli"
The vector for "bla blu" is simply (1, 0, 1) Dot product similarities for D1, ..., D5: 1, 2, 3, 2, 4 Documents ordered by these similarities: D5, D3, D2 and D4, D1 (D2 and D4 have the same score) Bit array of relevances: 1, 0, 0, 0, 1 Precisions: P@1 = 100%, P@5 = 40% (the others are not needed to compute the AP) Average precision (AP): 70%
1.2 Function compute_ap
def compute_ap(relevances): prec_sum = 0 num_rel = 0 for i in range(len(relevances)): if relevances[i] == 1: num_rel += 1 prec = num_rel / (i + 1) prec_sum += prec return prec_sum / num_rel
1.3 Prove that AP = 100% if and only if all 1s at the front
If all 1s at the front, then each prec in the code above is 100%, and so will be the average.
If AP = 100%, then the last prec must be 100%, which can only happen if there is no 0 before the last 1.
1.4 Write A as product of 3 x 2 and 2 x 5 ... what does this say about rank(A)
1 1 2 2 3 1 1 1 0 1 2 2 = 1 0 * 1 0 1 2 2 0 1 1 0 1 0 1 0 1 1 0 1
This shows that each column of A can be written as a linear combination of the two columns of the 3 x 2 matrix on the right hand side. This proves that rank(A) <= 2.
TASK 2 (Encodings)
c(1) = 1, c(2) = 01, c(3) = 001, c(4) = 0001, ...
2.1 Decode 110000100111001 + why prefix-free
Integer sequence encoded: 1, 1, 5, 3, 1, 1, 3
All valid codes end in 1. Any non-empty prefix of any of the codes ends with a zero. Therefore, no prefix of any of the code is a valid code.
2.2 Function decode
def decode(bits): x = 0 result = [] for bit in bits: x += 1 if bit == 1: result.add(x) x = 0 return result if x == 0 else []
2.3 Random code, 0 or 1 with probability 1/2 each until 1
The code for x has exactly x bits. The probability of generating exactly those x bits with the process described is 1/2 * ... * 1/2 (x times), that is, 2 to the -x.
2.4 Entropy-optimality for this probability distribution
Let L_i be the length of the code for integer i. Then L_i = i.
The probability p_i of generating that code is 2^-i (according to Task 2.3).
For entropy-optimality, we need L_i <= log_2 (1/p_i) + 1 = i + 1. This is true.
TASK 3 ()
$(document).ready(function() { $("#query").keyup(function() { var query = $("#query").val(); $.get("http://www.gugel.de:8888/?q="+ query, function(result) { $("#result").html(result); }) }) })
3.1 Simple HTML that makes sense for this JavaScript (without head part)
<html> <body> <input id="#query"/> <div id="#result"></div> </body> </html>
3.2 First fifteen characters of GET requests when typing "hey" + to which machine
GET /?q=h HTTP/ GET /?q=he HTTP GET /?q=hey HTT
The requests are sent to the machine with the address www.gugel.de
3.3 Number of valid UTF-8 sequences of length 2
Valid UTF-8 sequences of length 2 look like this
110xxxxx 10yyyyyy
where the 11-bit codepoint xxxxxyyyyyy encodes an integer that must be > 127 (otherwise a one-byte UTF-8 must be used).
The number of such codepoints is 2048 - 128 = 1920.
3.4 UTF-8 code and URL-escaped code of ü
ISO-8859-1 code in binary: 11111100 With padding, so that 11 bits: 00011111100 As UTF-8 code: 11000011 10111100 (Byte 1: 110 + first five bits of codepoint, Byte 2: 10 + last five bits) In decimal: 195 188 In hexadecimal: C3 BC URL-escaped: %C3%BC