AD Teaching Wiki:

TASK 1 (Compression + entropy)

1, x, x+1, 2x, 2x+1, 3x, 3x+1, 4x

1.1 For x = 10, gap-encode using Golomb with M = 8

Gaps are: +1, +9, +1, +9, +1, +9, +1, +9 (eight gaps, not seven)
Golomb code for +1 : 1001
Golomb code for +9 : 01001
Encoding of the whole sequence: 1001 01001 1001 01001 1001 01001 1001 01001

1.2 No larger M gives shorter sequence

For M > 8, writing the residual (r) part alone costs >= 4 Bits. Together with the 1 separating it from the q part, this is >= 5 Bits per code. That is >= 40 Bits for the sequence above. With M = 8 we need only 36 Bits.

1.3 Function that computes Golomb code for M = 8

ArrayList<Bool> golombCode(int x) {
  result = new ArrayList<Bool>; 
  int q = x / 8;
  int r = x % 8;
  for (int i = 0; i < q; i++) { result.add(0); }
  result.add(1);
  result.add(r & 4 > 0);
  result.add(r & 2 > 0);
  result.add(r & 1 > 0);
  return result;

1.4 Entropy-optimal code for sequence above, for abitrary x

We have two symbols, each occuring the same number of times. The entropy of the corresponding random variable (X taking on either value with probability 1/2) is 1/2 * log_2 1/2 + 1/2 * log_2 1/2 = 1. And there is indeed a (prefix-free) encoding with just 1 Bit per value:

1 : 0
x-1 : 1

TASK 2 (Prefix edit distance + 2-grams)

x = angela, y = angelina

2.1 Compute the PED

PED(angela, angelina) = ED(angela, angeli) = 1
Justification: If PED = 0 than x is a perfect prefix of y, which is not the case.

PED(angelina, angela) = ED(angelina, angela) = 2
Justification: for shorter prefixes of angela, the difference of the string length would be >= 3, and so certainly ED >= 3.

2.2 Compute comm(x, y)

2-grams of x = angela:  an, ng, ge, el, la
2-grams of y = angelina:  an, ng, ge, el, li, in, na
comm(x, y) = 4

2.3 If PED(x, y) = 1, then comm(x, y) >= |x| - 3

The number of 2-grams of x is |x| - 1.
If PED(x, y) = 1, then we have y' prefix of y with ED(x, y') = 1.
One operation can affect at most 2 2-grams, so comm(x, y') >= |x| - 3.
Hence also comm(x, y) >= |x| - 3.

2.4 Example for PED(x, y) = 1, |x| = 3, comm(x, y) = 0

x = aba
y = aca

AD Teaching Wiki: InformationRetrievalWS1314/ExamSolutions (last edited 2014-02-19 14:51:13 by Hannah Bast)