AD Teaching Wiki
  • Comments
  • Immutable Page
  • Menu
    • Navigation
    • RecentChanges
    • FindPage
    • Local Site Map
    • Help
    • HelpContents
    • HelpOnMoinWikiSyntax
    • Display
    • Attachments
    • Info
    • Raw Text
    • Print View
    • Edit
    • Load
    • Save
  • Login

FrontPage

Upload page content

You can upload content for the page named below. If you change the page name, you can also upload content for another page. If the page name is empty, we derive the page name from the file name.

File to load page content from
Page name
Comment

Revision 2 as of 2014-02-19 14:51:13
AD Teaching Wiki:
  • InformationRetrievalWS1314
  • ExamSolutions

TASK 1 (Compression + entropy)

1, x, x+1, 2x, 2x+1, 3x, 3x+1, 4x

1.1 For x = 10, gap-encode using Golomb with M = 8

Gaps are: +1, +9, +1, +9, +1, +9, +1, +9 (eight gaps, not seven)
Golomb code for +1 : 1001
Golomb code for +9 : 01001
Encoding of the whole sequence: 1001 01001 1001 01001 1001 01001 1001 01001

1.2 No larger M gives shorter sequence

For M > 8, writing the residual (r) part alone costs >= 4 Bits. Together with the 1 separating it from the q part, this is >= 5 Bits per code. That is >= 40 Bits for the sequence above. With M = 8 we need only 36 Bits.

1.3 Function that computes Golomb code for M = 8

ArrayList<Bool> golombCode(int x) {
  result = new ArrayList<Bool>; 
  int q = x / 8;
  int r = x % 8;
  for (int i = 0; i < q; i++) { result.add(0); }
  result.add(1);
  result.add(r & 4 > 0);
  result.add(r & 2 > 0);
  result.add(r & 1 > 0);
  return result;

1.4 Entropy-optimal code for sequence above, for abitrary x

We have two symbols, each occuring the same number of times. The entropy of the corresponding random variable (X taking on either value with probability 1/2) is 1/2 * log_2 1/2 + 1/2 * log_2 1/2 = 1. And there is indeed a (prefix-free) encoding with just 1 Bit per value:

1 : 0
x-1 : 1

TASK 2 (Prefix edit distance + 2-grams)

x = angela, y = angelina

2.1 Compute the PED

PED(angela, angelina) = ED(angela, angeli) = 1
Justification: If PED = 0 than x is a perfect prefix of y, which is not the case.

PED(angelina, angela) = ED(angelina, angela) = 2
Justification: for shorter prefixes of angela, the difference of the string length would be >= 3, and so certainly ED >= 3.

2.2 Compute comm(x, y)

2-grams of x = angela:  an, ng, ge, el, la
2-grams of y = angelina:  an, ng, ge, el, li, in, na
comm(x, y) = 4

2.3 If PED(x, y) = 1, then comm(x, y) >= |x| - 3

The number of 2-grams of x is |x| - 1.
If PED(x, y) = 1, then we have y' prefix of y with ED(x, y') = 1.
One operation can affect at most 2 2-grams, so comm(x, y') >= |x| - 3.
Hence also comm(x, y) >= |x| - 3.

2.4 Example for PED(x, y) = 1, |x| = 3, comm(x, y) = 0

x = aba
y = aca
  • MoinMoin Powered
  • Python Powered
  • GPL licensed
  • Valid HTML 4.01