Differences between revisions 3 and 4
Revision 3 as of 2007-08-22 10:13:13
Size: 1464
Editor: mpiat1403
Comment: Added 2 main sections
Revision 4 as of 2007-10-23 15:17:16
Size: 1544
Editor: infno1613
Comment:
## page was renamed from CompleteSearch/ExcerptGenerator/FindingVitalIntervals

One essential part of the excerpt generation, which can be completely abstracted from the actual task, is finding sentences that contain at least one query term. This part is described here as an algorithm of its own.

Requirements

IN:

  • A list S (= Sentences' starting words) of integers in increasing order: [s0, s1, ..., sn]

  • m lists Wi (= word positions) of integers, each in increasing order: Wi = [wi,0, wi,1, ..., wi,ni] for i = 0, ..., m-1

  • A position MAXPOS (> sn): the first position after the last word of the document. It serves as the right boundary of the last interval.

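S describes the starting positions of the sentences in a document, that is, the positions of the first words of all sentences. Thus, S divides the document into intervals. Each interval runs from one sentence start up to, but not including, the next; the last interval ends at MAXPOS.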

Example:

S = [1,6,13,19,33,38,56,77]
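As a small illustration of how S induces intervals, the following Python sketch pairs each sentence start with the next one. It treats each interval as half-open, [si, si+1), which is consistent with the output example further down. The function name intervals and the value MAXPOS = 81 are assumptions made for illustration; the page does not give a concrete MAXPOS.

```python
# Sentence start positions from the example above.
S = [1, 6, 13, 19, 33, 38, 56, 77]
# Hypothetical value: the page does not state MAXPOS for this example.
MAXPOS = 81

def intervals(starts, maxpos):
    """Return the half-open intervals [left, right) induced by starts."""
    rights = starts[1:] + [maxpos]   # each right boundary is the next start
    return list(zip(starts, rights))

print(intervals(S, MAXPOS))
```

With n+1 sentence starts this yields n+1 intervals, the last one closed off by MAXPOS.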

Wi describes the positions in the document where the i-th query word occurs.

Example:

W0 = [2,4,45,46,80]

W1 = [3,19,22,34]

A document interval (as induced by S) is called vital if it contains at least one position from at least one of the lists Wi, that is, at least one occurrence of a query word.
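The definition of a vital interval can be checked for a single interval with a binary search per word list. This is only a sketch: the function name is_vital is hypothetical, and the interval is assumed to be half-open, [left, right).

```python
from bisect import bisect_left

def is_vital(left, right, word_lists):
    """True if some list holds a position p with left <= p < right."""
    for positions in word_lists:
        i = bisect_left(positions, left)   # index of first position >= left
        if i < len(positions) and positions[i] < right:
            return True
    return False

# Word positions from the example above.
W0 = [2, 4, 45, 46, 80]
W1 = [3, 19, 22, 34]

print(is_vital(1, 6, [W0, W1]))    # True: positions 2, 3 and 4 fall inside
print(is_vital(6, 13, [W0, W1]))   # False: no query word in this sentence
```

Checking each interval this way costs one binary search per word list, which may be fine for a few intervals but is not the cheapest way to process the whole document.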

OUT:

A list of all vital intervals, in increasing order. For each vital interval, the list gives its left boundary, the positions of the query words it contains, and its right boundary; these values must also be in increasing order.

Example (computed from the values above):

V = ( (1,2,3,4,6), (19,19,22,33), (33,34,38), (38,45,46,56), (77,80,MAXPOS) )

Implementation

TODO
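One possible approach, given here as a sketch under assumptions and not as the actual ExcerptGenerator code: merge all word lists into one ascending stream and walk through S once, collecting the positions that fall into each interval. The function name vital_intervals and the value MAXPOS = 81 are hypothetical, and intervals are assumed to be half-open, [si, si+1).

```python
import heapq

def vital_intervals(starts, word_lists, maxpos):
    """Return one tuple (left, hits..., right) per vital interval, in order."""
    result = []
    hits = heapq.merge(*word_lists)    # all word positions, ascending
    pos = next(hits, None)
    for k, left in enumerate(starts):
        right = starts[k + 1] if k + 1 < len(starts) else maxpos
        # Skip positions that lie before this interval (before the first start).
        while pos is not None and pos < left:
            pos = next(hits, None)
        inside = []
        while pos is not None and pos < right:
            inside.append(pos)
            pos = next(hits, None)
        if inside:                     # only vital intervals are reported
            result.append((left, *inside, right))
    return result

S = [1, 6, 13, 19, 33, 38, 56, 77]
W0 = [2, 4, 45, 46, 80]
W1 = [3, 19, 22, 34]
MAXPOS = 81                            # hypothetical value for the example

print(vital_intervals(S, [W0, W1], MAXPOS))
```

Each sentence start and each word position is visited once, so the walk itself is linear in the input size; the merge adds a logarithmic factor in m. If two lists contain the same position, this sketch reports it twice.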

CompleteSearch: completesearch/ExcerptGenerator/FindingVitalIntervalsDELETE (last edited 2007-10-23 15:18:16 by infno1613)