Note: This page is superseded by ??? and can be deleted

One essential part of excerpt generation, which can be abstracted completely from the actual task, is finding the sentences that contain at least one query term. This part is described here as an algorithm of its own.



S describes the starting positions of the sentences in a document, that is, the positions of the first words of all sentences. Thus, S divides the document into intervals, each running from one sentence start up to (but not including) the next.


S = [1,6,13,19,33,38,56,77]

Wi describes the positions in the document where the i-th query word occurs.


W0 = [2,4,45,46,80]

W1 = [3,19,22,34]

A document interval (as induced by S) is called vital if it contains at least one position from at least one of the lists Wi, that is, at least one occurrence of a query word.
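The definition can be checked directly for a single interval. The following is only a minimal sketch: the function name `is_vital` is hypothetical, each Wi is assumed to be sorted, and an interval is assumed to be half-open, so that a word at a sentence start belongs to the sentence it starts.

```python
from bisect import bisect_left

def is_vital(left, right, W):
    """Return True if some Wi contains a position p with left <= p < right.

    W is a list of position lists (W0, W1, ...), each assumed sorted.
    The half-open convention [left, right) is an assumption of this sketch.
    """
    for positions in W:
        # Index of the first position that is >= left, found by binary search.
        k = bisect_left(positions, left)
        if k < len(positions) and positions[k] < right:
            return True
    return False

W0 = [2, 4, 45, 46, 80]
W1 = [3, 19, 22, 34]
print(is_vital(1, 6, [W0, W1]))   # interval starting at 1
print(is_vital(6, 13, [W0, W1]))  # interval starting at 6
```

With the values above, the interval starting at 1 is vital (it contains 2, 3 and 4), while the interval starting at 6 is not.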


The result is a list of all vital intervals, in increasing order. For each vital interval, the list gives its left and right boundary and the positions of the query words it contains; these values must also be listed in increasing order.

Example (computed from the values above):

V = ( (1,2,3,4,6), (19,19,22,33), (33,34,38), (38,45,46,56), (77,80,MAXPOS) )
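One possible way to compute such a list is a single merge-style pass over S and the Wi. This is a sketch under stated assumptions, not necessarily the method used in the excerpt generator: the name `find_vital_intervals` and the parameter `maxpos` are hypothetical, all input lists are assumed sorted, intervals are treated as half-open, and the right boundary of the last interval is represented by a sentinel value.

```python
import heapq

def find_vital_intervals(S, W, maxpos=float("inf")):
    """Return the vital intervals as tuples (left, hit_1, ..., hit_k, right).

    S      : sorted sentence start positions
    W      : list of sorted position lists, one per query word
    maxpos : assumed sentinel for the right boundary of the last interval
    """
    # Merge all query-word positions into one sorted list.
    hits = list(heapq.merge(*W))
    vital = []
    h = 0  # index of the next unprocessed hit
    for i, left in enumerate(S):
        right = S[i + 1] if i + 1 < len(S) else maxpos
        # Skip hits that lie before this interval
        # (only possible before the first sentence start).
        while h < len(hits) and hits[h] < left:
            h += 1
        start = h
        # Collect all hits p with left <= p < right.
        while h < len(hits) and hits[h] < right:
            h += 1
        if h > start:
            vital.append((left, *hits[start:h], right))
    return vital

S = [1, 6, 13, 19, 33, 38, 56, 77]
W0 = [2, 4, 45, 46, 80]
W1 = [3, 19, 22, 34]
V = find_vital_intervals(S, [W0, W1], maxpos=100)
print(V)
```

Because every position in S and in the Wi is visited once, the pass itself is linear in the total input size after the merge. In the usage lines, `maxpos=100` is an arbitrary stand-in for the end of the document.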



CompleteSearch: completesearch/ExcerptGenerator/FindingVitalIntervalsDELETE (last edited 2007-10-23 15:18:16 by infno1613)