Topic Description for a Master's Thesis

Note: the following could very well lead to a publication, too. In fact, it would make a very nice follow-up to the SIGIR'07 paper by Turpin et al., cited below.

Synopsis

A new method for snippet generation that improves on the method from the SIGIR'07 paper by Turpin et al. in three ways: (i) no search in the document text is required, because all the necessary information is already computed during query processing; (ii) it also works for advanced search features, where the words to be highlighted need not match the query words verbatim, e.g., substring search, synonym search, semantic search; (iii) the semantics of the search operators (e.g., proximity, or, join) do not have to be re-implemented for the excerpt generator, but are implemented only once, in the query processor.

Possible titles: Efficient Excerpt Generation for Complex Queries, or Efficient Excerpt Generation for Advanced Search.
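
To make points (i) to (iii) of the synopsis concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the proposed implementation: the functions build_index, process_prefix_query, and excerpt are hypothetical names, the index is a toy in-memory positional index, and prefix search stands in for the advanced search features. The query processor returns the match positions, and the excerpt generator highlights by position only, without comparing any token to the query.

```python
from collections import defaultdict


def build_index(docs):
    """Toy positional inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def process_prefix_query(index, prefix):
    """Hypothetical query processor: returns {doc_id: sorted match positions}.

    The operator semantics (here: prefix match) are implemented only here.
    """
    hits = defaultdict(list)
    for word, postings in index.items():
        if word.startswith(prefix):
            for doc_id, positions in postings.items():
                hits[doc_id].extend(positions)
    return {doc_id: sorted(positions) for doc_id, positions in hits.items()}


def excerpt(tokens, positions, context=2):
    """Build an excerpt from positions alone; no token is compared to the query."""
    marked = set(positions)
    first = max(0, min(positions) - context)
    last = min(len(tokens), max(positions) + context + 1)
    return " ".join(
        f"*{tok}*" if i in marked else tok
        for i, tok in enumerate(tokens[first:last], start=first)
    )


docs = [
    "fast generation of result snippets in web search",
    "efficient excerpt generation",
]
index = build_index(docs)
hits = process_prefix_query(index, "gen")
print(excerpt(docs[0].split(), hits[0], context=1))
```

Because excerpt only sees positions, replacing the prefix operator by a synonym or proximity operator would, in this sketch, require changes in the query processor only.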

The method by Turpin et al.

The method stores each document in a compressed format in which each word is replaced by an id, and the ids are encoded in such a way that more frequent words get smaller ids.

Given a query, the method maps each query word to its id and then finds all occurrences of these ids in the document, compressed as described above.

Note: as described, this only works for literal matches; it is not clear how to make it work for, e.g., substring matches.
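
The scheme described above can be sketched roughly as follows. This is a simplified illustration and not the code from the paper: the function names are hypothetical, ids are assigned by descending word frequency over a toy collection, and a variable-byte code is assumed as the encoding, so that frequent words (small ids) take fewer bytes.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign ids by descending frequency: the most frequent word gets id 0."""
    counts = Counter(word for doc in documents for word in doc.split())
    # Ties are broken alphabetically so that the assignment is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ranked)}


def vbyte_encode(number):
    """Variable-byte code: 7 payload bits per byte, high bit marks the last byte."""
    out = bytearray()
    while number >= 128:
        out.append(number & 0x7F)
        number >>= 7
    out.append(number | 0x80)
    return bytes(out)


def compress(doc, vocab):
    """Replace each word by its id and concatenate the encoded ids."""
    return b"".join(vbyte_encode(vocab[w]) for w in doc.split())


def decode_ids(data):
    """Decode a compressed document back into its sequence of word ids."""
    ids, number, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # last byte of the current id
            ids.append(number | ((byte & 0x7F) << shift))
            number, shift = 0, 0
        else:
            number |= byte << shift
            shift += 7
    return ids


def find_matches(query, data, vocab):
    """Return the word positions in the compressed document that match a query id."""
    query_ids = {vocab[w] for w in query.split() if w in vocab}
    return [pos for pos, wid in enumerate(decode_ids(data)) if wid in query_ids]


docs = ["the cat sat on the mat", "the dog"]
vocab = build_vocabulary(docs)
data = compress(docs[0], vocab)
print(find_matches("cat mat", data, vocab))  # positions of literal matches
print(find_matches("ca", data, vocab))       # a substring has no id, so no match
```

The second query shows the limitation mentioned in the note: a substring such as "ca" has no id of its own, so matching on ids finds nothing.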

Literature

Fast Generation of Result Snippets in Web Search (PDF: CompleteSearch/ExcerptGenerator/turpinetal07sigir.pdf, Slides: CompleteSearch/ExcerptGenerator/turpinetal07sigir.ppt)
Andrew Turpin, Yohannes Tsegay, David Hawking, and Hugh Williams
In Proceedings of the 30th Conference on Research and Development in Information Retrieval (SIGIR'07), pages 127-134.

CompleteSearch: completesearch/ExcerptGenerator/ThesisTopic (last edited 2007-12-12 15:30:05 by infno1613)