Lecture 1, Thursday October 22, 2009
Part 1: Introduction (30 minutes)
German or English. Short introduction of myself. Demo of CompleteSearch, explain its components. Difference between web search and search in homogeneous collections. Comment on the style of this course: exercises, Wiki, etc. Talk about the block project at the end.
Part 2: Parsing / Tokenization (10 minutes)
Give examples where this is not trivial: Chinese, UTF-8, compound words, stemming.
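A minimal sketch of why tokenization is not trivial, assuming a naive tokenizer that lowercases the text and splits it into maximal runs of word characters. The function name `tokenize` and the sample strings are illustrative assumptions, not part of the lecture material.

```python
import re

def tokenize(text):
    """Lowercase the text and return its maximal runs of word characters."""
    return re.findall(r"\w+", text.lower())

# Plain English text: splitting on non-word characters works well.
print(tokenize("Searching the Web, quickly!"))

# Chinese is written without spaces, so the whole phrase stays one token.
print(tokenize("信息检索"))

# A German compound word is never split into its parts.
print(tokenize("Donaudampfschiff"))

# UTF-8: the same bytes give different tokens under the wrong decoding.
raw = "Müller".encode("utf-8")
print(tokenize(raw.decode("utf-8")))
print(tokenize(raw.decode("latin-1")))
```

The naive rule is fine for the English sentence but fails for the other cases: word segmentation, compound splitting and stemming each need language-specific logic, and the encoding has to be known before any of them can run.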
Part 3: Inverted Index (30 minutes)
Why indexing. Grep. Inverted index. Building it. Querying it. List intersection. Quick analysis of index construction time and space, and of query time.
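The following is a minimal sketch of these steps, assuming documents are plain strings identified by their position in a list. The names `build_index`, `intersect` and `query`, and the three sample documents, are illustrative assumptions.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each word to the sorted list of ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Use a set so that each document id is added at most once per word.
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].append(doc_id)
    # Ids are appended in increasing order, so every list is already sorted.
    return index

def intersect(a, b):
    """Intersect two sorted lists in time linear in len(a) + len(b)."""
    i, j, result = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

def query(index, words):
    """Return the ids of the documents that contain all of the given words."""
    if not words:
        return []
    # Start with the shortest list to keep intermediate results small.
    lists = sorted((index.get(w.lower(), []) for w in words), key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result

docs = [
    "web search engines",
    "search in homogeneous collections",
    "the inverted index speeds up search",
]
index = build_index(docs)
print(query(index, ["search", "index"]))
```

In this sketch, construction makes one pass over all word occurrences, the index stores one entry per distinct (word, document) pair, and a query costs time linear in the lengths of the lists it intersects. Grep, by contrast, has to scan the whole text for every query.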
Part 4: Exercises (20 minutes)
Go through the exercises one by one. Explain the Wiki: it will be used throughout the semester for uploading exercises, asking questions, and collaboration.