AD Teaching Wiki:

Exercise Sheet 1

Instructions:

1. (Only necessary once, before you upload something for the first time.) Assume your name is Donald Duck. (0) If you haven't already done so, create a Wiki account with the user name DonaldDuck (click on "Login" at the top left, then click on "you can create one now"). Always be logged in when you change anything on the Wiki. (1) Open the URL http://ad-wiki.informatik.uni-freiburg.de/teaching/SearchEnginesWS0910/DonaldDuckExercises in your browser. (2) Click on "create new empty page" and save the empty page. (3) As soon as possible, we will then add the line #acl DonaldDuck:read,write -All:read to your page. This ensures that only you and the organizers of the course can see your solutions to the exercises, the number of points you got, etc.

2. (Assuming you have already created your page http://ad-wiki.informatik.uni-freiburg.de/teaching/SearchEnginesWS0910/DonaldDuckExercises as described above.) (1) Recall that your name is not Donald Duck, so use your own name in place of DonaldDuck and donald_duck below. (2) Go to your page DonaldDuckExercises. (3) Upload your solutions there as a PDF (no other formats allowed), naming the file donald_duck_ex1.pdf. (4) Upload your code separately as a ZIP or gzipped TAR archive, naming the file donald_duck_ex1.zip or donald_duck_ex1.tgz. (5) Put the corresponding links in the table below, along with the other information requested. Follow the pattern of the rows already there.

PLEASE UPLOAD SOLUTIONS (PDF) AND CODE (ZIP OR TGZ) SEPARATELY!
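The naming convention from the instructions above can be sketched in a few lines of Python. This is only an illustration: the helper name submission_names is hypothetical, and the instructions do not say how to treat umlauts or names with more than two parts, so the sketch simply lowercases the name parts and joins them as they are.

```python
# Base URL of the course pages, taken from the instructions above.
BASE_URL = "http://ad-wiki.informatik.uni-freiburg.de/teaching/SearchEnginesWS0910/"


def submission_names(full_name: str, sheet: int) -> dict:
    """Derive the wiki page URL and upload file names for one exercise sheet.

    Hypothetical helper, not part of the course material.
    """
    parts = full_name.split()
    # Wiki page name: name parts joined without spaces, e.g. DonaldDuckExercises.
    page = "".join(parts) + "Exercises"
    # File stem: lowercase name parts joined by underscores, e.g. donald_duck_ex1.
    stem = "_".join(p.lower() for p in parts) + f"_ex{sheet}"
    return {
        "page_url": BASE_URL + page,
        "solution": stem + ".pdf",              # solutions: PDF only
        "code": [stem + ".zip", stem + ".tgz"],  # code: ZIP or gzipped TAR
    }


if __name__ == "__main__":
    names = submission_names("Donald Duck", 1)
    print(names["page_url"])
    print(names["solution"])  # donald_duck_ex1.pdf
```

Calling the helper with your own name and the sheet number gives the page to open and the file names to use before uploading.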

|| Name || Link to uploaded solution || Link to uploaded code || Name of collection || #Docs in collection || Zipf epsilon ||
|| Johannes Stork || PDF || TARGZ || RFCs and German news websites and www.textfiles.com || 5540 and 5415 and 48799 || 0.1052 and 0.0762 and 0.0762 ||
|| Christian Simon || PDF || ZIP || selected archives from www.textfiles.com || 2865 || 0.052 ||
|| Matthias Sauer || Included in Code zip || ZIP || non-selected archives from www.textfiles.com || 4328 || 0.788 ||
|| Zhongjie Cai || PDF || ZIP || RFC Documents and Text Stories || 5549 and 1255 || 0.6364 and 0.5137 ||
|| Waldemar Wittmann || PDF || TARGZ || RFC Documents || 1459 || 0.08396 ||
|| Florian Bäurle || PDF || ZIP || RFCs and selected files from www.textfiles.com || 44618 || 0.08243 ||
|| Marius Greitschus || PDF || .tar.gz || GNU Man-Pages || 5051 || 0.098 ||
|| Markus Gruetzner || PDF || ZIP || RFC || ~5500 || 0.01299 ||
|| Thomas Liebetraut || PDF || tgz || IRC logs || ~3800 || 0.122 ||
|| Claudius Korzen || PDF || ZIP || RFCs || 1460 || 0.031 ||
|| Daniel Schauenberg || PDF || tar.gz || Excerpt from RFCs || 2000 || 0.0164 ||
|| Alexander Gutjahr || PDF || tar.gz || RFCs 1-2000 || ca. 2000 || 0.06095 ||
|| Björn Buchhold || PDF || ZIP || some RFCs || 3100 || 0.017163 ||
|| Ivo M. || PDF || tar || RFCs || 5520 || 0.94 ||
|| Mirko Brodesser || PDF || ZIP || some humor/fun files from textfiles.com || ~1000 || 0.1 ||
|| Triatmoko || PDF || RAR || Archives from www.textfiles.com || ca. 2000 || 0.016 ||
|| AlexanderNutz || PDF || ZIP || HTML files from fünf-filmfreunde.de (blog about films) || ~5000 || ~0.022 ||
|| Jonas Krisch || PDF || ZIP || textfiles || ~1500 || 0.154 ||
|| Andre Borgeat || PDF || ZIP || Reuters-21578 || ~20000 || ||
|| Jonas Koenemann || PDF || ZIP || wget from different pages || ~1500 || ||
|| Paresh Paradkar || PDF || ZIP || Selective archives from www.ibibo.org || ~1600 || 0.05966 ||
|| AlexanderSchneider || PDF || ZIP || selected archives from http://textfiles.com/ || ~1000 || ~0.023 ||
|| JensSilvaSantisteban || PDF || ZIP || RFCs and some other files from the web || ~1400 || ~0.084 ||
|| Daniel Frey || PDF || ZIP || archives from http://textfiles.com/ || ~50000 || n.a. ||
|| JohannBetz || PDF || ZIP || All textual RFCs || 5536 || n.a. ||
|| Matthias Frorath || PDF || ZIP || Some files from textfiles.com || ~1300 || ||
|| Ivo Chichkov || PDF || ZIP || text converted HTML files - eNews || ~1500 || 0.154 ||
|| Manuela Ortlieb || PDF || ZIP || text converted different eBooks || 2288 || 0.001 ||
|| Jonas Sternisko || PDF || .tgz || text mined with wget from different sources || 27k+ || 0.223 ||
|| Eric Lacher || PDF || ZIP || RFCs || about 6000 || 0.912232 ||
|| JohannLatocha || n.a. || ZIP || RFCs || 1000 || n.a. ||
|| Waleed Butt || PDF || ZIP || RFCs & textfiles || 1500 || 0.04787 ||

AD Teaching Wiki: SearchEnginesWS0910/ExerciseSheet1 (last edited 2009-10-27 12:54:29 by dip-255)