AD Teaching Wiki:

This page describes our reproducibility requirements for all projects and theses supervised by someone from our group. You should read it carefully when you begin your work. If you leave it until the end, it is much more work. If you consider it right from the beginning, it is actually quite rewarding.

Access to our SVN, one of our machines, and our file system

Right after the very first meeting with your supervisor, you will be assigned the following. If for some reason this does not happen within a day, you should send an email to your supervisor and our system administrator Frank Dal-Ri.

1. A subfolder in our SVN with URL https://ad-svn.informatik.uni-freiburg.de/student-[projects|theses]/<firstname>-<lastname>. Authentication works via your RZ Account (initials + number).

2. The name of one of our machines, on which you can work. Authentication works via your Informatik Account (first seven letters of your family name + first letter of your given name). This is the username referred to in the next two items.

3. A directory /local/data/<username> for large datasets on a local disk of the machine which you have been assigned. The local disks are fast, so this is great for IO-heavy code (for example, a search engine which frequently reads large segments of data from disk). This directory will be deleted once you have given your presentation and received your grade.

4. A directory /nfs/students/<username> for large datasets on our network file system (NFS). Access to these files can be (and often is) significantly slower, because data packets are routed via the network. However, this directory will be kept after you have given your presentation and received your grade. It should contain a tidied up version of all your data that is worth preserving and was too large to be uploaded to our SVN (see item 1).

Coding and Data Standards

Your code should be properly documented, it should have a consistent style, and there should be unit tests for the non-trivial functions. For the common languages C++, Java, and Python, you can find examples in our Coding Standards.

In your subfolder in our SVN, there should be a README.txt or README.md, in which you clearly explain how you organized your files and what can be found where. If as part of your project or thesis you generated valuable data (= data, which can only be recreated with large effort or not at all), you should put this data in the folder /nfs/students/<username> (see above) and mention this in the README as well. There should be a README file in your /nfs/students directory as well.

Reproducibility via Docker and Make

Your work should be made easily reproducible using Docker as described in this Docker example. In particular, there should be a file Dockerfile in the top-level directory of your SVN folder such that we can reproduce your results as follows:

svn co https://ad-svn.informatik.uni-freiburg.de/student-[projects|theses]/<firstname>-<lastname>
cd <firstname>-<lastname>
docker build -t <name> .
docker run -it --name <name> -v /nfs/students:/extern/data <name>

These commands should build and run a docker container, in which everything is properly prepared for your code to run seamlessly. The purpose of the -v option is that the data from /nfs/students is available in the container via /extern/data. Note that you can make the contents of the SVN folder available in the docker container via COPY in the Dockerfile (that is, you don't have to check out the SVN folder again in the container).

In the docker container, there should be a Makefile with targets of your choice, so that we can run your various experiments or pipelines or services or whatever it is that you have done. The proper choice of targets is up to you, but the first target in your Makefile should always be help, so that just make will print some information on what can be done with your Makefile. See the Docker example linked to above for a simple example of such a Makefile.
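
The file requirements above can be checked mechanically before you hand in your work. The following is a minimal sketch, not an official tool of our group: the function name check_layout is hypothetical, it assumes that the Makefile is kept at the top level of the SVN folder (from where the Dockerfile can COPY it into the container), and it finds the first Makefile target with a simple heuristic that ignores special targets such as .PHONY.

```shell
#!/bin/sh
# Sketch: check that a checkout contains the files this page asks for.
# check_layout is a hypothetical helper name, not an existing tool.
check_layout() {
    dir="$1"
    status=0

    # A README.txt or README.md must explain how the files are organized.
    if [ ! -f "$dir/README.txt" ] && [ ! -f "$dir/README.md" ]; then
        echo "missing: README.txt or README.md"
        status=1
    fi

    # The Dockerfile must be in the top-level directory.
    if [ ! -f "$dir/Dockerfile" ]; then
        echo "missing: Dockerfile"
        status=1
    fi

    # Assumption: the Makefile is also kept at the top level of the checkout.
    if [ ! -f "$dir/Makefile" ]; then
        echo "missing: Makefile"
        status=1
    else
        # Heuristic: the first line of the form "name:" that is neither a
        # special target (leading dot) nor a ":=" assignment.
        first_target=$(grep -E '^[A-Za-z0-9_][A-Za-z0-9_.-]*:([^=]|$)' "$dir/Makefile" \
            | head -n 1 | cut -d: -f1)
        if [ "$first_target" != "help" ]; then
            echo "first Makefile target is '$first_target', expected 'help'"
            status=1
        fi
    fi

    return "$status"
}
```

After sourcing this snippet, running check_layout . in your checked-out SVN folder prints one line per problem found and returns a non-zero exit status if anything is missing.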

Testing your Dockerfile

Of course, you want to test whether your Dockerfile works. However, you cannot run docker build or docker run on our machines, because that would pose a security risk (with the right arguments, you could then become root on our machines).

As a remedy, we provide a wharfer command, which you can use just like docker, but without the mentioned security risks. The use of wharfer is documented here. On our machines tapoa and metropolis it is already installed and you can just use it. If you have been assigned a different machine for your work, ask for access to one of these machines once you are ready to test your Dockerfile.
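
If you test on several machines (for example, your own laptop with docker and one of our machines with wharfer), a small wrapper saves you from editing your commands each time. This is only a sketch: container_tool is a hypothetical helper name, and it relies on the statement above that wharfer takes the same arguments as docker. Check the wharfer documentation for any differences.

```shell
#!/bin/sh
# Sketch: pick wharfer where it is installed, and fall back to docker elsewhere.
# container_tool is a hypothetical helper name, not part of wharfer or docker.
container_tool() {
    if command -v wharfer >/dev/null 2>&1; then
        echo wharfer
    else
        echo docker
    fi
}

# Example use, with <name> as in the commands earlier on this page:
#   "$(container_tool)" build -t <name> .
```

The function only prints the name of the tool, so the actual build and run commands stay visible on your command line or in your scripts.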

AD Teaching Wiki: Reproducibility (last edited 2018-05-16 05:04:06 by Hannah Bast)