This page describes our reproducibility requirements for all projects and theses supervised by someone from our group. You should read it carefully when you begin your work. If you leave it until the end, it is much more work. If you consider it right from the beginning, it is actually quite rewarding.
Access to our SVN, one of our machines, and our file system
Right after the very first meeting with your supervisor you will be assigned the following. If for some reason this does not happen within a day, you should send an email to both your supervisor and our system administrator Frank Dal-Ri.
1. A subfolder in our SVN with URL https://ad-svn.informatik.uni-freiburg.de/student-[projects|theses]/<firstname>-<lastname>. Authentication works via your RZ Account (initials + number).
2. The name of one of our machines, on which you can work. Authentication works via your Informatik Account (first seven letters of your family name + first letter of your given name). This is the username referred to in the next two items.
3. A directory /local/data/<username> for large datasets on a local disk of the machine which you have been assigned. The local disks are fast, so this is great for IO-heavy code (for example, a search engine which frequently reads large segments of data from disk). This directory will be deleted once you have given your presentation and received your grade.
4. A directory /nfs/students/<firstname-lastname> for large datasets on our network file system (NFS). Access to these files can be (and often is) significantly slower, because data packets are routed via the network. However, this directory will be kept after you have given your presentation and received your grade. It should contain a tidied up version of all your data that is worth preserving and was too large to be uploaded to our SVN (see item 1).
Note: if you are working exclusively on your own machine (not the preferred option, but possible if you know what you are doing), then items 2 and 3 are not relevant for you.
Coding and Data Standards
Your code should be properly documented, it should have a consistent style, and there should be at least a simple unit test for each non-trivial function (this is easy and actually quite rewarding for properly designed functions). For the common languages C++, Java, and Python you can find examples in our Coding Standards.
In your subfolder in our SVN, there should be a README.txt or README.md, in which you clearly explain how you organized your files and what can be found where. If as part of your project or thesis you generated valuable data (that is, data that can only be recreated with great effort or not at all), you should put this data in the folder /nfs/students/<firstname-lastname> (see above) and mention this in the README as well. There should also be a README file in your /nfs/students directory.
Reproducibility via Docker and Make
Your work should be made easily reproducible using Docker as described in this Docker example. In particular, there should be a file Dockerfile in the top-level directory of your SVN folder such that we can reproduce your results as follows:
    svn co https://ad-svn.informatik.uni-freiburg.de/student-[projects|theses]/<firstname>-<lastname>
    cd <firstname>-<lastname>
    docker build -t <name> .
    docker run -it -v /nfs/students/<firstname>-<lastname>:/extern/data <name>
These commands should build and run a Docker container in which everything is properly prepared for your code to run seamlessly. The purpose of the -v option is to make your data from /nfs/students/<firstname>-<lastname> available in the container via /extern/data. Note that you can make the contents of the SVN folder available in the Docker container via COPY in the Dockerfile (that is, you don't have to check out the SVN folder again in the container); see the example linked to above.
In the Docker container, there should be a Makefile with targets of your choice, so that we can run your various experiments or pipelines or services or whatever it is that you have done. The proper choice of targets is up to you, but the first target in your Makefile should always be help, so that just make will print some information on what can be done with your Makefile. See the Docker example linked to above for a simple example of such a Makefile.
Testing your Dockerfile
Of course, you want to test whether your Dockerfile works. However, you cannot run docker build or docker run on our machines, because that would pose a security risk (with the right arguments, you could then become root on our machines).
As a remedy, we provide a wharfer command, which you can use just like docker, but without the security risk mentioned above. The use of wharfer is documented here. On our machines tapoa, atlantis, fiji, nkaba and metropolis it is already installed and you can just use it. If you have been assigned a different machine for your work, ask for access to one of these machines once you are ready to test your Dockerfile.
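Since wharfer is meant to be used just like docker, the build and run steps from the previous section can be scripted so that they work both on our machines and on your own machine. The following is a minimal sketch under that assumption. The helper name container_cmd and the variables IMAGE and DATA_DIR are made up for illustration; check the wharfer documentation for any options that behave differently.

```shell
#!/bin/sh
# Sketch: choose the container command for the current machine.
# Assumption: wharfer accepts the same build/run arguments as docker.
# container_cmd is a hypothetical helper name, not part of wharfer or docker.
container_cmd() {
  if command -v wharfer >/dev/null 2>&1; then
    echo wharfer   # machines where wharfer is installed
  else
    echo docker    # for example, your own machine
  fi
}

# Example use (IMAGE and DATA_DIR are placeholders that you set yourself):
#   CMD="$(container_cmd)"
#   "$CMD" build -t "$IMAGE" .
#   "$CMD" run -it -v "$DATA_DIR":/extern/data "$IMAGE"
```

The function only prints the name of the command to use, so it has no side effects and can be sourced from a larger test script.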
Troubleshooting
See DockerTroubleshooting for how to deal with some typical problems that we have encountered on our machines so far.