GPU Cluster

General Information

Cluster access

To get access to the GPU cluster, send an email to Frank (dal-ri@informatik.uni-freiburg.de) with your supervisor and Matthias (hertelm@informatik.uni-freiburg.de) in Cc.

Getting help

The cluster's FAQ answers the most frequent questions. To read the FAQ, you must first register with the Support Ticket System, using your informatik.uni-freiburg.de e-mail address. Registration page: https://osticket.informatik.uni-freiburg.de/account.php?do=create

Link to the Cluster's FAQ page: https://osticket.informatik.uni-freiburg.de/kb/faq.php?cid=1

In addition, there is the old help page by the Machine Learning chair, which you can access once you have access to the GPU cluster. Log in with your TF account. Link to the help page: https://aadwiki.informatik.uni-freiburg.de/Meta_Slurm

If you have problems accessing the cluster, send an email to Frank with your supervisor and Matthias in Cc. For any other issues, ask your supervisor and Matthias.

Logging in to the cluster

University network

To log in to the cluster, you must be inside the university's network.

We recommend using the university's VPN; see the university's information pages on setting it up.

Alternatively, you can access the university's network by logging in to login.informatik.uni-freiburg.de via SSH:

ssh <user>@login.informatik.uni-freiburg.de

Cluster login

There are three login nodes: kislogin1, kislogin2, and kislogin3, each under the domain rz.ki.privat.

Log in via SSH:

ssh <user>@kislogin1.rz.ki.privat
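
If you are outside the university network, you can combine both hops with a ProxyJump entry in your ~/.ssh/config (a minimal sketch, assuming OpenSSH 7.3 or newer; replace <user> with your account name):

Host kislogin*.rz.ki.privat
  ProxyJump <user>@login.informatik.uni-freiburg.de

With this entry, ssh <user>@kislogin1.rz.ki.privat works directly from your own machine.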

Status information

Use the command sfree to get status information for the cluster partitions. It shows all accessible partitions and the number of used and total GPUs per partition.
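
sfree is specific to this cluster; the standard Slurm command sinfo shows similar partition information, e.g. (the partition name is taken from the job examples below):

sinfo -p alldlc_gpu-rtx2080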

You can also watch the partitions and your jobs in the dashboard: https://kislurm-dashboard.informatik.uni-freiburg.de/d/spTRj8IMz/kislurm2?orgId=1

Workspaces

You automatically have access to your home directory on the cluster. Since the size of your home directory is limited to a few GB, we recommend using a workspace for your project. A workspace is a directory that can be accessed from all nodes of the cluster.

Creating a workspace

Use the command ws_allocate to create a new workspace. For help, type man ws_allocate.

Example:

ws_allocate -r 10 -m <user>@informatik.uni-freiburg.de test-workspace 30

This command creates a workspace <user>-test-workspace that expires in 30 days. Ten days before expiration (-r 10), a reminder is sent to the e-mail address given with -m.

Find your workspace

To list the paths of your workspaces, type ws_list.

Extending a workspace

When a workspace expires, all of its content is lost. To extend a workspace, use the following command, where <ID> is the workspace name as shown by ws_list:

ws_allocate -x <ID> <DAYS>
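
For example, to extend the workspace from the example above by another 30 days (assuming its ID is test-workspace, the name given at allocation):

ws_allocate -x test-workspace 30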

Running jobs

Interactive session

Use the following command to start an interactive session (with 1 GPU by default):

srun -p alldlc_gpu-rtx2080 --pty bash
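
To request other resources, add standard Slurm options, for example two GPUs and a two-hour time limit (a sketch; adjust the values to the partition's limits):

srun -p alldlc_gpu-rtx2080 --gres=gpu:2 --time=2:00:00 --pty bash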

To check if you have access to the GPU, run:

python3 -c "import torch; print(torch.cuda.is_available())"

The result should be True.

Submitting jobs

Write a bash file containing all instructions for your job, and then run:

sbatch -p alldlc_gpu-rtx2080 <bash_file>
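
A minimal job script could look like this (a sketch; the job name, time limit, and train.py are placeholders for your own setup):

#!/bin/bash
#SBATCH --job-name=my-experiment
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
python3 train.py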

The output of your job will be written to a file slurm-<jobid>.out in your current directory.

To get the status of your jobs, run:

sacct --user=$USER
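
To select specific columns, sacct accepts a --format option with standard Slurm field names, for example:

sacct --user=$USER --format=JobID,JobName,Partition,State,Elapsed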

To list all your running jobs, run:

squeue --user=$USER
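
To cancel a job, use the standard Slurm command scancel with the job ID shown by squeue:

scancel <jobid>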

Accessing GitHub

To access GitHub via SSH, add the following lines to the file ~/.ssh/config:

Host github.com
  ProxyCommand ssh -q login.informatik.uni-freiburg.de nc %h %p
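
This forwards SSH connections to github.com through the login host. To verify that it works, run GitHub's standard connectivity test:

ssh -T git@github.com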
