Notes on the Research Computing Clusters

My SSH key

I don’t mind if you add my SSH public keys to your ~/.ssh/authorized_keys (a sketch of how to do this follows the key listings below). But please email me with the configuration of the machine you are granting me access to! My SSH public keys on my two laptops are:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJOMsQ840kn+OczmLlwt8Xua6xvtkND+3zEPZFg6xx/C jvhs0706@outlook.com
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHuF5zsxISgZmOSsXENwhSwHuGcxQuPxYi2SrrKkR5R3 h299sun@uwaterloo.ca

And my SSH public keys on the clusters are:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGetGlRnDw0PK9jxMRyIH/aEFQFCwyeNez5fS6P7Sib9 h299sun@chippie
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDz2sZFYeF7SqFrs/UVqTKGiSm8U17hCiVtGILnSlr5y h299sun@snorlax-login

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGKeyN1RU8NHcD4Yilmt7QI/EeUVS+9AImk1hKb/+qzo jvhs0706@beluga.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPJS1bT+y9C+LVz1I94Hwpj6dIzbsDXx6GdGnFDJKJOm jvhs0706@cedar.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEZDD8IITiOM3mRm7wRQWv92lxZmAVFfMTS7LTrLsbwD jvhs0706@graham.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICPto4sUBSpHFG5Wp+5Gb75et8+b8BeeaBLA9UT59pD8 jvhs0706@narval.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJtBpbgBpz1y7ZjhA48UdhF6VTVs2+pNK+wzF5j6Mvk6 jvhs0706@vlogin.vectorinstitute.ai

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAxSyMLm7t+DzLaBcY9hKB7txxqRv1CLsMdAtiYWanjF h299sun@linux.cs.uwaterloo.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDkbPjzC47LumUS2WaO91M6+LKESsS8C3gm42HW9S4Nk h299sun@linux.student.cs.uwaterloo.ca
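
For reference, granting access usually just means appending the relevant key to the authorized_keys file with the right permissions. A minimal sketch, assuming the key has been saved to a file named haochen.pub (a placeholder name):

mkdir -p ~/.ssh && chmod 700 ~/.ssh
cat haochen.pub >> ~/.ssh/authorized_keys   # haochen.pub is a placeholder file name
chmod 600 ~/.ssh/authorized_keys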

My SSH private keys can be accessed here.

Snorlax

The Snorlax cluster is configured as follows (memory is in megabytes):

NODELIST             CPUS       MEMORY     AVAIL_FEATURES            GRES       
snorlax-[1-6]        128        946044     (null)                    gpu:2      
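
This listing comes from sinfo; a format string along these lines should reproduce it (the column widths are just a guess):

sinfo -o "%20N %10c %10m %25f %10G"   # NODELIST, CPUS, MEMORY, AVAIL_FEATURES, GRES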

Requesting Resources Efficiently with Slurm

When starting an interactive session that does not require a GPU, it’s best to request a quarter to a half of a node’s CPUs and memory and leave the GPUs alone. For instance:

salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=32 --mem=236511M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=64 --mem=473022M --time=2:59:59

However, if your work involves GPUs, you have two options. You can either leave some CPUs and memory for those running CPU-only jobs:

salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=1 --cpus-per-task=32 --mem=236511M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=2 --gpus-per-task=1 --cpus-per-task=32 --mem=473022M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=2 --cpus-per-task=64 --mem=473022M --time=2:59:59

Or, you can simply request half of a node or the entire node:

salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=1 --cpus-per-task=64 --mem=473022M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=2 --gpus-per-task=1 --cpus-per-task=64 --mem=0 --exclusive --time=2:59:59
salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=2 --cpus-per-task=128 --mem=0 --exclusive --time=2:59:59

This approach ensures that you’re making the most efficient use of the available resources based on your specific needs.
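
The same requests carry over directly to batch jobs. Here is a minimal sketch of the half-node GPU case as an sbatch script; the job name and the final srun line are placeholders for your actual workload:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=473022M
#SBATCH --time=2:59:59
#SBATCH --job-name=half-node-gpu

# Replace with your actual workload.
srun python train.py

Submit it with sbatch and check on it with squeue -u $USER.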

Working with conda

On Snorlax, environments are managed with conda. It helps you keep the packages for different projects separate, and it handles more than just Python packages. For example, consider a project that needs a specific version of CUDA, say 12.1.0:

conda create -n project-env python=3.11
conda activate project-env 
conda install cuda -c nvidia/label/cuda-12.1.0
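
If the install succeeded, the toolkit’s compiler should be on the PATH of the activated environment; a quick way to check:

nvcc --version   # should report release 12.1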

Then install the Python packages (like torch) that you need:

pip install torch torchvision torchaudio transformers datasets
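
A quick sanity check that torch was installed properly and can see the GPUs (run this inside an allocation that includes a GPU):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"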

After that, start working on your project!

ComputeCanada

Due to unforeseen circumstances, I no longer use ComputeCanada. I plan to write more about how to use it when time permits; in the meantime, please refer to its official documentation.
