Notes on the Research Computing Clusters
My SSH keys
I don’t mind if you add my SSH public keys to your ~/.ssh/authorized_keys (a quick recipe is sketched at the end of this section). But please email me about it with the configuration of the machine you are granting me access to! My SSH public keys on my two laptops are:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJOMsQ840kn+OczmLlwt8Xua6xvtkND+3zEPZFg6xx/C jvhs0706@outlook.com
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHuF5zsxISgZmOSsXENwhSwHuGcxQuPxYi2SrrKkR5R3 h299sun@uwaterloo.ca
And my SSH public keys on the clusters are:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGetGlRnDw0PK9jxMRyIH/aEFQFCwyeNez5fS6P7Sib9 h299sun@chippie
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDz2sZFYeF7SqFrs/UVqTKGiSm8U17hCiVtGILnSlr5y h299sun@snorlax-login
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGKeyN1RU8NHcD4Yilmt7QI/EeUVS+9AImk1hKb/+qzo jvhs0706@beluga.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPJS1bT+y9C+LVz1I94Hwpj6dIzbsDXx6GdGnFDJKJOm jvhs0706@cedar.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEZDD8IITiOM3mRm7wRQWv92lxZmAVFfMTS7LTrLsbwD jvhs0706@graham.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICPto4sUBSpHFG5Wp+5Gb75et8+b8BeeaBLA9UT59pD8 jvhs0706@narval.alliancecan.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJtBpbgBpz1y7ZjhA48UdhF6VTVs2+pNK+wzF5j6Mvk6 jvhs0706@vlogin.vectorinstitute.ai
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAxSyMLm7t+DzLaBcY9hKB7txxqRv1CLsMdAtiYWanjF h299sun@linux.cs.uwaterloo.ca
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDkbPjzC47LumUS2WaO91M6+LKESsS8C3gm42HW9S4Nk h299sun@linux.student.cs.uwaterloo.ca
My SSH private keys can be accessed here.
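If you do grant me access, a typical recipe (assuming a standard OpenSSH setup; adjust the key line to whichever key you want to add) is to append the key and tighten the permissions:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJOMsQ840kn+OczmLlwt8Xua6xvtkND+3zEPZFg6xx/C jvhs0706@outlook.com' >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys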
Snorlax
The Snorlax cluster is configured as follows (memory in MB):
NODELIST CPUS MEMORY AVAIL_FEATURES GRES
snorlax-[1-6] 128 946044 (null) gpu:2
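This listing looks like sinfo output; a format string along the following lines should reproduce it (my guess at the exact command, not necessarily how the table above was generated):
sinfo -o "%N %c %m %f %G"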
Requesting Resources Efficiently with Slurm
When starting an interactive session that doesn’t need a GPU, request a quarter to half of a node’s CPUs and memory and leave the GPUs free. For instance:
salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=32 --mem=236511M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=1 --cpus-per-task=64 --mem=473022M --time=2:59:59
If your work involves GPUs, you have two options. You can either leave some resources free for those running CPU-only jobs:
salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=1 --cpus-per-task=32 --mem=236511M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=2 --gpus-per-task=1 --cpus-per-task=32 --mem=473022M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=2 --cpus-per-task=64 --mem=473022M --time=2:59:59
Or you can simply request half of a node, or the entire node:
salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=1 --cpus-per-task=64 --mem=473022M --time=2:59:59
salloc --nodes=1 --ntasks-per-node=2 --gpus-per-task=1 --cpus-per-task=64 --mem=0 --exclusive --time=2:59:59
salloc --nodes=1 --ntasks-per-node=1 --gpus-per-task=2 --cpus-per-task=128 --mem=0 --exclusive --time=2:59:59
This approach ensures that you’re making the most efficient use of the available resources based on your specific needs.
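The same sizing carries over to batch jobs. A minimal sbatch script mirroring the single-GPU, quarter-node request above might look like the sketch below (the activation step and train.py are placeholders for your own setup):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=236511M
#SBATCH --time=2:59:59
source ~/.bashrc                 # however you normally put conda on your PATH
conda activate project-env       # the environment set up in the conda section below
python train.py                  # placeholder for your actual workload
Save it as, say, job.sh, submit it with sbatch job.sh, and monitor it with squeue -u $USER.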
Working with conda
On Snorlax, environments are managed with conda. It helps you keep the packages for different projects separate, and not only the Python packages. For example, consider a project that needs a specific CUDA version, say 12.1.0:
conda create -n project-env python=3.11
conda activate project-env
conda install cuda -c nvidia/label/cuda-12.1.0
Then install the Python packages you need (like torch):
pip install torch torchvision torchaudio transformers datasets
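As a quick sanity check (run from inside a GPU allocation), you can confirm that torch picked up CUDA and can see the GPUs; this is just an illustrative one-liner:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"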
After that, start working on your project!
ComputeCanada
Due to unforeseen circumstances, I no longer use ComputeCanada. I plan to write more about it when time permits; in the meantime, please refer to its official documentation for more information.